BoKKeR / RSS-to-Telegram-Bot

RSS to Telegram python script
http://t.me/rss_t_bot
GNU General Public License v3.0

Messages duplicates with redis queue #45

Status: Closed (OleksanderSalamatov closed this issue 11 months ago)

OleksanderSalamatov commented 1 year ago

I am running the develop branch in an Unraid container. Sometimes messages from sources get duplicated. Looking at the logs, I think the message is sent even when there is an error in the logs, like this:

[Nest] 194  - 03/28/2023, 5:18:54 PM   DEBUG [RssService] last: https://3dnews.ru/1084145
[Nest] 194  - 03/28/2023, 5:18:54 PM   DEBUG [RssService] -------------done------------------
Error: Unable to parse XML.
    at /app/node_modules/rss-parser/lib/parser.js:36:25
    at Parser.<anonymous> (/app/node_modules/xml2js/lib/parser.js:304:18)
    at Parser.emit (node:events:394:28)
    at Parser.exports.Parser.Parser.parseString (/app/node_modules/xml2js/lib/parser.js:314:16)
    at Parser.parseString (/app/node_modules/xml2js/lib/parser.js:5:59)
    at /app/node_modules/rss-parser/lib/parser.js:33:22
    at new Promise (<anonymous>)
    at Parser.parseString (/app/node_modules/rss-parser/lib/parser.js:32:16)
    at RssService.handleInterval (/app/dist/src/rss/rss.service.js:115:41)
    at runMicrotasks (<anonymous>)
.....
-------checking feed: 3dnews----------
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] last: https://3dnews.ru/1084145
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] new items: 1
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] Adding job: https://3dnews.ru/1084145 chat: -1001883154959
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] saving: https://3dnews.ru/1084145
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG @Process id:10 attempts:0 message:https://3dnews.ru/1084145
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] Done! saving checkpoint: https://3dnews.ru/1084145
[Nest] 194  - 03/28/2023, 5:19:04 PM   DEBUG [RssService] -------------done------------------
[Nest] 194  - 03/28/2023, 5:19:11 PM   DEBUG [RssService] 

Screenshots: [two images attached]

BoKKeR commented 1 year ago

I will try to look at this when I find time. Do you have any idea why it would fail to parse? Can you share the link?

OleksanderSalamatov commented 1 year ago

So far I haven't found any consistent pattern in the errors; it just happens occasionally with any of the RSS sources. Here is the list of sources I use:

Title: videocardz
RSS URL: https://feeds.feedburner.com/VideoCardzcom
Last checked entry: https://videocardz.com/newz/intel-releases-first-arc-pro-gpu-driver-in-4-months
Disabled: false

Title: 3dnews
RSS URL: https://3dnews.ru/news/rss
Last checked entry: https://3dnews.ru/1084142
Disabled: false

Title: macrumors
RSS URL: https://feeds.macrumors.com/MacRumors-All
Last checked entry: https://www.macrumors.com/2023/03/28/deals-get-25-off-magsafe/
Disabled: false

Title: overclockers.ua
RSS URL: https://www.overclockers.ua/rss.xml
Last checked entry: https://www.overclockers.ua/news/hardware/2023-03-28/132414/
Disabled: false

Title: ixbt_articles
RSS URL: http://www.ixbt.com/export/articles.rss
Last checked entry: https://www.ixbt.com/mobile/umidigi-g1-max-review.html
Disabled: false

Title: ixbt_news
RSS URL: http://www.ixbt.com/export/news.rss
Last checked entry: https://www.ixbt.com/news/2023/03/28/volkswagen-tiguan-2023.html
Disabled: false

BoKKeR commented 11 months ago

Thanks for the report, I finally found some time to look into it. It's fixed in the develop branch, and I will try to push it to master. It's caused by the following pattern:

  1. A scan of all feeds runs at a given interval, let's say 60 seconds. Let's call it SCAN A.
  2. During SCAN A, the last item of each feed is loaded into memory.
  3. Sometimes SCAN A takes longer than the interval (60 seconds), so the system kicks off another check automatically (the two scans are not aware of each other).
  4. SCAN B starts and loads the same last feed items into memory that SCAN A had.
  5. SCAN A finishes scanning and saves the last item of each feed to the database.
  6. SCAN B finishes scanning, but since it still holds the old last item in memory, it thinks that the feed has new items.

With enough feeds, this can produce 2 to 5 copies of the same message.
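
To make the sequence above concrete, here is a minimal sketch of the race as a synchronous in-memory model. It is not the bot's code: `Store`, `startScan`, `demoRace`, and the example.com links are all hypothetical names made up for illustration, and the feed is assumed to be ordered oldest to newest.

```typescript
// Hypothetical model of the race; none of these names come from the bot's code.
type Store = { lastSeen: string };

// Starting a scan copies the saved checkpoint into memory (steps 2 and 4).
// The returned function finishes that scan (steps 5 and 6).
function startScan(store: Store, name: string) {
  const snapshot = store.lastSeen;
  return (feed: string[], sent: string[]): void => {
    // Everything after the snapshot counts as new (feed is oldest to newest).
    const idx = feed.indexOf(snapshot);
    const fresh = idx === -1 ? feed : feed.slice(idx + 1);
    for (const link of fresh) {
      sent.push(`${name} sent ${link}`);
      store.lastSeen = link; // save the checkpoint as items are sent
    }
  };
}

function demoRace(): string[] {
  const store: Store = { lastSeen: "https://example.com/1" };
  const feed = ["https://example.com/1", "https://example.com/2"];
  const sent: string[] = [];

  const finishA = startScan(store, "SCAN A");
  const finishB = startScan(store, "SCAN B"); // starts before SCAN A has saved

  finishA(feed, sent); // sends /2 and saves it as the checkpoint
  finishB(feed, sent); // still compares against the stale /1 snapshot
  return sent;
}
```

In this model `demoRace()` returns two entries for the same link, one per scan, which mirrors the duplicate messages in the report.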

I will have to take a look at how this can be improved so that multiple checks are not started at the same time.
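
One common way to avoid overlapping checks is a simple in-flight flag that skips a tick while the previous scan is still running. The sketch below only illustrates that idea and is not necessarily how the fix in the develop branch works; `skipIfRunning` is a hypothetical helper name.

```typescript
// Hypothetical helper: wraps an async task so that a call made while a
// previous run is still in flight is skipped instead of started in parallel.
function skipIfRunning(task: () => Promise<void>): () => Promise<boolean> {
  let running = false;
  return async (): Promise<boolean> => {
    if (running) {
      return false; // previous scan has not finished yet: skip this tick
    }
    running = true;
    try {
      await task();
      return true; // the scan actually ran
    } finally {
      running = false; // always release the flag, even if the scan throws
    }
  };
}
```

Assuming a `scanAllFeeds` function that performs one full scan, it could be wired up as `setInterval(skipIfRunning(scanAllFeeds), 60_000)`. The flag only protects a single process; if several instances of the bot share one database, a lock in Redis or the database would be needed instead.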