bianjiang / tweetf0rm

A twitter crawler in Python
MIT License
303 stars 107 forks

Collections in querystring with OR return duplicate tweet post #25

Open estathop opened 4 years ago

estathop commented 4 years ago

Good afternoon. I noticed that when you form a querystring with multiple keywords separated by OR, the crawler fetches the same tweet more than once. For instance, if two distinct keywords from the query string appear in the same tweet, the tweet is crawled twice; I verified this by monitoring the tweet IDs inside a DB. Is there an easy way to eliminate this in the crawler, or should I apply other tactics, e.g. file storage, a Python dictionary in RAM, or an existence query against the DB to discard duplicates? Thanks for your time.
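For illustration, here is a minimal sketch of two of the tactics mentioned above: an in-memory set of seen IDs, and a DB-level uniqueness constraint. It is not part of tweetf0rm. It assumes each crawled tweet is a dict with an `id` and a `text` key, as in the Twitter API JSON; the helper names `dedupe_by_id` and `store_unique` and the `tweets` table layout are hypothetical.

```python
import sqlite3


def dedupe_by_id(tweets, seen=None):
    """Yield each tweet at most once, keyed on its 'id' field.

    `seen` is an optional set of already-seen IDs, so the same set
    can be shared across several crawl batches.
    """
    seen = set() if seen is None else seen
    for tweet in tweets:
        tweet_id = tweet["id"]
        if tweet_id in seen:
            continue  # duplicate, e.g. matched by two OR'd keywords
        seen.add(tweet_id)
        yield tweet


def store_unique(conn, tweets):
    """Insert tweets into SQLite, letting the primary key discard duplicates.

    Returns the total number of rows in the (hypothetical) tweets table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    # INSERT OR IGNORE silently skips rows whose id already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
        ((t["id"], t["text"]) for t in tweets),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

The set-based version is the simplest but grows with the number of tweets seen and is lost on restart; the primary-key version pushes the check into the DB, which survives restarts at the cost of one write attempt per duplicate.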