Good afternoon,
I noticed that when a query string contains multiple keywords separated by OR, the crawler fetches the same tweet more than once.
For instance, if two distinct keywords from the query string appear in the same tweet, that tweet is crawled twice. I verified this by monitoring the Tweet IDs stored in a DB.
Is there an easy way to eliminate this duplication in the crawler itself, or should I apply other tactics, e.g. file storage, a Python dictionary in RAM, or an existence query against the DB before inserting?
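To make the workaround options concrete, here is a minimal sketch of what I have in mind. It assumes each crawled tweet arrives as a dict with `id` and `text` keys; those key names, the function names, and the `tweets` table are hypothetical and not the crawler's actual output format or API. The first function filters duplicates in RAM, the second leaves it to the DB through a primary key on the Tweet ID.

```python
import sqlite3


def dedupe_in_memory(tweets):
    """Yield each tweet once, keyed on its ID (assumed to live under 'id')."""
    seen = set()  # a set is enough; a dict is only needed if values are stored
    for tweet in tweets:
        if tweet["id"] in seen:
            continue  # same tweet matched by another OR keyword
        seen.add(tweet["id"])
        yield tweet


def store_unique(conn, tweets):
    """Insert tweets into SQLite; the PRIMARY KEY turns duplicates into no-ops."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
        [(t["id"], t["text"]) for t in tweets],
    )
    conn.commit()
```

The in-memory variant is the simplest but loses its state between runs, while the DB variant survives restarts at the cost of one insert attempt per fetched tweet.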
Thanks for your time.