JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Not filtering retweets #163

Closed by ajc356 3 years ago

ajc356 commented 3 years ago

I have just run my snscrape query twice, once filtering retweets and once not, yet I get nearly the same number of results each time.

snscrape --jsonl --max-results 2000000 --since 2020-03-01 twitter-search '((CRB or #CRB) -clubes -coupe -SuperCoupe -SUPER_COUPE -football -belouizdad -chabab -USMA -brasileño -Brasileirão -Algérie -Railway -@IR_CRB -@RailMinIndia -@PiyushGoyal) OR CERB OR #CERB OR (#EI (Canada OR #onpoli OR #cdnpoli)) OR (Canada unemployment) OR ((#CRA or cra) (#covid19canada OR #covid19relief OR covid OR covid19 OR coronavirus or pandemic)) -filter:retweets' > cerb_no_retweets.json

returns 255327 tweets

snscrape --jsonl --max-results 2000000 --since 2020-03-01 twitter-search '((CRB or #CRB) -clubes -coupe -SuperCoupe -SUPER_COUPE -football -belouizdad -chabab -USMA -brasileño -Brasileirão -Algérie -Railway -@IR_CRB -@RailMinIndia -@PiyushGoyal) OR CERB OR #CERB OR (#EI (Canada OR #onpoli OR #cdnpoli)) OR (Canada unemployment) OR ((#CRA or cra) (#covid19canada OR #covid19relief OR covid OR covid19 OR coronavirus or pandemic))' > cerb_with_retweets.json

returns 255894 tweets
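One way to see what actually differs between the two output files is to compare the tweet IDs they contain, rather than only the totals. The following is a minimal sketch, not part of snscrape: the helper names `load_tweet_ids` and `compare_result_sets` are hypothetical, and it assumes each line of the `--jsonl` output is a JSON object with an `id` field.

```python
import json


def load_tweet_ids(path):
    """Return the set of tweet IDs in a JSONL file (one JSON object per line)."""
    ids = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            ids.add(record["id"])  # assumption: each record carries an "id" field
    return ids


def compare_result_sets(path_a, path_b):
    """Split the tweet IDs of two JSONL files into shared and file-specific sets."""
    ids_a = load_tweet_ids(path_a)
    ids_b = load_tweet_ids(path_b)
    return {
        "only_in_a": ids_a - ids_b,
        "only_in_b": ids_b - ids_a,
        "in_both": ids_a & ids_b,
    }
```

Calling `compare_result_sets("cerb_no_retweets.json", "cerb_with_retweets.json")` would then give the IDs unique to each run, which can be inspected by hand to see what kind of tweets account for the small difference in counts.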

JustAnotherArchivist commented 3 years ago

As explained in your previous issue (#162), Twitter never returns retweets unless you explicitly specify include:nativeretweets or filter:nativeretweets. filter:retweets is for old-style retweets (tweet text prefixed by RT), which are essentially irrelevant nowadays.
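To make the distinction between these operators concrete, here is a small sketch of a query builder. It is illustrative only: `build_query` and the mode names are hypothetical, the operator strings are the ones quoted in this thread, and wrapping the base query in parentheses so that the operator applies to the whole expression is an assumption about how Twitter parses search queries.

```python
# Operator strings as mentioned in this thread; the mode names are made up here.
RETWEET_OPERATORS = {
    "include_native": "include:nativeretweets",  # original tweets plus native retweets
    "only_native": "filter:nativeretweets",      # native retweets only
    "exclude_old_style": "-filter:retweets",     # drops only old-style "RT ..." tweets
}


def build_query(base_query, retweet_mode=None):
    """Append a retweet operator to a search query.

    With retweet_mode=None the query is returned unchanged, which per the
    explanation above already yields no native retweets.
    """
    if retweet_mode is None:
        return base_query
    operator = RETWEET_OPERATORS[retweet_mode]
    # Assumption: parentheses keep the operator from binding to only the last OR branch.
    return f"({base_query}) {operator}"
```

For example, `build_query("CERB OR #CERB", "include_native")` produces a string that could be passed to `twitter-search` to request native retweets alongside original tweets.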