DocNow / twarc

A command line tool (and Python library) for archiving Twitter JSON
https://twarc-project.readthedocs.io
MIT License

Document the inexactness of --limit when searching #647

Open dolsysmith opened 2 years ago

dolsysmith commented 2 years ago

Running the following command yields 599 Tweets, not 500:

twarc2 search --limit 500 "blacklivesmatter" results.jsonl

Is that expected behavior? (Just wanting to confirm.)

edsu commented 2 years ago

That is the expected behavior, yes. It could be documented better though :-)

igorbrigadir commented 2 years ago

Yeah, it's because the limit is counted after retrieval. It pages through results, which by default hold at most 100 Tweets per page, but the API can return fewer. So you can get 99, 100, 100, 100, 100, and because you're then at 499, which is still less than 500, it has to make one extra call and gets another 100, for 599 in total.
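To make the arithmetic concrete, here is a minimal Python sketch of the "count after retrieval" behavior described above. It is not twarc's actual code: the function name `collect_with_limit` and the page sizes are made up for illustration, and it only assumes that the limit is checked once per page after the whole page has been received.

```python
def collect_with_limit(pages, limit):
    """Count items page by page, stopping once the running total reaches `limit`.

    The check runs only after a full page has been counted, so the
    final total can exceed `limit` by up to one page minus one item.
    """
    total = 0
    for page in pages:
        total += len(page)   # the whole page is kept, never truncated
        if total >= limit:   # the limit is checked only after retrieval
            break
    return total


# Hypothetical page sizes: the first page comes back one Tweet short.
page_sizes = [99, 100, 100, 100, 100, 100, 100]
pages = ([None] * n for n in page_sizes)  # stand-ins for pages of Tweets

print(collect_with_limit(pages, 500))  # 599
```

With full pages of 100 every time, the same loop would stop at exactly 500; the overshoot only appears when some page comes back short.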