JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Timing inconsistent when used in function in jupyter notebooks #108

Closed essentialols closed 3 years ago

essentialols commented 3 years ago

I'd like to use snscrape's twitter-search scraper in a Jupyter notebook. This works well for the most part, but I've hit a bug and I don't know whether it is caused by snscrape or by something else.

Snscrape works fine when I run it this way:

results = !snscrape --jsonl twitter-search "trump since:2020-08-30 until:2020-09-01"
print(results)

I know it works as expected because the first tweet I receive is from "2020-08-31T23:59:59+00:00" just as intended.

Yet when I put the call to snscrape inside a function like so

def scrape(word, since, until):
    results = !snscrape --jsonl twitter-search f"{word} since:{since} until:{until}"
    print(results)

scrape("trump", "2020-08-30", "2020-09-01")

the first tweet I receive is from 2020-08-31T23:54:39+00:00.

I don't know what's going on here... Any ideas?

Edit: I noticed that the issue is much bigger than I first thought. Calling the scraper inside the function yields only 91 tweets, whereas the direct call returns over 76903 tweets.

JustAnotherArchivist commented 3 years ago

This sounds like an issue with the invocation of snscrape. I don't see any problems at a glance, but I also have only very superficial experience with Jupyter Notebook. A first step would be to figure out the exact command that is executed. A debug log from snscrape (see README) might also be helpful (or would at least reveal where snscrape is lacking log messages).
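One way to check the exact command without running snscrape is to look at how a POSIX shell would split it into arguments. The sketch below rests on an assumption that was not confirmed in this thread: that the notebook substitutes {word}, {since}, and {until} itself and passes the leading f of the f-string to the shell unchanged. The two command strings are hypothetical reconstructions, not captured logs.

```python
import shlex

# Hypothetical reconstruction of the command line after the notebook has
# substituted {word}, {since}, and {until}; the leading f is assumed to be
# passed through to the shell untouched.
inside_function = 'snscrape --jsonl twitter-search f"trump since:2020-08-30 until:2020-09-01"'

# The command from the direct call that worked as expected.
direct_call = 'snscrape --jsonl twitter-search "trump since:2020-08-30 until:2020-09-01"'

# shlex.split follows POSIX shell word-splitting rules, so it shows the
# argument list that snscrape would receive in each case.
args_inside = shlex.split(inside_function)
args_direct = shlex.split(direct_call)

# The last argument is the search query.
print(args_inside[-1])
print(args_direct[-1])
```

If the assumption holds, the shell joins the bare f to the quoted string, so the query inside the function would start with "ftrump" rather than "trump". That might account for the much smaller result set, but only the actual command or a debug log from the notebook can confirm it.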

JustAnotherArchivist commented 3 years ago

Closing this because I can't reproduce it and it seems like an environment issue. Feel free to reopen with the information mentioned above if needed.
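To take the notebook's shell handling out of the picture when gathering that information, the command can be built as an argument list and run without a shell. This is a minimal sketch, not something proposed in the thread: the helper names build_argv, parse_jsonl, and scrape are hypothetical, and it assumes the snscrape executable is on the PATH of the notebook kernel.

```python
import json
import subprocess


def build_argv(word, since, until):
    """Build the snscrape argument list; the whole query is one argument."""
    query = f"{word} since:{since} until:{until}"
    return ["snscrape", "--jsonl", "twitter-search", query]


def parse_jsonl(text):
    """Parse one JSON object per non-empty line of output."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def scrape(word, since, until):
    """Run snscrape without a shell and return the parsed results."""
    argv = build_argv(word, since, until)
    # Print the exact argument list so it can be quoted in a bug report.
    print("Running:", argv)
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_jsonl(completed.stdout)
```

Calling scrape("trump", "2020-08-30", "2020-09-01") would then print the exact arguments before running them. Because no shell is involved, the quoting in the notebook cell cannot change the query.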