JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.33k stars, 699 forks

Thread safety #623

Open JustAnotherArchivist opened 1 year ago

JustAnotherArchivist commented 1 year ago

Since people seem to keep trying to use snscrape with threads (despite this not being listed as a feature anywhere) and running into problems (seemingly without searching the issues)...

snscrape is currently not thread-safe.

I'd like to evaluate at some point whether it's easy enough to make snscrape thread-safe. One known issue is the Twitter module's guest token manager. Testing thread safety will be an issue, too.

Relevant prior issues: #307 #584 #622

(SEO keywords: threading multithreading)

IvanTrendafilov commented 1 year ago

@JustAnotherArchivist You are saying snscrape is not thread-safe, but is it process-safe? If I were to run multiple instances of the snscrape executable concurrently, would that cause issues?

JustAnotherArchivist commented 1 year ago

@IvanTrendafilov Yes, it is safe to run multiple instances of the CLI at the same time. Or indeed to use the snscrape package/modules from multiple independent Python processes in parallel (which is what the CLI does, anyway). The CLI also has code for token sharing between parallel Twitter scrapes.
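As a rough illustration of the process-based approach described above, the following sketch launches several independent snscrape CLI processes and collects their output. It assumes the `snscrape` executable is on `PATH` and uses the `twitter-search` scraper with `--jsonl` and `--max-results` purely as an example; the helper names `build_command` and `run_in_parallel` are hypothetical and not part of snscrape.

```python
import subprocess


def build_command(query, max_results=100):
    """Build one snscrape CLI invocation (hypothetical helper).

    Assumes a Twitter search scrape; swap in whichever scraper you need.
    """
    return [
        "snscrape",
        "--jsonl",
        "--max-results", str(max_results),
        "twitter-search", query,
    ]


def run_in_parallel(commands):
    """Start every command as its own OS process, then gather the results.

    Each process has its own interpreter and memory, so no snscrape state
    is shared between them. Returns a list of (stdout, returncode) tuples
    in the same order as `commands`.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        stdout, _ = proc.communicate()  # waits for this process to finish
        results.append((stdout, proc.returncode))
    return results


if __name__ == "__main__":
    # Example queries are placeholders.
    queries = ["python", "archiving"]
    commands = [build_command(q, max_results=10) for q in queries]
    for query, (stdout, code) in zip(queries, run_in_parallel(commands)):
        print(query, "exit code", code, "lines", len(stdout.splitlines()))
```

All processes are started before any output is read, so the scrapes overlap in time; only the collection step is sequential. Whether this is actually faster depends on rate limits on the remote side.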

IvanTrendafilov commented 1 year ago

Great news, thank you.

obada-jaras commented 1 year ago

@JustAnotherArchivist Do you have any brief idea why this error is occurring, and do you have any suggestions for how to work around it while still using the library to scrape faster? Additionally, I'm curious if you have any resources or suggestions for learning how to use the library for fast scraping, as I'm relatively new to this.

I should mention that I only hit this problem when using multithreading. Interestingly, when I ran the same code in multiple command prompts (that is, as separate processes), it worked fine.