Closed — TAnas0 closed this 2 years ago
Played around with it a bit as I'm really excited to start getting useful data! This is likely the top priority to get it in a working state:
Another one I'd like to see is a sleep time tag
The error I get frequently is "connection refused". I always bypass it with:
res = ''
while res == '':
    try:
        res = scraper.get(nft_url)
        break
    except requests.exceptions.ConnectionError:
        print('connection error')
        time.sleep(30)
        continue
That's how I've always done it. I have no idea if requests.packages.urllib3.util.retry.Retry offers any meaningful advantage over this.
Hey @nickbax,
Data is now saved into CSVs, which I just put into the fair_drop folder. The data stored for each NFT is the OpenSea link, the name of the owner, and whether the NFT is marked suspicious or not.
Data is also saved in batches of 25 NFTs, so it is built up gradually and, even if the script stops unexpectedly, not all data is lost. The scraped CSVs also serve as a cache: links that have already been scraped are skipped, so resuming a scraping job is no problem at all.
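The batch-save and cache behavior described above could be sketched roughly like this (the file path, column names, and helper names here are my own placeholders, not necessarily what the PR uses):

```python
import csv
import os

BATCH_SIZE = 25  # flush to disk every 25 NFTs, matching the description above
CSV_PATH = "fair_drop/scraped.csv"  # hypothetical output path

def load_scraped_links(path=CSV_PATH):
    """Return the set of links already saved, so they can be skipped on resume."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row["link"] for row in csv.DictReader(f)}

def save_batch(rows, path=CSV_PATH):
    """Append one batch of rows, writing the header only when creating the file."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["link", "owner", "suspicious"])
        if new_file:
            writer.writeheader()
        writer.writerows(rows)

def scrape_all(links, scrape_one, path=CSV_PATH):
    """Scrape every link not yet in the CSV, flushing every BATCH_SIZE results."""
    done = load_scraped_links(path)
    batch = []
    for link in links:
        if link in done:
            continue  # cache hit: scraped in a previous run
        batch.append(scrape_one(link))
        if len(batch) >= BATCH_SIZE:
            save_batch(batch, path)
            batch = []
    if batch:  # flush the final partial batch
        save_batch(batch, path)
```

Because results are appended in small batches, a crash loses at most the current in-memory batch, and a rerun picks up where the CSV left off.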
I've implemented some parameters in this regard:
First was using requests' Retry. Thanks a lot for pointing this one out; it makes the scraper more robust and resilient. I've set it to retry each failed request (including rate-limiting responses) a maximum of 3 times, with a backoff factor of 8, which in practice means it waits 4, 8, and 16 seconds before the respective retries.
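For reference, a minimal sketch of wiring such a Retry into a requests session (the status code list is my assumption for "rate limiting responses"; the exact values in the PR may differ):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Assumed configuration matching the description: 3 retries with a
# backoff factor of 8, also retrying on 429 (rate limit) and 5xx errors.
retry = Retry(
    total=3,
    backoff_factor=8,
    status_forcelist=[429, 500, 502, 503, 504],
)
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

# session.get(nft_url) now transparently retries failed requests
# with exponential backoff instead of a hand-rolled while loop.
```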
If the scraper is still rate limited even after all these retries, there is a configurable sleep timer, which defaults to 30 seconds; it likewise retries 3 times before giving up on the scraping job. Both parameters are adjustable:
python suspicious.py -c <collection_address> -r 5 -s 50
python suspicious.py -c <collection_address> --retry 5 --sleep 50
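The flags above could be parsed along these lines; this is a sketch with assumed defaults (3 retries, 30-second sleep, per the description), not necessarily the exact argument setup in suspicious.py:

```python
import argparse

parser = argparse.ArgumentParser(description="Check an OpenSea collection for suspicious NFTs")
parser.add_argument("-c", "--collection", required=True,
                    help="collection contract address")
parser.add_argument("-r", "--retry", type=int, default=3,
                    help="max retries per failed request")
parser.add_argument("-s", "--sleep", type=int, default=30,
                    help="seconds to sleep when still rate limited")
```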
I suggest you try it on the following collections, because there are still some limitations that we need to address:
python fair_drop/suspicious.py -c 0xe21ebcd28d37a67757b9bc7b290f4c4928a430b1 # The Saudis
python fair_drop/suspicious.py -c 0x78d61c684a992b0289bbfe58aaa2659f667907f8 # Superplastic: supergucci
python fair_drop/suspicious.py -c 0xb47e3cd837ddf8e4c57f05d70ab865de6e193bbb # CryptoPunks
I (manually) tested the cache/retry and it seems to work as expected. :heavy_check_mark:
We have a working example of using multithreading with Retry session in pulling.py that works fairly well for us.
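The general shape of that approach, a Retry-backed session shared across a thread pool, might look like this (function names and worker count are illustrative, not taken from pulling.py):

```python
from concurrent.futures import ThreadPoolExecutor

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3, backoff=8):
    """Build a session whose HTTPS requests retry with exponential backoff."""
    session = requests.Session()
    retry = Retry(total=retries, backoff_factor=backoff,
                  status_forcelist=[429, 500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

def fetch_all(urls, fetch_one, max_workers=8):
    """Fan the URLs out over a thread pool; results keep input order."""
    session = make_session()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_one(session, u), urls))
```

Combining the pool with the retrying session keeps the per-request error handling out of the worker code entirely.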
Resolves #86
This is a scraper that goes through all NFTs in an OpenSea collection and checks whether they are marked as suspicious.
You can find a list of todos/improvements at the top of the main file of the PR. They will be developed after discussion and depending on the needs.
To test the file, you can run the following command:
python fair_drop/suspicious.py -c 0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d
TODOs:
@Barabazs Please let me know if you see possible improvements