binance / binance-public-data

Details on how to get Binance public data

Torrent Download #96

Open dougransom opened 2 years ago

dougransom commented 2 years ago

Requesting a torrent download, updated each month, for each combination of (spot, futures) and (agg, kline, trade) that would contain the data for all symbols, for (all time, the previous month). Users could then quickly get a copy of the data for backtesting.

dougransom commented 2 years ago

It would be even better if I could download a pandas DataFrame for all symbols as one file, via torrent or HTTP, with a separate file for each interval (15m, 30m, etc.).

2pd commented 2 years ago

Thank you for the suggestion, we will review it.

dougransom commented 2 years ago

As I think about this, it could work really well with your existing scripts. Publish the torrent every quarter for each interval (i.e. 1m, 5m, etc.). To get the most current data, we would just have to run the existing scripts, which would bring our local copy up to date.

2pd commented 2 years ago

Since redirect downloading is good enough for now, we will not consider the torrent solution at this time.

Thanks for your suggestion anyway.

dougransom commented 2 years ago

The experience has been painful for me. I have been trying for days to download 15m kline data for all symbols, and I have a fast pipe. After the smallest interruption, for whatever mysterious reason, anywhere across the internet, I have to run the scripts again. They start from the beginning, of course, reporting many "file not found" or "file already exists" errors.

Also, Amazon S3 has some torrent features built in, so you would just have to set them up and document them.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-downloading-objects.html

Then we just run the scripts as you suggest to update our local copy.

So we could prime our local copy with the torrent (which would at least make sure the whole tree downloads) and then fill in the gaps.

jeffneuen commented 2 years ago

@dougransom One solution might be to turn the checksum download option on and add some code to the scripts that checks for an existing file: if it exists, verify the checksum, and if it matches, skip downloading that file. There will be some delay for this verification, but in most cases the verification would be faster than the download (especially if you're able to store the data on an SSD).
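A rough sketch of that skip-if-verified check, in case it helps. It assumes each data file has a checksum file saved next to it, and that the checksum file uses the `sha256sum` layout (hex digest first, then the file name); both are assumptions, so check them against what the scripts actually download. The helper names `sha256_of`, `expected_digest` and `needs_download` are made up for illustration and are not part of the existing scripts.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_path: Path) -> str:
    """Read the hex digest from a sha256sum-style file (assumed layout)."""
    parts = checksum_path.read_text().split()
    return parts[0].lower() if parts else ""


def needs_download(data_path: Path, checksum_path: Path) -> bool:
    """Return True unless a local file exists and matches its checksum."""
    if not data_path.exists() or not checksum_path.exists():
        return True
    return sha256_of(data_path) != expected_digest(checksum_path)
```

The download loop would call `needs_download(...)` before fetching each file and skip the ones that return `False`, so a rerun after an interruption only fetches missing or corrupted files.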

LispDevel commented 2 years ago

Hi guys. How can I download all files at once? For example, all CSV files of the klines section? Thanks.

2pd commented 2 years ago

Hi guys. How can I download all files at once? For example, all CSV files of the klines section? Thanks.

It's recommended to download one file at a time; that should be easier.

L-scientist commented 1 year ago

When I use download-kline.py to download multiple kline files at the same time, I often encounter this error: urllib.error.URLError: <urlopen error [WinError 10054]. Is this error caused by the download frequency being too high? Do I need to add a sleep to the code, or are there parameters I should add to the command line?
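WinError 10054 means the remote side reset the connection. Whether that is caused by download frequency is not confirmed anywhere in this thread, but a retry with a growing pause between attempts is a common workaround either way. A minimal sketch, assuming the download is a plain `urllib.request.urlopen` call: `fetch_with_retry` and its parameters are hypothetical and not something download-kline.py provides.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=5, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError:
            # An HTTP status error (e.g. 404 for a missing file) will not
            # go away on retry, so pass it straight to the caller.
            raise
        except (urllib.error.URLError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access; in normal use the defaults are enough, e.g. `data = fetch_with_retry(url)`.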