Closed: evaristoc closed this 7 years ago
Several people have been asked to try out the BitTorrent link to the file. Contributors so far who also tested the dataset link:
[Contributors: Please add comments if you think necessary...]
From the tests by contributors on Gitter Data Science Room and Gitter Contributors Room I could point out:
Some references:
I tested this by downloading the file on the morning (EST) of 11/4. At first, seeders were an issue. After waiting around 15 minutes, I'd say, one seeder popped up, and then about a third of the way through another seeder with much higher bandwidth joined, allowing me to complete the download in about 30-40 minutes. The total time for the entire file to download was 1 h 43 m.
I'm in London, Ontario, which is EST (UTC-5). I began downloading at 9:31 am today. Speeds fluctuated at first, with a single seeder, although that seeder started sending almost immediately. Once the second seeder joined, the speeds ranged between 500 KiB/s and my max, 1.5 MiB/s. The file completed at 9:39 am. The trouble with torrenting the data is that seeding requires a torrent client to be running, and the user's ports must be forwarded correctly to the PC running the client in order to accept incoming requests. This also requires a static LAN IP, or else the port forwarding will only be correct on occasion.
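For anyone who wants to check the listening-port side of this, here is a minimal Python sketch that tests whether a TCP port accepts connections. The function name `is_port_open` and the port number 6881 are assumptions for illustration only; use whatever port your torrent client is actually configured to listen on. Note that running this from inside your own LAN only shows that the client is listening. To confirm the router's port forwarding, the same check would have to be run from a machine outside your network against your public IP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical values: 6881 is only an example port, not taken from this thread.
    print(is_port_open("127.0.0.1", 6881))
```

If this prints `False` on the machine running the client, the client is probably not listening on that port, so port forwarding would not help until that is fixed.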
My download couldn't start when first initialized, due to the seeder and leechers all being offline. This was when there was only one seeder. After @mcbarlowe and @jp-sauve completed theirs, I came back to this on Monday (yesterday) and completed mine. From start to finish, it took 3 hours to download because I used a VPN, which slowed it down. I am in Seattle, but my connection was routed through Canada while my TCP and UDP IP addresses were both German. My download came from @jp-sauve and @evaristoc, according to the IP addresses.
Torrenting has the advantage of being a bit more accessible to those whose access is restricted where they are located. It's really great once the file is well-established and widely distributed, which also gives it a fairly high fault tolerance.
On the other hand, accessibility is hindered when the files are not widely distributed (i.e. when the peer and seeder count is low). There's a ramping phase, of which we have already had a taste. There is also a bit of learning involved with BitTorrent: the downloader must know how to download. Everybody knows how to use a browser. Not every person who is interested in data analysis knows how to use BitTorrent software, but I'm sure they are smart enough to figure it out. Still, it's enough to put some people off. There's also some stigma in the USA, especially in the eyes of many ISPs.
My experience downloading it via BitTorrent went as expected. In my opinion, it should be a backup or alternative download source, just like the Linux distros.
Some points:
My personal views:
However, my views do not necessarily exclude the use of BitTorrent as an option.