DataHaskell / dh-core

Functional data science

datasets : verify downloads with hashes, multithread large downloads #29

Open stites opened 5 years ago

stites commented 5 years ago

The datasets downloader could use the above improvements: verifying downloads with hashes, and multithreading large downloads. I've written a version of the first feature in the Setup.hs for a personal moby/dictd replacement, so it might look something like the below:

https://gist.github.com/stites/82acb2036d1654b0ef0c34ec4443579b
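For readers who don't want to open the gist, here is a minimal sketch of the hash-verification idea. It is not the code from the gist; it assumes the `cryptonite` and `bytestring` packages are available, and the names `sha256Hex` and `verifyFile` are hypothetical, chosen only for illustration.

```haskell
-- Assumed dependencies: cryptonite (Crypto.Hash) and bytestring.
import qualified Crypto.Hash as Hash
import qualified Data.ByteString.Lazy as BL

-- | Hex-encoded SHA-256 digest of a lazy ByteString.
-- (Hypothetical helper; relies on the Show instance of Digest, which renders hex.)
sha256Hex :: BL.ByteString -> String
sha256Hex bytes = show (Hash.hashlazy bytes :: Hash.Digest Hash.SHA256)

-- | Compare a downloaded file's digest against an expected hex string.
-- Returns Right () on a match, or Left with a message naming both digests.
verifyFile :: String -> FilePath -> IO (Either String ())
verifyFile expected path = do
  bytes <- BL.readFile path
  let actual = sha256Hex bytes
  if actual == expected
    then pure (Right ())
    else pure (Left ("hash mismatch for " ++ path
                     ++ ": expected " ++ expected ++ ", got " ++ actual))
```

A downloader could call `verifyFile` right after writing each file and delete or re-fetch the file on `Left`. The expected digests would presumably live next to the dataset URLs, but that layout is an assumption rather than anything decided in this thread.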

austinvhuang commented 5 years ago

The tensorflow mnist downloader also does this and provides an example (recalling from an ancient dangling PR I submitted, lol):

https://github.com/tensorflow/haskell/blob/8e1d85b5e5bd56d54ff6d463c8581c57ab5526d9/tensorflow-mnist-input-data/Setup.hs

stites commented 5 years ago

Definitely more official (and probably better community-mojo) if we use your tf work, @austinvhuang. Removing my Setup.hs -- it's pretty much the same thing.