tmbdev opened this issue 3 years ago
Cool!
Both @jpsamaroo and @vchuravy work on multi-node/multi-GPU computing, and you might be interested in working with or reaching out to them. Also check out Dagger.jl, https://github.com/JuliaComputing/DataSets.jl, FileTrees.jl, and the JuliaFolds ecosystem.
We might want to add this to the ecosystem page when the package is ready?
I'm starting to use Flux.jl more heavily, so I'll be adding more examples over the next few weeks.
I'm the developer of WebDataset for PyTorch: a linearly scalable dataset format, a set of libraries, and a storage server. WebDataset represents datasets as .tar archives of files on disk and allows them to be accessed from any web server, object store, or cloud storage system. It's all open source, and we have demonstrated I/O speeds of 1 GB/s per GPU.
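For readers unfamiliar with the format, here is a rough sketch (my own illustration, not code from either WebDataset implementation) of the on-disk convention in Julia, using only the Tar standard library: a shard is a plain .tar archive, and the files belonging to one sample share a basename and differ only in extension. The helper `write_shard` and the filenames are made up for the example.

```julia
# Illustration of the WebDataset shard convention (not code from WebDataset.jl):
# each sample is a group of adjacent files in a plain .tar archive that share a
# basename, e.g. "000001.jpg" + "000001.json".
using Tar

function write_shard(tarpath::AbstractString, samples::Vector{<:Pair{Vector{UInt8},String}})
    dir = mktempdir()
    for (i, (image_bytes, metadata_json)) in enumerate(samples)
        base = lpad(i, 6, '0')                       # sample key shared by all components
        write(joinpath(dir, base * ".jpg"), image_bytes)
        write(joinpath(dir, base * ".json"), metadata_json)
    end
    Tar.create(dir, tarpath)                         # Tar stdlib: pack the directory into a tar
end

# Hypothetical usage: two tiny "samples" of fake image bytes plus JSON metadata.
samples = [UInt8[0x01, 0x02] => "{\"label\": 0}",
           UInt8[0x03, 0x04] => "{\"label\": 1}"]
write_shard("shard-000000.tar", samples)
```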
The PyTorch implementation is at github.com/tmbdev/webdataset; the server implementation is at github.com/nvidia/aistore.
I have recently implemented a multithreaded loader for Julia that can read the same format. You can find it at github.com/tmbdev/WebDataset.jl.
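To make the idea concrete, here is a very rough sketch of how such a multithreaded shard loader can work, written against only the Julia standard library (Tar, Threads, Channel). The function `load_shards` and its shape are my own invention for illustration and are not WebDataset.jl's actual interface; see the repo for the real API.

```julia
# Rough multithreaded reading sketch (illustration only, not WebDataset.jl's API):
# each shard is extracted on a worker task, its files are grouped into samples
# by basename, and complete samples are pushed onto a Channel for the consumer.
using Tar

function load_shards(shardpaths::Vector{String}; buffer::Int = 64)
    Channel{Dict{String,Vector{UInt8}}}(buffer; spawn = true) do out
        @sync for path in shardpaths
            Threads.@spawn begin
                dir = Tar.extract(path, mktempdir())          # unpack one shard
                samples = Dict{String,Dict{String,Vector{UInt8}}}()
                for file in readdir(dir)
                    key, ext = split(file, '.'; limit = 2)    # shared basename = sample key
                    get!(samples, key, Dict{String,Vector{UInt8}}())[ext] =
                        read(joinpath(dir, file))
                end
                for (_, sample) in samples
                    put!(out, sample)                         # hand a complete sample to the consumer
                end
            end
        end
    end
end

# Hypothetical usage: iterate samples streaming out of two shard files.
# for sample in load_shards(["shard-000000.tar", "shard-000001.tar"])
#     @show keys(sample)
# end
```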
You might want to add this to the resources, as well as take it into account for DataLoaders.jl and FastAI.jl.
(I work on very large-scale machine learning problems, so my next step is to see how I can get multi-GPU and multi-node training to work in Julia.)