apache / singa

a distributed deep learning platform
Apache License 2.0

Add a dataset module #701

Open nudles opened 4 years ago

nudles commented 4 years ago

Data loading is an important part of DL training; if it is not implemented well, it can be slow and become a bottleneck. The tasks include:

  1. implement dataset classes for common benchmark datasets to make them easy to access within SINGA (e.g., without manual downloading);
  2. implement common preprocessing operations;
  3. implement parallel data loading for higher efficiency.
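
To make the three tasks concrete, here is a minimal sketch in plain Python (standard library only). It is not SINGA's API: `ToyDataset`, `compose`, `scale`, `shift`, and `parallel_batches` are hypothetical names invented for illustration, and the dataset generates samples in memory where a real one would download and cache the files.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional, Tuple

Sample = Tuple[List[float], int]  # (features, label)
Transform = Callable[[List[float]], List[float]]


class ToyDataset:
    """Hypothetical dataset class (task 1).

    A real implementation would download and cache the benchmark files;
    here samples are generated in memory so the sketch needs no network.
    """

    def __init__(self, size: int, transform: Optional[Transform] = None):
        self.size = size
        self.transform = transform

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> Sample:
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        features = [float(idx), float(idx + 1)]  # stand-in for decoded pixels
        if self.transform is not None:
            features = self.transform(features)
        return features, idx % 2


def compose(*ops: Transform) -> Transform:
    """Chain preprocessing operations left to right (task 2)."""
    def apply(x: List[float]) -> List[float]:
        for op in ops:
            x = op(x)
        return x
    return apply


def scale(factor: float) -> Transform:
    """Multiply every feature by a constant."""
    return lambda x: [v * factor for v in x]


def shift(offset: float) -> Transform:
    """Add a constant to every feature."""
    return lambda x: [v + offset for v in x]


def parallel_batches(dataset, batch_size: int,
                     num_workers: int = 4) -> Iterator[List[Sample]]:
    """Yield batches in index order, loading each batch's samples
    concurrently with a thread pool (task 3)."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for start in range(0, len(dataset), batch_size):
            indices = range(start, min(start + batch_size, len(dataset)))
            # pool.map returns results in the order of its inputs.
            yield list(pool.map(dataset.__getitem__, indices))


if __name__ == "__main__":
    ds = ToyDataset(5, transform=compose(scale(2.0), shift(1.0)))
    for batch in parallel_batches(ds, batch_size=2, num_workers=2):
        print(batch)
```

Threads suit loading that is dominated by I/O (reading or downloading files). Preprocessing that is CPU-bound in pure Python would likely need worker processes instead, and this sketch does not prefetch the next batch while the current one is being consumed.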

nudles commented 4 years ago

Code from the existing data module may be reused: https://github.com/apache/singa/blob/master/python/singa/data.py

nudles commented 4 years ago

And https://github.com/apache/singa/blob/master/python/singa/image_tool.py