HDI-Project / ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
https://hdi-project.github.io/ATM/
MIT License
525 stars, 141 forks

Ensure datasets can be downloaded from S3 #137

Closed csala closed 5 years ago

csala commented 5 years ago

In the past, ATM was able to download datasets stored in S3, but at some point this functionality seems to have broken and it no longer works.

We should review it and make the changes needed to get it working again. We should also replace the boto dependency with boto3, which can use the AWS credentials stored in the user's home folder without the user having to specify them explicitly.
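As a rough illustration of the boto3 approach, here is a minimal sketch, not ATM's actual code. The function names `split_s3_url` and `download_from_s3` and the optional `client` parameter are hypothetical. When no credentials are passed, `boto3.client("s3")` falls back to its default credential lookup, which includes environment variables and `~/.aws/credentials`.

```python
import os
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError("not a valid s3:// URL: {!r}".format(url))
    return parsed.netloc, key


def download_from_s3(url, local_path, client=None):
    """Download the object behind an s3:// URL to local_path.

    `client` is a hypothetical hook for passing a preconfigured (or fake)
    S3 client; by default one is created with boto3's default credentials.
    """
    if client is None:
        import boto3  # third-party dependency, imported only when needed
        client = boto3.client("s3")  # no explicit keys: default lookup is used

    bucket, key = split_s3_url(url)
    # Make sure the target directory exists before writing the file.
    os.makedirs(os.path.dirname(os.path.abspath(local_path)), exist_ok=True)
    client.download_file(bucket, key, local_path)
    return local_path
```

Calling `download_from_s3("s3://my-bucket/data/iris.csv", "data/iris.csv")` would then work for any user whose AWS credentials are already configured; the bucket and key here are made-up examples.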

For both S3 and HTTP, we should download the data to a location relative to the current working directory instead of inside the ATM code folders.
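One possible way to do this for HTTP is sketched below, using only the standard library. The helper names `local_path_for` and `download_http` and the default `data` folder name are assumptions made for illustration, not part of ATM.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(url, data_dir="data"):
    """Return the target path for `url` inside `data_dir`.

    A relative `data_dir` is resolved against the current working
    directory, not against the location of the installed package.
    """
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError("cannot infer a file name from {!r}".format(url))
    return os.path.join(os.path.abspath(data_dir), name)


def download_http(url, data_dir="data"):
    """Download `url` into `data_dir` and return the local file path."""
    target = local_path_for(url, data_dir)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Stream the response to disk instead of loading it all into memory.
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Because the path is built with `os.path.abspath`, running ATM from `/home/user/project` would place files under `/home/user/project/data/`, wherever the ATM code itself is installed.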

Finally, we should reorganize the code so that it is easy to extend in the future if we want to add support for other data sources (FTP, SFTP, HFS...).
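One way such an extensible layout could look is a small registry that maps URL schemes to downloader functions, so that supporting a new source only means registering one more function. This is a sketch under assumptions: the names `register`, `download` and `_DOWNLOADERS` are hypothetical and do not describe ATM's real structure.

```python
from urllib.parse import urlparse

# Hypothetical registry: URL scheme -> function(url, data_dir) -> local path.
_DOWNLOADERS = {}


def register(*schemes):
    """Decorator that registers a downloader for one or more URL schemes."""
    def decorator(func):
        for scheme in schemes:
            _DOWNLOADERS[scheme] = func
        return func
    return decorator


def download(url, data_dir="data"):
    """Dispatch `url` to the downloader registered for its scheme."""
    scheme = urlparse(url).scheme
    if scheme not in _DOWNLOADERS:
        raise ValueError("unsupported data source: {!r}".format(url))
    return _DOWNLOADERS[scheme](url, data_dir)
```

A new source would then be added without touching the dispatch code, for example by decorating an FTP downloader with `@register("ftp")`.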