benchflow / data-transformers

Spark scripts utilised to transform data to the BenchFlow internal formats

Optimise pulling data from Minio #3

Closed Cerfoglg closed 8 years ago

Cerfoglg commented 8 years ago

Find a more efficient way to pull data from Minio. Data is currently pulled with Python's urllib library (https://docs.python.org/3/library/urllib.html), which downloads the entire CSV data file in a single request to its URL. This is not the most efficient way of doing it.

Alternatives to look into:

Update: we decided to use the Minio API; see https://github.com/benchflow/data-transformers/issues/56
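
For illustration, a minimal sketch of what pulling a CSV through a Minio client could look like, reading the object in chunks instead of downloading the whole file first. This is not the code adopted in the linked issue. It assumes the `minio` Python package, whose `get_object` call returns a response that can be streamed; the helper names (`iter_lines`, `iter_csv_rows`), the endpoint, the credentials, and the bucket and object names are all hypothetical.

```python
import codecs
import csv


def iter_lines(chunks, encoding="utf-8"):
    """Yield decoded text lines from an iterable of byte chunks."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last newline is complete; keep the rest buffered.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line + "\n"
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer


def iter_csv_rows(client, bucket, object_name, chunk_size=32 * 1024):
    """Yield CSV rows from an object, reading it chunk by chunk.

    `client` is assumed to behave like minio.Minio: get_object() returns a
    response exposing stream(), close() and release_conn().
    """
    response = client.get_object(bucket, object_name)
    try:
        yield from csv.reader(iter_lines(response.stream(chunk_size)))
    finally:
        # Always hand the connection back to the pool.
        response.close()
        response.release_conn()


# Hypothetical usage (placeholder endpoint, credentials and names):
#   from minio import Minio
#   client = Minio("localhost:9000", access_key="...", secret_key="...", secure=False)
#   for row in iter_csv_rows(client, "some-bucket", "some-data.csv"):
#       ...
```

Rows become available as soon as their bytes arrive, so memory use stays bounded by the chunk size rather than by the file size. Whether this is actually faster than the urllib approach would still need to be measured.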