The current dataset_download should be refactored because it does too many things at once. It should be split into two modules:
A module with a DatasetGenerator object whose main generate method runs the retrieval, formatting, post_process, pandas import (class method) and export to CSV. Right now too many arguments are passed between function calls, and the module manages too many things. We should store the data in the object, and its interface should expose simple methods*. It would also be good to create a subpackage for data in general, grouping the dataset generator, tracks config, dataset loader and pipeline work (segmentation, sampling). The shared state of the object should be chosen carefully: before making any change, discuss the solution here.
A script module at the root level of the hp_pred package, giving an easy interface to generate the data we use.
*: More refactoring could be done to separate roles, but it is not worthwhile since we do not intend to develop a framework for datasets.
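To start the discussion on the shared state, here is a minimal sketch of what the DatasetGenerator interface could look like. It is not the existing code. The attribute names (raw, dataset, output_path), the method names other than generate and post_process, and the placeholder bodies are all assumptions for illustration. The real retrieval and formatting logic from dataset_download would replace them.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


class DatasetGenerator:
    """Hypothetical sketch: the data lives on the object, not in call arguments."""

    def __init__(self, output_path: Path, raw: pd.DataFrame | None = None) -> None:
        self.output_path = output_path
        self.raw = raw  # filled by retrieve() or from_pandas()
        self.dataset: pd.DataFrame | None = None  # filled by format()

    @classmethod
    def from_pandas(cls, raw: pd.DataFrame, output_path: Path) -> DatasetGenerator:
        """Build a generator from data already in memory, skipping retrieval."""
        return cls(output_path=output_path, raw=raw)

    def retrieve(self) -> None:
        # Placeholder: the download logic of dataset_download would go here.
        if self.raw is None:
            raise NotImplementedError("retrieval is project specific")

    def format(self) -> None:
        # Placeholder formatting (assumption): lower-case the column names.
        self.dataset = self.raw.rename(columns=str.lower)

    def post_process(self) -> None:
        # Placeholder post-processing (assumption): drop incomplete rows.
        self.dataset = self.dataset.dropna().reset_index(drop=True)

    def export(self) -> None:
        # Write the final dataset to CSV without the index column.
        self.dataset.to_csv(self.output_path, index=False)

    def generate(self) -> Path:
        """Run every step in order; no data is passed between the steps."""
        self.retrieve()
        self.format()
        self.post_process()
        self.export()
        return self.output_path
```

With this shape, the script module at the root of hp_pred would only need to build the object and call generate(), for example DatasetGenerator.from_pandas(df, Path("dataset.csv")).generate(). Whether raw should stay on the object after formatting is one of the shared-state questions to settle here.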