Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0

Basic multiprocess in AutoML loop, Logger and many more quality of life changes #448

Closed vincent-antaki closed 3 years ago

vincent-antaki commented 3 years ago

Another PR with many quality-of-life improvements!

Multiprocess

This PR introduces a very simple multiprocess interface for the AutoML loop. The AutoML loop now takes a "multiprocess" boolean argument and an "n_processes" integer argument as input. If "multiprocess" is False, the AutoML loop runs in a single process. If it is True, a multiprocessing Pool is instantiated with "n_processes" processes, and each process is given trials to perform until all trials are complete.
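A minimal sketch of the single-process vs. pool dispatch described above, using Python's standard multiprocessing.Pool. The names run_trial and run_automl_loop and the toy objective are hypothetical stand-ins, not Neuraxle's actual AutoML API; only the multiprocess / n_processes switch mirrors this PR.

```python
from multiprocessing import Pool
from typing import Dict, List, Tuple


def run_trial(hyperparams: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Hypothetical trial: stands in for fitting a pipeline and scoring it (lower is better)."""
    learning_rate = hyperparams["learning_rate"]
    score = (learning_rate - 0.1) ** 2  # toy objective, minimal at 0.1
    return hyperparams, score


def run_automl_loop(
    trials: List[Dict[str, float]],
    multiprocess: bool = False,
    n_processes: int = 1,
) -> List[Tuple[Dict[str, float], float]]:
    """Run every trial, either sequentially or on a pool of worker processes."""
    if not multiprocess:
        # Single process: trials run one after the other.
        return [run_trial(trial) for trial in trials]
    if n_processes < 1:
        raise ValueError("n_processes must be >= 1")
    # Multiprocess: with chunksize=1, each free worker picks up one trial at a time
    # until all trials are complete.
    with Pool(processes=n_processes) as pool:
        return pool.map(run_trial, trials, chunksize=1)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes.
    trials = [{"learning_rate": lr} for lr in (0.01, 0.05, 0.1, 0.5)]
    results = run_automl_loop(trials, multiprocess=True, n_processes=2)
    best_hyperparams, best_score = min(results, key=lambda result: result[1])
    print(best_hyperparams, best_score)
```

run_trial is defined at module level so that it can be pickled and sent to the worker processes; a lambda or nested function would fail there.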

This doesn't fulfill #297, because trial splits aren't parallelized. However, I think it would be a simple modification to extend this behaviour down to the trial split level.

Logger

Instead of prints, we now use Python's logging library! In an AutoML loop, each trial gets its own file-based logger, and the log files are saved in the HyperparametersRepository cache folder. This closes issue #391. (It also closes #255, closes #257, closes #283 and closes #298, which are basically the same issue.)
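A sketch of the per-trial, file-based logging idea with the standard logging module. The get_trial_logger helper, the logger names and the trial_<n>.log file layout are assumptions for illustration; the PR does not say how Neuraxle names its loggers or log files.

```python
import logging
import os


def get_trial_logger(cache_folder: str, trial_number: int) -> logging.Logger:
    """Return a logger that writes only to this trial's own log file (hypothetical helper)."""
    os.makedirs(cache_folder, exist_ok=True)
    logger = logging.getLogger(f"automl.trial_{trial_number}")
    logger.setLevel(logging.DEBUG)
    # Don't forward records to the root logger: each trial logs to its own file only.
    logger.propagate = False
    # Avoid stacking duplicate handlers when the same trial logger is requested twice.
    # Note: loggers are keyed by trial number only, so a later call with a different
    # cache_folder keeps the first file handler.
    if not logger.handlers:
        log_path = os.path.join(cache_folder, f"trial_{trial_number}.log")
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```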

Continue-break-statement-for-each-data-input

I've retrieved Alex's last commit from his Continue-break-statement-for-each-data-input branch (PR #319) and applied the review comments. I also introduced an ExecuteIf class, which makes it easy to add conditional (if) logic to our pipeline design. These additions partially fulfill issue #443 by adding BreakIf, ContinueIf and ExecuteIf.
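To illustrate what ExecuteIf, ContinueIf and BreakIf mean inside a for-each loop over data inputs, here is a framework-free sketch. The Step base class, the exception-based signalling and the for_each helper are all hypothetical; Neuraxle's real classes operate on its own step and DataContainer abstractions and may be implemented quite differently.

```python
from typing import Any, Callable, List


class Step:
    """Minimal stand-in for a pipeline step (hypothetical, not Neuraxle's BaseStep)."""

    def transform(self, data_input: Any) -> Any:
        raise NotImplementedError


class AddOne(Step):
    def transform(self, data_input: int) -> int:
        return data_input + 1


class ExecuteIf(Step):
    """Run the wrapped step only if the condition holds; otherwise pass the input through."""

    def __init__(self, condition: Callable[[Any], bool], wrapped: Step):
        self.condition = condition
        self.wrapped = wrapped

    def transform(self, data_input: Any) -> Any:
        if self.condition(data_input):
            return self.wrapped.transform(data_input)
        return data_input


class _ContinueSignal(Exception):
    """Raised to skip the remaining steps for the current data input."""


class _BreakSignal(Exception):
    """Raised to stop iterating over the remaining data inputs."""


class ContinueIf(Step):
    def __init__(self, condition: Callable[[Any], bool]):
        self.condition = condition

    def transform(self, data_input: Any) -> Any:
        if self.condition(data_input):
            raise _ContinueSignal()
        return data_input


class BreakIf(Step):
    def __init__(self, condition: Callable[[Any], bool]):
        self.condition = condition

    def transform(self, data_input: Any) -> Any:
        if self.condition(data_input):
            raise _BreakSignal()
        return data_input


def for_each(steps: List[Step], data_inputs: List[Any]) -> List[Any]:
    """Apply the steps to every data input, honouring ContinueIf/BreakIf signals."""
    outputs = []
    for data_input in data_inputs:
        try:
            for step in steps:
                data_input = step.transform(data_input)
        except _ContinueSignal:
            continue  # drop this input, move on to the next one
        except _BreakSignal:
            break  # stop processing all remaining inputs
        outputs.append(data_input)
    return outputs


steps = [
    ContinueIf(lambda x: x < 0),                # skip negative inputs
    BreakIf(lambda x: x > 100),                 # stop at the first input above 100
    ExecuteIf(lambda x: x % 2 == 0, AddOne()),  # add one to even inputs only
]
print(for_each(steps, [2, -1, 3, 4, 200, 6]))  # [3, 3, 5]
```

Using exceptions as control-flow signals keeps each step unaware of the loop around it; a real implementation might prefer explicit return flags instead.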

Introduction of a DeprecatedMetaClass

Created a utils.py file and added a DeprecatedMetaClass, which lets us rename classes without breaking backward compatibility (shamelessly taken from Stack Overflow).

Renamed ForEachDataInputs and NumpyConcatenateOnCustomAxis to ForEach and NumpyConcatenateOnAxis, respectively. This fixes #284.

N.B. This feature has been rolled back because it caused too many problems.
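Although the rename itself was rolled back, the metaclass-based alias technique can be sketched as follows. DeprecatedMeta and the _new_class attribute are hypothetical names, and this is a simplified variant of the common pattern, not the code from this PR: it forwards construction, isinstance and issubclass checks to the new class, but does not handle subclassing the deprecated name.

```python
import warnings


class DeprecatedMeta(type):
    """Metaclass for a deprecated alias that forwards to the renamed class.

    Hypothetical sketch: the alias class must define `_new_class`.
    """

    def __call__(cls, *args, **kwargs):
        # Instantiating the old name warns, then builds an instance of the new class.
        warnings.warn(
            f"{cls.__name__} is deprecated; use {cls._new_class.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls._new_class(*args, **kwargs)

    def __instancecheck__(cls, instance):
        # isinstance(obj, OldName) stays True for instances of the new class.
        return isinstance(instance, cls._new_class)

    def __subclasscheck__(cls, subclass):
        return issubclass(subclass, cls._new_class)


class NumpyConcatenateOnAxis:
    def __init__(self, axis: int):
        self.axis = axis


class NumpyConcatenateOnCustomAxis(metaclass=DeprecatedMeta):
    """Old name kept only for backward compatibility."""
    _new_class = NumpyConcatenateOnAxis
```

Old code calling NumpyConcatenateOnCustomAxis(axis=1) keeps working but emits a DeprecationWarning and receives a NumpyConcatenateOnAxis instance.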

ZipMinibatchJoiner and ZipFeatures

Introduced two classes that use the ZipDataContainer data structure to join DataContainer instances at different points in the pipeline. ZipMinibatchJoiner, as its name suggests, zips together the n-th elements of all the returned minibatch DataContainers. ZipFeatures is a step that expects its input to be a DataContainer containing a list of DataContainers, with no expected outputs. ZipFeatures is designed to be used as the joiner for a FeatureUnion instance where concatenation is not possible or desirable (but it can also be used as a regular step).
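Both classes come down to zipping aligned sequences. The sketch below uses plain Python lists as stand-ins for DataContainer and a hypothetical zip_containers helper; it is not Neuraxle's ZipDataContainer API. It also assumes that mismatched lengths are an error, which is an assumption about how such cases should be handled.

```python
from typing import Any, List, Sequence, Tuple


def zip_containers(containers: List[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Pair up the n-th element of every container (hypothetical helper)."""
    lengths = {len(container) for container in containers}
    if len(lengths) > 1:
        # Assumption: refuse to silently truncate like the built-in zip() would.
        raise ValueError(f"containers have different lengths: {sorted(lengths)}")
    return list(zip(*containers))


# ZipMinibatchJoiner-like use: the n-th elements of every minibatch end up together.
minibatches = [[1, 2, 3], [4, 5, 6]]
print(zip_containers(minibatches))  # [(1, 4), (2, 5), (3, 6)]

# ZipFeatures-like use: join the outputs of parallel FeatureUnion-style branches.
labels = ["cat", "dog"]
embeddings = [[0.1, 0.2], [0.3]]  # ragged, so it cannot be concatenated into one array
print(zip_containers([labels, embeddings]))  # [('cat', [0.1, 0.2]), ('dog', [0.3])]
```

The second call shows why zipping helps as a FeatureUnion joiner: outputs of different types or ragged shapes cannot be concatenated, but they can still be paired element-wise.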

Other changes

guillaume-chevalier commented 3 years ago

@vincent-antaki Thanks, I will review this when the other branch is merged. To speed up the process, you may open a PR in your own fork from this branch to your master and then DM me to review it.

vincent-antaki commented 3 years ago

@guillaume-chevalier This is ready to merge. I've applied and/or answered all your comments.
