The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
What it is
Fixes many issues by refactoring the AutoML modules and the Hyperparam Repos, with some changes to base.py and to other files that use base.py. This PR also removes the ID re-hashing mechanism, summary IDs, and checkpointers (except value checkpointers), among other things.
How it works
AutoML contains a ControllerLoop, which calls the Trainer with the splits. The repos are changed so that the AutoML loop is more abstract and can consider the different runs stored in the repo.
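The control flow described above can be sketched in plain Python as follows. This is a minimal, framework-free sketch under stated assumptions: only the names `AutoML`, `ControllerLoop` and `Trainer` come from this description; the `Run` and `InMemoryRepo` classes, the method names (`fit`, `run`, `train`, `add_run`, `best_run`) and the `(train, validation)` split format are hypothetical and are not Neuraxle's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical split format: (train_data, validation_data).
Split = Tuple[Sequence[float], Sequence[float]]
HyperParams = Dict[str, float]


@dataclass
class Run:
    """Hypothetical record of one trial: a hyperparameter set and its score on each split."""
    hyperparams: HyperParams
    scores: List[float]

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores)


@dataclass
class InMemoryRepo:
    """Hypothetical repo that keeps every run so the loop can consult past runs."""
    runs: List[Run] = field(default_factory=list)

    def add_run(self, run: Run) -> None:
        self.runs.append(run)

    def best_run(self) -> Run:
        # Lower is better in this sketch (e.g. a validation loss).
        return min(self.runs, key=lambda run: run.mean_score)


class Trainer:
    """Hypothetical trainer: scores one hyperparameter set on every split."""

    def __init__(self, score_fn: Callable[[HyperParams, Split], float]):
        self.score_fn = score_fn

    def train(self, hyperparams: HyperParams, splits: List[Split]) -> List[float]:
        return [self.score_fn(hyperparams, split) for split in splits]


class ControllerLoop:
    """Hypothetical loop: picks hyperparameters, then calls the Trainer with the splits."""

    def __init__(self, trainer: Trainer, n_trials: int,
                 sample_fn: Callable[[List[Run]], HyperParams]):
        self.trainer = trainer
        self.n_trials = n_trials
        self.sample_fn = sample_fn

    def run(self, splits: List[Split], repo: InMemoryRepo) -> None:
        for _ in range(self.n_trials):
            # The sampler receives the runs already stored in the repo.
            hyperparams = self.sample_fn(repo.runs)
            scores = self.trainer.train(hyperparams, splits)
            repo.add_run(Run(hyperparams, scores))


class AutoML:
    """Hypothetical top-level object that owns the ControllerLoop and the repo."""

    def __init__(self, loop: ControllerLoop, repo: InMemoryRepo):
        self.loop = loop
        self.repo = repo

    def fit(self, splits: List[Split]) -> Run:
        self.loop.run(splits, self.repo)
        return self.repo.best_run()


# Toy usage: try three learning rates in order; the score is the squared distance to 0.1.
candidates = [0.5, 0.1, 0.01]
automl = AutoML(
    loop=ControllerLoop(
        trainer=Trainer(lambda hp, split: (hp["lr"] - 0.1) ** 2),
        n_trials=len(candidates),
        sample_fn=lambda past_runs: {"lr": candidates[len(past_runs)]},
    ),
    repo=InMemoryRepo(),
)
splits = [([1.0, 2.0], [3.0]), ([2.0, 3.0], [1.0])]
best = automl.fit(splits)
print(best.hyperparams)
```

Passing the repo's runs to the sampler is one way to let the loop "consider different runs from the repo"; a real hyperparameter optimizer would use those past scores to choose the next trial instead of reading from a fixed list.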
Checklist before merging PR.
Things to check at each Pull Request (PR):
[x] Your local Git username is set to your GitHub username, and your local Git email is set to your GitHub email. This is important to avoid breaking the cla-bot and for your contributions to be linked to your profile. More info: https://github.com/settings/emails
[x] Arguments' dimensions and types are specified for new steps (important), with examples in docstrings when needed.
[x] Class names and argument / API variables are very clear: there is no possible ambiguity. They also respect the existing code style (avoid duplicating words for the same concept) and are intuitive.
[x] Classes are documented: their behavior is explained beyond just the title of the class. You may even use the description written in your pull request above to fill some docstrings accurately.
[x] If a NumPy array is used, remember that these arrays are a special type that must be documented accordingly, and that NumPy arrays should not be overused, because Neuraxle is not limited to transforming NumPy arrays. To this effect, NumPy steps should be located in the existing NumPy Python files as much as possible rather than scattered across the codebase. The same applies to Pandas DataFrames.
[x] Unit-test code coverage is above 90% for the added code.
[x] The above natural-language description of the pull request was used to document the new code in its docstrings, so that the documentation is complete, with examples.
[x] Respect the Unit Testing status check
[x] Respect the Codacy status check
[x] Respect the cla-bot status check (unless the cla-bot is truly broken - please try to debug it first)
[x] Code files that were edited were reformatted automatically using PyCharm's Ctrl+Alt+L shortcut. You may have reorganized imports as well.
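To illustrate the docstring items above (argument types, data dimensions, and an example), here is a hypothetical step. It is a plain Python class written for this illustration, not a subclass of Neuraxle's `BaseStep`, so the class name and its `transform` signature are assumptions; it uses plain lists rather than NumPy arrays, in line with the note that steps should not assume NumPy input.

```python
from typing import List


class MultiplyByScalar:
    """
    Hypothetical step that multiplies every value of every sample by a scalar.

    :param factor: scalar multiplier (float).

    Data dimensions:
        data_inputs: list of samples, each a list of floats, shape [batch_size, n_features].
        returns: same shape as data_inputs, [batch_size, n_features].

    Example:
        >>> MultiplyByScalar(factor=2.0).transform([[1.0, 2.0], [3.0, 4.0]])
        [[2.0, 4.0], [6.0, 8.0]]
    """

    def __init__(self, factor: float = 1.0):
        self.factor = factor

    def transform(self, data_inputs: List[List[float]]) -> List[List[float]]:
        # Multiply each value of each sample; the output keeps the input's shape.
        return [[value * self.factor for value in sample] for sample in data_inputs]


if __name__ == "__main__":
    # Run the docstring example as a test.
    import doctest
    doctest.testmod()
```

Writing the example as a doctest means it can be executed by the test runner, so the documented shapes stay in sync with the code.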
Things to check each time you contribute:
[x] Your local Git username is set to your GitHub username, and your local Git email is set to your GitHub email. This is important to avoid breaking the cla-bot and for your contributions to be linked to your profile. If even one commit is not made with the correct credentials, the cla-bot will fail until you re-commit it.
[x] Use the PyCharm IDE with PyTest to test your code. Reformatting your code at every file save with PyCharm's Ctrl+Alt+L shortcut is a good idea. You may also reorganize imports automatically, as long as your project root is configured correctly. Run the tests to check that everything works, and always ensure that all tests pass before opening a pull request.
[x] We recommend letting PyCharm manage the virtual environment by creating a new one just for this project, and using PyTest as a test runner in PyCharm. This is not required, but should help in getting you started.
[x] Please make your pull request(s) editable so that we can, for example, add you to the list of contributors if you did not add the entry yourself.
[x] To contribute, first fork the project, then do your changes, and then open a pull request in the main repository.
[x] Sign the Contributor License Agreement (CLA) to allow Neuraxio to use and publish your contributions under the Apache 2.0 license, in order for everyone to be able to use your open-source contributions. Follow the instructions of the cla-bot upon opening the pull request.