MI-Prometheus (Machine Intelligence - Prometheus) is an open-source framework that aims to accelerate Machine Learning research by fostering the rapid development of diverse neural network-based models and facilitating their comparison. At its core, MI-Prometheus relies on PyTorch and extensively uses its mechanisms for distributing computations across CPUs/GPUs.
In MI-Prometheus, the training & testing mechanisms are no longer pinned to a specific model or problem, and built-in mechanisms for configuration management & statistics collection make it easy to run experiments combining different models with different problems.
A project of the Machine Intelligence team, IBM Research, Almaden.
PyTorch is the main library used by MI-Prometheus for tensor computations. Please refer to the official PyTorch installation guide to install it. We currently do not officially support PyTorch >= v0.4.1 (especially the v1.0 preview), but intend to in the near future.
The recommended install procedure below assumes the creation of a new Anaconda environment.
Install PyTorch 0.4.0.
With CUDA support:
conda install pytorch=0.4.0 cuda90 -c pytorch # For CUDA 9
Or CPU only:
conda install pytorch-cpu=0.4.0 -c pytorch
conda install pyyaml
python setup.py install
Or, if you intend to develop MI-Prometheus itself, please call the following command instead:
python setup.py develop
This will enable you to change the code of the existing problems/models/workers and run them by calling the mip-* commands. More on that subject can be found in the setuptools documentation.
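To quickly verify which PyTorch build is visible to Python (and whether CUDA is usable), you can run for instance:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
which should print 0.4.0 followed by True or False depending on GPU availability.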
We mainly develop on Ubuntu 16.04, but MI-Prometheus should work on macOS (10.14) as well.
We will upload MI-Prometheus to PyPI in the near future.
The dependencies of MI-Prometheus include PyTorch and PyYAML, installed above.
The workers are the main way you will use MI-Prometheus. They are parameterizable, OOP-designed scripts that execute a specific task related to the supervised training or testing of a Model on a Problem, following a Configuration.
foo@bar:~$ mip-offline-trainer --h
usage: mip-offline-trainer [-h] [--config CONFIG] [--model MODEL] [--gpu]
[--outdir OUTDIR] [--savetag SAVETAG]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree]
[--tensorboard {0,1,2}] [--visualize {-1,0,1,2,3}]
optional arguments:
-h, --help show this help message and exit
--config CONFIG Name of the configuration file(s) to be loaded. If specifying more than one file, they must be separated with a comma ",".
--model MODEL Path to the file containing the saved parameters of the model to load (model checkpoint; should end with a .pt extension).
--gpu The current worker will move the computations to GPU devices, if available in the system. (Default: False)
--outdir OUTDIR Path to the output directory where the experiment(s) folders will be stored. (DEFAULT: ./experiments)
--savetag SAVETAG Tag for the save directory
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation just after loading the settings, before starting training (Default: False)
--tensorboard {0,1,2}
If present, enable logging to TensorBoard. Available log levels:
0: Log the collected statistics.
1: Add the histograms of the model's biases & weights (Warning: Slow).
2: Add the histograms of the model's biases & weights gradients (Warning: Even slower).
--visualize {-1,0,1,2,3}
Activate dynamic visualization (Warning: will require user interaction):
-1: disabled (DEFAULT)
0: Only during training episodes.
1: During both training and validation episodes.
2: Only during validation episodes.
3: Only during the last validation, after the training is completed.
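As a minimal example, assuming a YAML configuration file named my_experiment.yaml exists in the current directory (the file name is only a placeholder), a training run could be launched with:
foo@bar:~$ mip-offline-trainer --config my_experiment.yaml --gpu --outdir ./experiments --tensorboard 0
This would use the GPU if one is available, store the experiment folder under ./experiments, and log the collected statistics to TensorBoard.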
foo@bar:~$ mip-online-trainer --h
usage: mip-online-trainer [-h] [--config CONFIG] [--model MODEL] [--gpu]
[--outdir OUTDIR] [--savetag SAVETAG]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree]
[--tensorboard {0,1,2}] [--visualize {-1,0,1,2,3}]
optional arguments:
-h, --help show this help message and exit
--config CONFIG Name of the configuration file(s) to be loaded. If specifying more than one file, they must be separated with a comma ",".
--model MODEL Path to the file containing the saved parameters of the model to load (model checkpoint; should end with a .pt extension).
--gpu The current worker will move the computations to GPU devices, if available in the system. (Default: False)
--outdir OUTDIR Path to the output directory where the experiment(s) folders will be stored. (DEFAULT: ./experiments)
--savetag SAVETAG Tag for the save directory
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation just after loading the settings, before starting training (Default: False)
--tensorboard {0,1,2}
If present, enable logging to TensorBoard. Available log levels:
0: Log the collected statistics.
1: Add the histograms of the model's biases & weights (Warning: Slow).
2: Add the histograms of the model's biases & weights gradients (Warning: Even slower).
--visualize {-1,0,1,2,3}
Activate dynamic visualization (Warning: will require user interaction):
-1: disabled (DEFAULT)
0: Only during training episodes.
1: During both training and validation episodes.
2: Only during validation episodes.
3: Only during the last validation, after the training is completed.
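The online trainer accepts the same arguments. For instance, several configuration files can be loaded together by passing them as a comma-separated list (both file names below are placeholders):
foo@bar:~$ mip-online-trainer --config base.yaml,overrides.yaml --savetag run1 --agree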
foo@bar:~$ mip-tester --h
usage: mip-tester [-h] [--config CONFIG] [--model MODEL] [--gpu]
[--outdir OUTDIR] [--savetag SAVETAG]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree] [--visualize]
optional arguments:
-h, --help show this help message and exit
--config CONFIG Name of the configuration file(s) to be loaded. If specifying more than one file, they must be separated with a comma ",".
--model MODEL Path to the file containing the saved parameters of the model to load (model checkpoint; should end with a .pt extension).
--gpu The current worker will move the computations to GPU devices, if available in the system. (Default: False)
--outdir OUTDIR Path to the output directory where the experiment(s) folders will be stored. (DEFAULT: ./experiments)
--savetag SAVETAG Tag for the save directory
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation just after loading the settings, before starting the test (Default: False)
--visualize Activate dynamic visualization
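For example, to test a previously trained model, point the tester at its saved checkpoint (both paths below are placeholders; the checkpoint is the .pt file produced by a trainer):
foo@bar:~$ mip-tester --config my_experiment.yaml --model ./experiments/my_experiment/model_best.pt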
Grid Workers manage several experiments ("grids") by reusing the workers, such as the OfflineTrainer & Tester. There are 3 types of Grid Workers:
the Grid Trainers, with one version for CPUs (GridTrainerCPU) and one for GPUs (CUDA) (GridTrainerGPU),
the Grid Testers, likewise available as GridTesterCPU & GridTesterGPU,
mip-grid-analyzer, which summarizes the results of several trainings & tests into one csv file.
foo@bar:~$ mip-grid-trainer-cpu --h
usage: mip-grid-trainer-cpu [-h] [--outdir OUTDIR] [--savetag SAVETAG]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree] [--config CONFIG]
[--online_trainer] [--tensorboard {0,1,2}]
optional arguments:
-h, --help show this help message and exit
--outdir OUTDIR Path to the global output directory where the experiments folders will be / are stored. Affects all grid experiments. (DEFAULT: ./experiments)
--savetag SAVETAG Additional tag for the global output directory.
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level for the experiments. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard for the experiments. Does not affect the grid worker. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation before starting the grid experiment. (Default: False)
--config CONFIG Name of the configuration file(s) to be loaded. If specifying more than one file, they must be separated with a comma ",".
--online_trainer Select the OnLineTrainer instead of the default OffLineTrainer.
--tensorboard {0,1,2}
If present, enable logging to TensorBoard. Available log levels:
0: Log the collected statistics.
1: Add the histograms of the model's biases & weights (Warning: Slow).
2: Add the histograms of the model's biases & weights gradients (Warning: Even slower).
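For example, a grid of trainings described in a (placeholder) grid configuration file could be launched with the OnLineTrainer as the underlying worker:
foo@bar:~$ mip-grid-trainer-cpu --config grid_experiment.yaml --online_trainer --agree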
foo@bar:~$ mip-grid-tester-cpu --h
usage: mip-grid-tester-cpu [-h] [--outdir OUTDIR] [--savetag SAVETAG]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree] [--n NUM_TESTS]
optional arguments:
-h, --help show this help message and exit
--outdir OUTDIR Path to the global output directory where the experiments folders will be / are stored. Affects all grid experiments. (DEFAULT: ./experiments)
--savetag SAVETAG Additional tag for the global output directory.
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level for the experiments. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard for the experiments. Does not affect the grid worker. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation before starting the grid experiment. (Default: False)
--n NUM_TESTS Number of test experiments to run for each model.
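For example, to run 5 test experiments for each trained model found under the default output directory:
foo@bar:~$ mip-grid-tester-cpu --outdir ./experiments --n 5 --agree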
foo@bar:~$ mip-grid-analyzer --h
usage: mip-grid-analyzer [-h] [--expdir EXPDIR]
[--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[--li LOGGING_INTERVAL] [--agree]
optional arguments:
-h, --help show this help message and exit
--expdir EXPDIR Path to the directory where the experiments folders will be / are stored. Affects all grid experiments. (DEFAULT: ./experiments)
--ll {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Log level for the experiments. (Default: INFO)
--li LOGGING_INTERVAL
Statistics logging interval. Will impact logging to the logger and exporting to TensorBoard for the experiments. Does not affect the grid worker itself. Writing to the csv file is not impacted (interval of 1). (Default: 100, i.e. logs every 100 episodes).
--agree Request user confirmation before starting the grid experiment. (Default: False)
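Once the grid trainings & tests have finished, their results can be summarized into a single csv file, e.g.:
foo@bar:~$ mip-grid-analyzer --expdir ./experiments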
NOTES:
Documentation is created using Sphinx, and is available on readthedocs.io.
You are encouraged to contribute! Please use the issues if you would like to request a new feature or a fix, so that we can discuss it first.