[Feature] Adding setup.py

vmarois commented 6 years ago

In this PR, I'm adding the setup.py script for the installation of mip. This should lower the entry bar for mi-prometheus ( python setup.py install) and let external users easily use it. It should also allow to upload the framework to PyPI (by creating a wheel with python setup.py sdist bdist_wheel - though I haven't tested it).

The main points are:

Moved workers/, problems/, models/, utils/ into a directory miprometheus/. This is a major change which impacts the structure of mip as a Python package. I did this as the use of the setup.py authorizes the user to import the packages & modules of Mi-Prometheus. They are now all under miprometheus/, which enables the user to simply import miprometheus as mip to have access to all classes in his workspace. This is a screenshot of what the user would be able to do:

import_examples

It is working well, and for now, all models / problems are import-able directly from miprometheus.models / miprometheus.problems respectively. This is due to the fact that all classes (and then modules) are imported from the innermost __init__ to the outermost one. This may change in the future as it appears that having too many statements in the __init__s is not best practice: See here:

A commonly seen issue is to add too much code to init.py files. When the project complexity grows, there may be sub-packages and sub-sub-packages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all init.py files met while traversing the tree.

So I would support emptying the outermost __init__s, as this could avoid import issues, and force the import statements to reflect more closely the project structure. On a similar topic, I would also support simplifying the problems directory (especially seq_to_seq) as it is too nested for me. My argument is that the setup.py exposes miprometheus as a regular API to the user (like most packages out there). While I agree that this is not the main use-case of miprometheus (which is through the scripts, see point below), we still should maintain it as it may be how a user would test integrating a new problem / model in miprometheus (with first the use of a decorator @register and then cloning the repo to fully integrate it in the source code). For the others entities in miprometheus, it is not as critical as the directories aren't as nested. An import like from miprometheus.models.mac import MACNetwork is sufficiently short & clear to me.

The script also specifies the entry points of the workers so that the user can now use them from the command line in a simple way:

mip-offlinetrainer

This is an example with mip-offlinetrainer --h and similarly, you'll have mip-gridtester-cpu etc. :warning: As of now, there is an error related to the paths of the default_config files (they are treated as absolute paths), which should be easily fixable. Also, the setup.py has the option to install datafiles:

package_data={ # 'mip.config': ['default_config.yaml'] } So perhaps that gives the option to list here some configs.yaml files that will be copied over with the data and thus accessible from everywhere. I'll look into that later on.

I have tested the python setup.py install command using a Docker image of Ubuntu 16.04. One main thing is that we should not use pytorch from pip, as it causes errors that shouldn't appear (I have seen this on others python projects as well). So installing pytorch from miniconda and then using the setup.py to install the rest of the dependencies works and does not cause issues. I have tested a run of MAES on SerialRecall & a run of SimpleConvNet on MNIST, which both converged and were tested with mip-tester.

I have encountered some (easily fixable) errors, related to matplotlib, and the backend it uses (related to PyQt5). The unique class which causes this is TimePlot (and I strongly advocate that it gets refactored to address this + becomes more efficient).

I have also reduced the number of dependencies of mip. Mainly I removed progressbar2 & pandas + the call to opening logger_config.yaml in the base workers classes (the content of this file is now directly a dict variable contained in the classes that use it, Worker & GridWorker). The minimal dependencies needed to use mip are:

'pytorch=0.4.0', # 0.4.1 not fully tested 'torchvision<0.2.0', # ran in some issues with 0.2.1 'torchtext', 'tensorboardX', 'matplotlib', 'numpy', 'PyYAML', 'tqdm', 'nltk', 'h5py', 'six', 'pyqt5==5.10.1' # ran into issues with higher versions

I recommend that you look at the setup.py to see which information are located here.

Feedback welcome!

Note:

I have only tested the script with Python3.6 .
I have included the readthedocs.yml file needed to build the doc, but as of now, readthedocs.io is still affected by the GitHub outage. So waiting for it to be back to normal to test the doc build.

vmarois commented 6 years ago

Closes #5

tkornuta-ibm commented 6 years ago

Thanks for that great pull request.

Generally, I think that: a) let's stay with the current content of the outermost init file for now, b) creating an issue regarding the absolute paths of configuration files.

IBM / mi-prometheus

[Feature] Adding setup.py #15