IBM / mi-prometheus

Enabling reproducible Machine Learning research
http://mi-prometheus.rtfd.io/
Apache License 2.0
42 stars 18 forks source link

[Feature] Adding setup.py #15

Closed vmarois closed 6 years ago

vmarois commented 6 years ago

In this PR, I'm adding the setup.py script for the installation of mip. This should lower the entry bar for mi-prometheus ( python setup.py install) and let external users easily use it. It should also allow to upload the framework to PyPI (by creating a wheel with python setup.py sdist bdist_wheel - though I haven't tested it).

The main points are:

import_examples

It is working well, and for now, all models / problems are import-able directly from miprometheus.models / miprometheus.problems respectively. This is due to the fact that all classes (and then modules) are imported from the innermost __init__ to the outermost one. This may change in the future as it appears that having too many statements in the __init__s is not best practice: See here:

A commonly seen issue is to add too much code to init.py files. When the project complexity grows, there may be sub-packages and sub-sub-packages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all init.py files met while traversing the tree.

So I would support emptying the outermost __init__s, as this could avoid import issues, and force the import statements to reflect more closely the project structure. On a similar topic, I would also support simplifying the problems directory (especially seq_to_seq) as it is too nested for me. My argument is that the setup.py exposes miprometheus as a regular API to the user (like most packages out there). While I agree that this is not the main use-case of miprometheus (which is through the scripts, see point below), we still should maintain it as it may be how a user would test integrating a new problem / model in miprometheus (with first the use of a decorator @register and then cloning the repo to fully integrate it in the source code). For the others entities in miprometheus, it is not as critical as the directories aren't as nested. An import like from miprometheus.models.mac import MACNetwork is sufficiently short & clear to me.

mip-offlinetrainer

This is an example with mip-offlinetrainer --h and similarly, you'll have mip-gridtester-cpu etc. :warning: As of now, there is an error related to the paths of the default_config files (they are treated as absolute paths), which should be easily fixable. Also, the setup.py has the option to install datafiles:

package_data={ # 'mip.config': ['default_config.yaml'] } So perhaps that gives the option to list here some configs.yaml files that will be copied over with the data and thus accessible from everywhere. I'll look into that later on.

I have encountered some (easily fixable) errors, related to matplotlib, and the backend it uses (related to PyQt5). The unique class which causes this is TimePlot (and I strongly advocate that it gets refactored to address this + becomes more efficient).

'pytorch=0.4.0', # 0.4.1 not fully tested 'torchvision<0.2.0', # ran in some issues with 0.2.1 'torchtext', 'tensorboardX', 'matplotlib', 'numpy', 'PyYAML', 'tqdm', 'nltk', 'h5py', 'six', 'pyqt5==5.10.1' # ran into issues with higher versions

I recommend that you look at the setup.py to see which information are located here.

Feedback welcome!

Note:

vmarois commented 6 years ago

Closes #5

tkornuta-ibm commented 6 years ago

Thanks for that great pull request.

Generally, I think that: a) let's stay with the current content of the outermost init file for now, b) creating an issue regarding the absolute paths of configuration files.