inakleinbottle commented 3 years ago

This package certainly has potential, but there are quite a few issues that need to be addressed before I think it is ready for publication in JOSS. Starting from the top.

General

Substantial scholarly effort: I have no problem with the contribution you have made to the field and the functionality of the package. However, I feel you need to stress the differences between your package and the functionality available from Scikit-Learn in the paper. Indeed, in the paper you state "The prediction process for regression is not available in scikit-learn and, thus, will be provided by gmr." To what regression are you referring? The GaussianMixture class has a predict method which does, by my reckoning, perform some kind of prediction. Crucially, this "prediction" is different from the "prediction" provided by the GMM class in gmr. In the paper, you should elaborate on the difference between regression processes used here and clearly state what is in your package but not in Scikit-Learn or other packages.

Functionality

Installation. My first attempt to install the package failed because the dependencies are incorrectly set setup.py. Dependencies that are required for the installation and running of your package should be listed under the install_requires keyword argument of setup rather than requires. I think you need 2 dependencies for this package: numpy and scipy. You might also have a dependency on matplotlib, since a number of functions require matplotlib. Scikit-learn is also required for some examples.
Functionality. The package appears to perform as claimed, though I have yet to test an example of my construction. I will make a follow-up post on this issue if I cannot get this to work. The code generally looks to be well written and you have plenty of examples to illustrate the functionality. However, one run of your test suite failed and I'm not sure why. Reinstalling gmr with the local package seemed to fix the issue (pip install -e .). You might want to look into this. Test output below. The second run passed without issue.

Documentation

Statement of need. Your statement of need indicates the problems that your package is designed to solve, but could be a bit more clear about who the audience is. This need not be much, just a sentence to give some suggestions of what kind of people might use this software.
Installation instructions. The dependencies are not listed correctly in setup.py, see above.
Example usage. You give lots of examples, but they all seem a bit manufactured. There is a version of the Iris classification problem from Scikit-Learn, which is a great example, but is not explained and it doesn't appear to do a particularly good job at modelling the data.
Functionality documentation. You need to generate API documentation for your package. All of the functions have documentation strings in the code, but users probably don't want to have to read through the code to find the functions they need. You should compile some more extensive documentation that covers the functions available to the end user on each class, and illustrate their use.
Community guidelines. You need to add some guidance for people who want to contribute, report issues, or get support on your package. This can be as simple as pointing them to the relevant features of Github (issues, pull-requests, etc.) Dan Bader has a nice article on what kind of things to write in a readme: [https://dbader.org/blog/write-a-great-readme-for-your-github-project]()

Software Paper

Summary is fine, but you misspelled Gaussian.
Statement of need. See above.
State of the field. As I mentioned above, I think you need to be a little more clear on how this differs from the features available in Scikit-Learn.

Test data

I'm not sure how, but my first run of the test failed. Reinstalling using pip install -e . seemed to fix this. It might be an error from my virtual environment, but you might want to look into this.

============================= test session starts ==============================
platform linux -- Python 3.8.7+, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/sam/tmp/gmr
collected 49 items                                                             

gmr/tests/test_gmm.py ............................FFF                    [ 63%]
gmr/tests/test_mvn.py ..................                                 [100%]

=================================== FAILURES ===================================
________________________ test_extract_mvn_negative_idx _________________________

    def test_extract_mvn_negative_idx():
        gmm = GMM(n_components=2, priors=0.5 * np.ones(2), means=np.zeros((2, 2)),
                  covariances=[np.eye(2)] * 2)
>       assert_raises(ValueError, gmm.extract_mvn, -1)
E       AttributeError: 'GMM' object has no attribute 'extract_mvn'

gmr/tests/test_gmm.py:427: AttributeError
________________________ test_extract_mvn_idx_too_high _________________________

    def test_extract_mvn_idx_too_high():
        gmm = GMM(n_components=2, priors=0.5 * np.ones(2), means=np.zeros((2, 2)),
                  covariances=[np.eye(2)] * 2)
>       assert_raises(ValueError, gmm.extract_mvn, 2)
E       AttributeError: 'GMM' object has no attribute 'extract_mvn'

gmr/tests/test_gmm.py:433: AttributeError
______________________________ test_extract_mvns _______________________________

    def test_extract_mvns():
        gmm = GMM(n_components=2, priors=0.5 * np.ones(2),
                  means=np.array([[1, 2], [3, 4]]), covariances=[np.eye(2)] * 2)
>       mvn0 = gmm.extract_mvn(0)
E       AttributeError: 'GMM' object has no attribute 'extract_mvn'

gmr/tests/test_gmm.py:439: AttributeError
=============================== warnings summary ===============================
../venv/lib/python3.8/site-packages/nose/importer.py:12
  /home/sam/tmp/venv/lib/python3.8/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

gmr/tests/test_gmm.py: 12 warnings
  /home/sam/tmp/venv/lib/python3.8/site-packages/gmr/gmm.py:175: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    dtype=np.float) / self.n_components

gmr/tests/test_gmm.py: 1204 warnings
gmr/tests/test_mvn.py: 5 warnings
  /home/sam/tmp/venv/lib/python3.8/site-packages/gmr/mvn.py:8: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    inv = np.ones(n_features, dtype=np.bool)

gmr/tests/test_mvn.py::test_unscented_transform_linear_transformation
gmr/tests/test_mvn.py::test_unscented_transform_linear_combination
gmr/tests/test_mvn.py::test_unscented_transform_projection_to_more_dimensions
gmr/tests/test_mvn.py::test_unscented_transform_quadratic
  /home/sam/tmp/venv/lib/python3.8/site-packages/gmr/mvn.py:316: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    D = np.maximum(D, np.finfo(np.float).eps)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED gmr/tests/test_gmm.py::test_extract_mvn_negative_idx - AttributeError:...
FAILED gmr/tests/test_gmm.py::test_extract_mvn_idx_too_high - AttributeError:...
FAILED gmr/tests/test_gmm.py::test_extract_mvns - AttributeError: 'GMM' objec...
================= 3 failed, 46 passed, 1226 warnings in 5.11s ==================

vasudev-sharma commented 3 years ago

Thanks, @inakleinbottle for creating an issue. I was also going to open an issue that had some common points that you mentioned to which I definitely agree. Referencing this issue to the JOSS review issue.

AlexanderFabisch commented 3 years ago

Thanks for the review. That's really helpful for me. I add my changes to #26 and I will comment on your comments here in the next days.

AlexanderFabisch commented 3 years ago

I'm not sure how, but my first run of the test failed. Reinstalling using pip install -e . seemed to fix this. It might be an error from my virtual environment, but you might want to look into this.

The problem is that GMM.extract_mvn is not yet available in version 1.4 at PyPI. I will make a new release in the next days, Sorry for the inconveniences.

AlexanderFabisch commented 3 years ago

Substantial scholarly effort: I have no problem with the contribution you have made to the field and the functionality of the package. However, I feel you need to stress the differences between your package and the functionality available from Scikit-Learn in the paper. Indeed, in the paper you state "The prediction process for regression is not available in scikit-learn and, thus, will be provided by gmr." To what regression are you referring? The GaussianMixture class has a predict method which does, by my reckoning, perform some kind of prediction. Crucially, this "prediction" is different from the "prediction" provided by the GMM class in gmr. In the paper, you should elaborate on the difference between regression processes used here and clearly state what is in your package but not in Scikit-Learn or other packages.

In #26, I added the following lines:

Note that while sklearn's the function GaussianMixture.predict exists, it does not perform regression. This function computes the index of the Gaussian for which the inputs have the highest probability. Furthermore, multi-modal regression often requires a more complicated interface to extract not just the mean but also individual Gaussians or sample from the predicted distribution.

Functionality

Installation. My first attempt to install the package failed because the dependencies are incorrectly set setup.py. Dependencies that are required for the installation and running of your package should be listed under the install_requires keyword argument of setup rather than requires. I think you need 2 dependencies for this package: numpy and scipy. You might also have a dependency on matplotlib, since a number of functions require matplotlib. Scikit-learn is also required for some examples.

This should be fixed with #26, which

Changes requires to install_requires in setup.py
Adds optional requirements for examples, tests, and documentation

Functionality. The package appears to perform as claimed, though I have yet to test an example of my construction. I will make a follow-up post on this issue if I cannot get this to work. The code generally looks to be well written and you have plenty of examples to illustrate the functionality. However, one run of your test suite failed and I'm not sure why. Reinstalling gmr with the local package seemed to fix the issue (pip install -e .). You might want to look into this. Test output below. The second run passed without issue.

See my last comment.

Documentation

Statement of need. Your statement of need indicates the problems that your package is designed to solve, but could be a bit more clear about who the audience is. This need not be much, just a sentence to give some suggestions of what kind of people might use this software.

I don't know if I understood correctly what you mean by "audience". I added the following paragraph:

Multimodal regression and Gaussian mixture regression has been used mostly by the robotics community. For example, inverse problems such as inverse kinematics cannot be easily modeled with standard regression approaches. Furthermore, Gaussian mixture regression is the basis of many programming by demonstration approaches [@Billard2008].

Example usage. You give lots of examples, but they all seem a bit manufactured. There is a version of the Iris classification problem from Scikit-Learn, which is a great example, but is not explained and it doesn't appear to do a particularly good job at modelling the data.

I added this to the docstring of plot_iris_from_sklearn.py:

This is just for demonstration purposes and does not represent an example of a particularly good fit. Take a look at plot_iris.py for a fit with full covariances.

Many examples are a bit simplistic. It is actually quite difficult to find good, publicly available "real" datasets to demonstrate the library with a more advanced example. I can maybe try to push one of my colleagues to publish his dataset of bimanual robotic manipulation trajectories, which demonstrates quite nicely how we can use GMR, but this might take some time.

Functionality documentation. You need to generate API documentation for your package. All of the functions have documentation strings in the code, but users probably don't want to have to read through the code to find the functions they need. You should compile some more extensive documentation that covers the functions available to the end user on each class, and illustrate their use.

Thanks for pushing me in this direction. I didn't want to invest too much effort in configuring a complicated sphinx documentation. I found a nice tool with almost no configuration overhead: pdoc3. The API documentation is now available here: https://alexanderfabisch.github.io/gmr/

Community guidelines. You need to add some guidance for people who want to contribute, report issues, or get support on your package. This can be as simple as pointing them to the relevant features of Github (issues, pull-requests, etc.) Dan Bader has a nice article on what kind of things to write in a readme: https://dbader.org/blog/write-a-great-readme-for-your-github-project

That's a quite nice guide. I think most of the things where already in the readme. I added a section about how to contribute and restructured the readme.

Software Paper

* Summary is fine, but you misspelled Gaussian.

Ah yes, thank you.

inakleinbottle commented 3 years ago

Sphinx is actually very easy to use, and all the complication is in the first setup. You have already done the hard work with writing the documentation strings. However, I'm not saying you must use sphinx, use whatever tool gets the job done.

AlexanderFabisch commented 3 years ago

Sphinx is actually very easy to use, and all the complication is in the first setup. You have already done the hard work with writing the documentation strings. However, I'm not saying you must use sphinx, use whatever tool gets the job done.

I agree and I like sphinx, but I personally sometimes spend too much time fine-tuning the documentation.

I released 1.5, which should fix the errors in the unit tests, and updated the paper with the your suggestions: https://github.com/openjournals/joss-reviews/issues/3054

inakleinbottle commented 3 years ago

@AlexanderFabisch Your API documentation looks really good, I'm going to have a look at the tool you used. When you have some time, I'd look to flesh out some of the descriptions of the classes and functions. However, I'm satisfied that it meets the requirements for JOSS.

You've also done a stellar job sorting out the install requirements. I'm still having having problems installing the package in a fresh environment, which I think is because you import gmr in your setup script, which happens before Numpy is installed. You might be able to solve this problem by putting your version in a separate file within the gmr directory. Something like __version__.py, which you can import in __init__.py to get the version into the package number. This file could set the variable __version__. Then in your setup.py function script, you can exec this file instead of importing it, which will circumvent the Python import mechanics entirely and avoid the numpy imports from the main package. (You could use a function like this.)

def get_version():
    namespace = {}
    with open("gmr/__version__.py") as fp:
        exec(fp.read(), namespace)
    return namespace["__version__"]

I don't know if this is the best solution, but it will avoid the problems that occur at the moment. This might be overkill, but it will make sure.

Your instructions for contributing look good, I can tick this off now.

You've addressed the functionality comments that I made, so I'm happy to tick these off now.

I'm satisfied that you have addressed the differences between gmr and scikit-learn.

The only issue that is outstanding from my part is the installation process.

AlexanderFabisch commented 3 years ago

@AlexanderFabisch Your API documentation looks really good, I'm going to have a look at the tool you used. When you have some time, I'd look to flesh out some of the descriptions of the classes and functions.

Thanks, and yes, let me know if you have any suggestions.

I'm still having having problems installing the package in a fresh environment, which I think is because you import gmr in your setup script, which happens before Numpy is installed.

That's right. I didn't think about it. I just releast 1.5.1, which should fix it. At least it worked for me in a fresh docker container. I used sklearn's approach here:

Great feedback so far. Thank you!

inakleinbottle commented 3 years ago

@AlexanderFabisch I just tested the install again from PyPI and it seems to be working now. I must have got the previous version when I ran it last. Happy to tick this one off now.

AlexanderFabisch commented 3 years ago

Thanks @inakleinbottle . Your review helped me a lot. In particular, the documentation is much better now.

AlexanderFabisch / gmr

Review issue for JOSS #25

General

Functionality

Documentation

Software Paper

Test data

Functionality

Documentation

Software Paper