GeoStat-Framework / pentapy

A Python toolbox for pentadiagonal linear systems
https://pentapy.readthedocs.io/
MIT License
14 stars 4 forks source link

JOSS Review issues #1

Closed inakleinbottle closed 5 years ago

inakleinbottle commented 5 years ago

Firstly, I think this is a nicely written library and the code was very easy to read.

I do have some comments, some of which I think need to be addressed as part of the review and some that I think might be nice but are not essential.

I will start with the points that I think need to be addressed as part of the review process:

import numpy as np from numpy.testing import assert_almost_equal

import pentapy as pp

class TestMuGammaNonZeroExampleFromPaper(unittest.TestCase): """Example taken from https://www.hindawi.com/journals/mpe/2015/232456/""" def setUp(self): self.mat = np.array([ [1, 2, 1, 0, 0, 0, 0, 0, 0, 0], [3, 2, 2, 5, 0, 0, 0, 0, 0, 0], [1, 2, 3, 1, -2, 0, 0, 0, 0, 0], [0, 3, 1, -4, 5, 1, 0, 0, 0, 0], [0, 0, 1, 2, 5, -7, 5, 0, 0, 0], [0, 0, 0, 5, 1, 6, 3, 2, 0, 0], [0, 0, 0, 0, 2, 2, 7, -1, 4, 0], [0, 0, 0, 0, 0, 2, 1, -1, 4, -3], [0, 0, 0, 0, 0, 0, 2, -2, 1, 5], [0, 0, 0, 0, 0, 0, 0, -1, 4, 8] ]) self.vec = np.array([ 8, 33, 8, 24, 29, 98, 99, 17, 57, 108 ]) self.expected = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def test_pp1(self):
    sol = pp.solve(self.mat, self.vec)
    assert_almost_equal(sol, self.expected)

def test_pp2(self):
    sol = pp.solve(self.mat, self.vec, solver=2)
    assert_almost_equal(sol, self.expected)

def test_validity(self):
    sol = self.mat @ self.expected.reshape((10, 1))
    assert_almost_equal(sol.reshape((10, )), self.vec)
- I think the code should warn the user if the solver encounters a case where the method fails (where mu_i, psi_i is zero, for instance) mentioned above, either by raising an appropriate exception or at least a warning. This may be rather tricky with the Cython code, but it is something to think about.
- I notice that you have documentation hosted on [readthedocs](https://geostat-framework.readthedocs.io/projects/pentapy/en/latest/) but I had to search for this separately. I think you should include a link in the project README, and, if possible, also on the PyPI listing for the project.
- The documentation should include a few comments about what the difference is between the algorithms PTRANS-I and PTRANS-II. (I looked and failed to find any difference in the cases where these can be applied.)
- Your code can be optimised by having Cython installed, but there doesn't seem to be any instruction to that effect in the README. (Note that without the Cython speed-up the performance is inferior to the standard sparse scipy solvers.) Cython should probably be listed as an optional dependency.
- Similar to the above, this library provides an extra interface to scipy solvers, but scipy is not listed as an (optional) dependency. This should be fixed, at least in the documentation. Perfplot is also used in one of the examples, so it should probably be listed somewhere.
- Your documentation should include instructions on how one can contribute to the project and how to report issues (I presume this should be done via Github issues). I suggest something like what is included in [this template](https://github.com/dbader/readme-template).
- You might need to include references (at the end of the paper) to the packages mentioned in your paper: numpy, scipy, Cython, perfplot. This needs to be clarified by the JOSS editors.
- I think the README needs to be expanded, possibly including some of the introduction made in your paper, that explains exactly what and who this package is for. 

In addition to the above, I suggest some minor issues that I think will improve the code:
- In core.py, you try to import the Cython solver, and if this fails then you fall back to the Python solver. However, you warn the user that this has happened by means of a print statement. This should probably be done by means of a warning. Perhaps something like
```python3
import warnings

warnings.warn("Could not import the Cython solver, falling back to Python solver", ImportWarning)

I am happy to discuss any of these points, just respond to this issue.

MuellerSeb commented 5 years ago

Hey @inakleinbottle,

Thank you for this detailed review. I will try to address all your points in the following.

Firstly, I think this is a nicely written library and the code was very easy to read.

Thank you!

I do have some comments, some of which I think need to be addressed as part of the review and some that I think might be nice but are not essential.

I will start with the points that I think need to be addressed as part of the review process:

* I believe the test suite could be more comprehensive and check some of the corner cases that arise, for example, when one of the mu_i or psi_i are zero. I also suggest using the illustrative examples from the cited paper as examples/tests. I wrote this test case up to experiment with the code:
import unittest

import numpy as np
from numpy.testing import assert_almost_equal

import pentapy as pp

class TestMuGammaNonZeroExampleFromPaper(unittest.TestCase):
   """Example taken from https://www.hindawi.com/journals/mpe/2015/232456/"""
    def setUp(self):
        self.mat = np.array([
            [1, 2, 1,  0,  0,  0, 0,  0, 0,  0],
            [3, 2, 2,  5,  0,  0, 0,  0, 0,  0],
            [1, 2, 3,  1, -2,  0, 0,  0, 0,  0],
            [0, 3, 1, -4,  5,  1, 0,  0, 0,  0],
            [0, 0, 1,  2,  5, -7, 5,  0, 0,  0],
            [0, 0, 0,  5,  1,  6, 3,  2, 0,  0],
            [0, 0, 0,  0,  2,  2, 7, -1, 4,  0],
            [0, 0, 0,  0,  0,  2, 1, -1, 4, -3],
            [0, 0, 0,  0,  0,  0, 2, -2, 1,  5],
            [0, 0, 0,  0,  0,  0, 0, -1, 4,  8]
        ])
        self.vec = np.array([
            8, 33, 8, 24, 29, 98, 99, 17, 57, 108
        ])
        self.expected = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    def test_pp1(self):
        sol = pp.solve(self.mat, self.vec)
        assert_almost_equal(sol, self.expected)

    def test_pp2(self):
        sol = pp.solve(self.mat, self.vec, solver=2)
        assert_almost_equal(sol, self.expected)

    def test_validity(self):
        sol = self.mat @ self.expected.reshape((10, 1))
        assert_almost_equal(sol.reshape((10, )), self.vec)

You are totally right about the corner cases! The example given in equation (22) in https://www.hindawi.com/journals/mpe/2015/232456/#EEq4.2 could serve as such a case.

I think the example you proposed would fit more as an real example, than a test-case, since it is not special in any way. But I will add equation (22) as mentioned above.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/e75e99d8787c3201bc663e38959842b16ab6b0d9

* I think the code should warn the user if the solver encounters a case where the method fails (where mu_i, psi_i is zero, for instance) mentioned above, either by raising an appropriate exception or at least a warning. This may be rather tricky with the Cython code, but it is something to think about.

You are absolutely right. I will catch ZeroDivisionErrors in that case and raise a warning, that the chosen algorithm is not suitable for the given matrix.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/f5faf19f248bd0298b1038b3d96349c9c84c3b8a

* I notice that you have documentation hosted on [readthedocs](https://geostat-framework.readthedocs.io/projects/pentapy/en/latest/) but I had to search for this separately. I think you should include a link in the project README, and, if possible, also on the PyPI listing for the project.

A common way to reference the documentation is the rtd-badge, which is placed in the README header. When you click on it, you are redirected to the documentation. One could argue if this is a standard approach. But I will add an explicit link to the doc.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/7f3695dd56c4ae81edea41ec7b4a1c51b35cf1f0

* The documentation should include a few comments about what the difference is between the algorithms PTRANS-I and PTRANS-II. (I looked and failed to find any difference in the cases where these can be applied.)

Their difference was even not addressed in the paper. I think, they are two equal approaches in solving pentadiagonal systems. I don't think, that there are special cases, where one algorithm outperforms the other.

* Your code can be optimised by having Cython installed, but there doesn't seem to be any instruction to that effect in the README. (Note that without the Cython speed-up the performance is inferior to the standard sparse scipy solvers.) Cython should probably be listed as an optional dependency.

We provide pre-build wheels for most systems and Python versions, where the cython-moduls are already compiled, so the user doesn't have to care about this at all. But you are right, when it is the case that someone tries to install pentapy on a system, where we don't provide wheels, we could add a hint, that there should be cython, numpy and a c-compiler pre installed.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/7f3695dd56c4ae81edea41ec7b4a1c51b35cf1f0

* Similar to the above, this library provides an extra interface to scipy solvers, but scipy is not listed as an (optional) dependency. This should be fixed, at least in the documentation. Perfplot is also used in one of the examples, so it should probably be listed somewhere.

I can add an extra-requirement in the setup.py, which enables installing scipy on the fly if wanted by the user. For the perfplot example, I will add a hint in the example doc-string.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/8a8beb73c6d15eaa519520c5500628539b6fb658

* Your documentation should include instructions on how one can contribute to the project and how to report issues (I presume this should be done via Github issues). I suggest something like what is included in [this template](https://github.com/dbader/readme-template).

Thanks for this hint! I will add a CONTRIBUTING.md to the base folder like this one: https://github.com/GeoStat-Framework/GSTools/blob/master/CONTRIBUTING.md

Done with: https://github.com/GeoStat-Framework/pentapy/commit/c9e6bf8f4b1b333ccb22eca9509870fca135906d

* You might need to include references (at the end of the paper) to the packages mentioned in your paper: numpy, scipy, Cython, perfplot. This needs to be clarified by the JOSS editors.

Let's see what the editor says about that. After a quick look at other JOSS papers, I don't think it is a must.

* I think the README needs to be expanded, possibly including some of the introduction made in your paper, that explains exactly what and who this package is for.

You are right about that. I will do add some lines from the paper introduction.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/7f3695dd56c4ae81edea41ec7b4a1c51b35cf1f0

In addition to the above, I suggest some minor issues that I think will improve the code:

* In core.py, you try to import the Cython solver, and if this fails then you fall back to the Python solver. However, you warn the user that this has happened by means of a print statement. This should probably be done by means of a warning. Perhaps something like
import warnings

warnings.warn("Could not import the Cython solver, falling back to Python solver", ImportWarning)

This is a nice hint, that also enables the warnings handling infrastructure.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/f5faf19f248bd0298b1038b3d96349c9c84c3b8a

* Currently you pass an (optional) integer argument to the solve function. I suggest providing a `Solvers` enum that wraps the possible solvers. This will make the code easier to read and more explicit about which solver is which (without having to read the documentation to find the number to use). For example,
from enum import IntEnum
class Solvers(IntEnum):
    PTRANS_I = 1
    PTRANS_II = 2
    LAPACK_BANDED = 3
    SP_SOLVE = 4
    SP_SOLVE_UMF = 5

Since IntEnum is a subclass of int, you wouldn't have to change any of your other code.

You can also pass Strings containing the algorithm name like solver="PTRANS-I". On the one hand this is redundant but maybe it could increase the readability of scripts using pentapy.

* Generally the code is well documented, although the examples included could perhaps have some extra comments/doc-strings to help the users understand what is happening in the code.

I will add a doc-string to the examples.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/cc2ffcc79a7b68b356cf35e73d83b35c67f9ca1d

* I found a typo in your acknowledgements in the paper: an "e" is missing from "Framwork".

Thanks for the sharp eye.

Done with: https://github.com/GeoStat-Framework/pentapy/commit/2e0cd8c29503f28aa8f9fa58fa964744f5c9d0c2

I am happy to discuss any of these points, just respond to this issue.

I hope my proposed changes are convincing.

Sebastian.

inakleinbottle commented 5 years ago

Thank you for addressing these points. I've updated the review issue. I have two points outstanding, one of which just needs some clarification from the editor (citations for packages mentioned in the paper).

MuellerSeb commented 5 years ago

Thanks again!