MolarVerse / PQAnalysis

PQAnalysis is a API/CLI python package for the analysis of MD simulations
https://molarverse.github.io/PQAnalysis/
MIT License
4 stars 2 forks source link

Feature/traj_isclose #74

Closed galjos closed 1 month ago

galjos commented 1 month ago

Added isclose member function to Trajectory, AtomicSystem and Cell.

__eq__ now calls isclose with default arguments.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 80.72%. Comparing base (caeddba) to head (f9fa8c8). Report is 8 commits behind head on dev.

:exclamation: Current head f9fa8c8 differs from pull request most recent head 48668aa

Please upload reports for the commit 48668aa to get more accurate results.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #74 +/- ## ========================================== + Coverage 80.57% 80.72% +0.14% ========================================== Files 122 123 +1 Lines 4901 4944 +43 ========================================== + Hits 3949 3991 +42 - Misses 952 953 +1 ``` | [Flag](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse) | `80.72% <100.00%> (+0.14%)` | :arrow_up: | | [Files](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse) | Coverage Δ | | |---|---|---| | [PQAnalysis/atomic\_system/atomic\_system.py](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?src=pr&el=tree&filepath=PQAnalysis%2Fatomic_system%2Fatomic_system.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse#diff-UFFBbmFseXNpcy9hdG9taWNfc3lzdGVtL2F0b21pY19zeXN0ZW0ucHk=) | `73.96% <100.00%> (+0.63%)` | :arrow_up: | | [PQAnalysis/core/cell/cell.py](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?src=pr&el=tree&filepath=PQAnalysis%2Fcore%2Fcell%2Fcell.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse#diff-UFFBbmFseXNpcy9jb3JlL2NlbGwvY2VsbC5weQ==) | `100.00% <100.00%> (ø)` | | | [PQAnalysis/traj/trajectory.py](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?src=pr&el=tree&filepath=PQAnalysis%2Ftraj%2Ftrajectory.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse#diff-UFFBbmFseXNpcy90cmFqL3RyYWplY3RvcnkucHk=) | `99.14% <100.00%> (+0.01%)` | :arrow_up: | | [PQAnalysis/types.py](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?src=pr&el=tree&filepath=PQAnalysis%2Ftypes.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse#diff-UFFBbmFseXNpcy90eXBlcy5weQ==) | `100.00% <100.00%> (ø)` | | | [PQAnalysis/utils/math.py](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74?src=pr&el=tree&filepath=PQAnalysis%2Futils%2Fmath.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse#diff-UFFBbmFseXNpcy91dGlscy9tYXRoLnB5) | `100.00% <100.00%> (ø)` | | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/MolarVerse/PQAnalysis/pull/74/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MolarVerse)
97gamjak commented 1 month ago

what happens if both atol and rtol are given?

97gamjak commented 1 month ago

what happens if both atol and rtol are given?

Or to put it in a different way. The numpy implementation has two default values - we use the same ones as numpy.isclose.

But how do they handle these values? Is it always the stricter value or the other one?

galjos commented 1 month ago

np.isclose is a or between rtol and atol. So, it only gives False if both are False.

galjos commented 1 month ago

And, since we had np.isclose already implemented with with the default values, there is no real change in functionality.

97gamjak commented 1 month ago

So actually I had a look at the numpy documentation. I think we should really reconsider at least the eq implementation:

For finite values, isclose uses the following equation to test whether two floating point values are equivalent.

absolute(a - b) <= (atol + rtol * absolute(b))

Unlike the built-in math.isclose, the above equation is not symmetric in a and b – it assumes b is the reference value – so that isclose(a, b) might be different from isclose(b, a). Furthermore, the default value of atol is not zero, and is used to determine what small values should be considered close to zero. The default value is appropriate for expected values of order unity: if the expected values are significantly smaller than one, it can result in false positives. atol should be carefully selected for the use case at hand. A zero value for atol will result in False if either a or b is zero.

97gamjak commented 1 month ago

So what should we do now?

I had a quick look at the math.isclose function and in my opinion, this would be the best way to go. The non-symmetry of the numpy.isclose definiton seems like sort of a "bug" of numpy. There exists an issue within the numpy repository issue 10161, where the basic tenor of the discussions seems a bit similar to ours. My main conclusion when reading/analyszing the discussion within this OPEN issue is that they should have used the "more correct" implementation of math.isclose but now it is to late to change the current implementation as everything would break and they wouldn't be backward compatible anymore.

So my suggestion would be to implement the same isclose member functions as we did now, but using math.isclose along with their default values (rel_tol=1e-09, abs_tol=0.0), which would also ensure symmetry for the __eq__ methods. After implementing these new routines, we should definitly write a new benchmark group in order to compare both implementations. If the numpy.isclose function is considerably faster we would have to discuss further on how critical the performance of these member function is and might be in future use cases.

What do you think @galjos ?

galjos commented 1 month ago

@97gamjak we could suggest to numpy to fix the isclose

galjos commented 1 month ago

In my opinion this is a bug on their side

97gamjak commented 1 month ago

@97gamjak we could suggest to numpy to fix the isclose

yes but if you read their issue, i think they have no intentions in closing or resolving this 😔

galjos commented 1 month ago

Obviously, this change could break alot of projects based on numpy. Although it would be correct to have the math implementation 🥲

galjos commented 1 month ago

I tried to include math.isclose, but imo the code is unreadable now:

        result = []
        for self_pos, other_pos in zip(self.pos, other.pos):
            for self_pos_i, other_pos_i in zip(self_pos, other_pos):
                result.append(
                    math.isclose(
                        self_pos_i, other_pos_i, rel_tol=rtol, abs_tol=atol
                    )
                )

        if not np.all(result):
            return False
97gamjak commented 1 month ago

I tried to include math.isclose, but imo the code is unreadable now:

        result = []
        for self_pos, other_pos in zip(self.pos, other.pos):
            for self_pos_i, other_pos_i in zip(self_pos, other_pos):
                result.append(
                    math.isclose(
                        self_pos_i, other_pos_i, rel_tol=rtol, abs_tol=atol
                    )
                )

        if not np.all(result):
            return False

How about doing something like this:

isclose_vectorized = np.vectorize(math.isclose)
np.all(isclose_vectorized(self.pos, other.pos, rel_tol=rtol, abs_tol=atol))

Probably best would be to define a function in utils (or somewhere else) like this:

def allclose_vectorized(a, b, rtol=..., atol=...):
    isclose_vectorized = np.vectorize(math.isclose)
    result = isclose_vectorized(a, b, rel_tol=rtol, abs_tol=atol)
    return np.all(result)

Then, we could always use it like this:

allclose_vectorized(self.pos, other.pos, rtol=rtol, atol=atol)
97gamjak commented 1 month ago

I tried to include math.isclose, but imo the code is unreadable now:

        result = []
        for self_pos, other_pos in zip(self.pos, other.pos):
            for self_pos_i, other_pos_i in zip(self_pos, other_pos):
                result.append(
                    math.isclose(
                        self_pos_i, other_pos_i, rel_tol=rtol, abs_tol=atol
                    )
                )

        if not np.all(result):
            return False

How about doing something like this:

isclose_vectorized = np.vectorize(math.isclose)
np.all(isclose_vectorized(self.pos, other.pos, rel_tol=rtol, abs_tol=atol))

Probably best would be to define a function in utils (or somewhere else) like this:

def allclose_vectorized(a, b, rtol=..., atol=...):
    isclose_vectorized = np.vectorize(math.isclose)
    result = isclose_vectorized(a, b, rel_tol=rtol, abs_tol=atol)
    return np.all(result)

Then, we could always use it like this:

allclose_vectorized(self.pos, other.pos, rtol=rtol, atol=atol)

I just added the described function in utils.math with the last commit.

97gamjak commented 1 month ago

I just had a quick look at the codecov CI - Probably we never checked the detailed run results. The tests checking if the baertype approach works by setting the the environment to development in the bash script gives a total of 8 failed tests.

This happens probably because the return value of the quick and dirty shell script hack is overwritten by the second run of pytest, which fully succeeds.

97gamjak commented 1 month ago

I just had a quick look at the codecov CI - Probably we never checked the detailed run results. The tests checking if the baertype approach works by setting the the environment to development in the bash script gives a total of 8 failed tests.

This happens probably because the return value of the quick and dirty shell script hack is overwritten by the second run of pytest, which fully succeeds.

This issue has been resolved. The problem was that the isclose member function of Trajectory did not return a bool but a np.boo_. Apparently, beartype can't handle this, so I introduced a new type in PQAnalysis.types called Bool, which handles both. So if there is no need to distinguish between bool and np.bool_, we should always use the PQAnalysis specific type Bool and not bool.

github-actions[bot] commented 1 month ago

PYLINT REPORT

Your code has been rated at 9.53/10 (previous run: 9.85/10, -0.33)

Full report Raw metrics =========== |type |number |% |previous |difference | |----------|-------|------|---------|-----------| |code |7809 |39.96 |7616 |+193.00 | |docstring |8598 |44.00 |8423 |+175.00 | |comment |265 |1.36 |261 |+4.00 | |empty |2870 |14.69 |2800 |+70.00 | Duplication =========== | |now |previous |difference | |-------------------------|------|---------|-----------| |nb duplicated lines |0 |0 |0 | |percent duplicated lines |0.000 |0.000 |= | Messages by category ==================== |type |number |previous |difference | |-----------|-------|---------|-----------| |convention |1 |2 |2 | |refactor |51 |51 |51 | |warning |10 |12 |12 | |error |31 |0 |0 | % errors / warnings by module ============================= |module |error |warning |refactor |convention | |--------------------------------------------------------|------|--------|---------|-----------| |PQAnalysis.io.topology_file.topology_file_reader |67.74 |0.00 |0.00 |0.00 | |PQAnalysis.io.nep.nep_writer |12.90 |0.00 |15.69 |100.00 | |PQAnalysis.io.traj_file.frame_reader |6.45 |0.00 |3.92 |0.00 | |PQAnalysis |3.23 |10.00 |0.00 |0.00 | |PQAnalysis.io.input_file_reader.input_file_parser |3.23 |0.00 |1.96 |0.00 | |PQAnalysis.io.traj_file.api |3.23 |0.00 |0.00 |0.00 | |PQAnalysis.io.gen_file.gen_file_reader |3.23 |0.00 |0.00 |0.00 | |PQAnalysis.topology.__init__ |0.00 |20.00 |0.00 |0.00 | |PQAnalysis.tools.traj_to_com_traj |0.00 |20.00 |0.00 |0.00 | |PQAnalysis.io.moldescriptor_reader |0.00 |20.00 |0.00 |0.00 | |PQAnalysis.atomic_system.atomic_system |0.00 |10.00 |11.76 |0.00 | |PQAnalysis.io.write_api |0.00 |10.00 |0.00 |0.00 | |PQAnalysis.core.api |0.00 |10.00 |0.00 |0.00 | |PQAnalysis.io.traj_file.trajectory_reader |0.00 |0.00 |9.80 |0.00 | |PQAnalysis.tools.add_molecule |0.00 |0.00 |7.84 |0.00 | |PQAnalysis.atomic_system._properties |0.00 |0.00 |5.88 |0.00 | |PQAnalysis.analysis.rdf.rdf |0.00 |0.00 |5.88 |0.00 | |PQAnalysis.topology.topology |0.00 |0.00 |3.92 |0.00 | |PQAnalysis.topology.bonded_topology.dihedral |0.00 |0.00 |3.92 |0.00 | |PQAnalysis.core.residue |0.00 |0.00 |3.92 |0.00 | |PQAnalysis.atomic_system._standard_properties |0.00 |0.00 |3.92 |0.00 | |PQAnalysis.traj.formats |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.topology.selection |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.topology.bonded_topology.bonded_topology |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.topology.bonded_topology.bond |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.topology.bonded_topology.angle |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.io.restart_file.restart_reader |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.io.input_file_reader.pq_analysis._parse |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.io.input_file_reader.pq.pq_input_file_reader |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.io.info_file_reader |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.io.formats |0.00 |0.00 |1.96 |0.00 | |PQAnalysis.core.cell.cell |0.00 |0.00 |1.96 |0.00 | Messages ======== |message id |occurrences | |--------------------------------|------------| |possibly-used-before-assignment |30 | |too-many-arguments |18 | |too-many-instance-attributes |10 | |inconsistent-return-statements |9 | |fixme |9 | |too-many-locals |3 | |too-complex |3 | |duplicate-code |3 | |too-many-branches |2 | |unused-argument |1 | |too-many-statements |1 | |too-many-return-statements |1 | |too-many-public-methods |1 | |too-many-lines |1 | |no-member |1 |