alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
197 stars 51 forks source link

Sync series and dataframe in statistical_inefficieny() #199

Closed ptmerz closed 2 years ago

ptmerz commented 2 years ago

This adds a number of tests for slicing() and statistical_inefficieny():

It then makes sure that both the subsampling series and the dataframe are sliced when calling statistical_inefficieny(). This ensure that the series and the dataframe are in sync before the subsampled indices of the series are applied to the dataframe. This fixes the above tests.

Fixes #198

codecov[bot] commented 2 years ago

Codecov Report

Merging #199 (cc122a8) into master (72622c9) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #199   +/-   ##
=======================================
  Coverage   98.15%   98.15%           
=======================================
  Files          24       24           
  Lines        1352     1353    +1     
  Branches      295      295           
=======================================
+ Hits         1327     1328    +1     
  Misses          5        5           
  Partials       20       20           
Impacted Files Coverage Δ
src/alchemlyb/preprocessing/subsampling.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 72622c9...cc122a8. Read the comment docs.

ptmerz commented 2 years ago

I think the test could be simplified as bit using the @pytest.mark.parametrize

Agreed that this could unify the second and third added test (TestStatisticalInefficiency. test_lower_and_upper_bound_slicer and TestStatisticalInefficiency.test_lower_and_upper_bound_inefficiency). Unless we decide to rewrite TestStatisticalInefficiency. test_lower_and_upper_bound_slicer as an equivalence between statistical_inefficiency and slicing, of course.

So I'd suggest to wait until we have agreed on how to proceed with the remaining points raised, and I will implement this simplification if it's still viable!

ptmerz commented 2 years ago

@xiki-tempula @orbeckst

Thank you for your reviews. I did a first pass, which should address most of your comments. I'm copying the git log here as a summary:

This doesn't yet address the issue with greedy parametrization, I unfortunately did not have time to look into this yet.

ptmerz commented 2 years ago

@orbeckst , I uploaded new commits that should hopefully address your points. Note that currently, the data loading functions had to be duplicated, as they can't be normal functions and fixtures at the same time. If after resolving #206, you only use them as fixtures, you could remove this duplication.