alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
197 stars 51 forks source link

Error in the load_expanded_ensemble_case_1 #184

Closed xiki-tempula closed 2 years ago

xiki-tempula commented 2 years ago

Note the dataframes in load_expanded_ensemble_case_1 has NaNs in it as dropping the na will create a problem in the tests.

xiki-tempula commented 2 years ago

fail-fast is set to false to allow all tests to run.

orbeckst commented 2 years ago

In principle using dropna sounds sensible but what was the motivation you added it in this PR?

xiki-tempula commented 2 years ago

@orbeckst So I would like to write a more robust gmx parser https://github.com/alchemistry/alchemlyb/pull/183. However, I discovered that the new parser would fail some old tests. The problem seems to be associated with the load_expanded_ensemble_case_1, which when dropping the nan will result in a different answer. So my question is would we expect NaN in the expanded ensemble dataset and are the NaNs artefacts? I will try to contact the author of this dataset and see what is his opinion.

orbeckst commented 2 years ago

@mrshirts we have a question about the expanded ensemble dataset case 1: It contains NaN. Are they supposed to be in there?

mrshirts commented 2 years ago

I don't think there should. Where specifically did it appear? Which file/files?

orbeckst commented 2 years ago

@xiki-tempula , you looked at the NaNs in the test data. Could you provide specific details to @mrshirts 's question, please? Thanks.

xiki-tempula commented 2 years ago

@orbeckst Sorry, I'm a bit occupied on weekdays will have a look tomorrow.

xiki-tempula commented 2 years ago

@mrshirts So there is only one file in gmx_expanded_ensemble_case_1, which is CB7_Guest3_dhdl.xvg the line 1530, it is 2936.0000000000 21.0000000000 -91168.9690000000 nan -78.7414550000 113.0974300000 1.5736808000 64.8738410000 64.8738410000 64.8738410000 64.8738410000 64.8738410000 60.9368710000 56.9996240000 52.2751040000 47.5506900000 42.8262650000 38.1016370000 33.3771240000 28.6527460000 23.9282540000 17.6289880000 11.3295930000 5.0303196000 -1.2689984000 -7.5682849000 -13.8674930000 -9.6008139000 0.0000000000 12.5142880000 26.8151620000 39.1129320000 45.4792640000 58.5289710000 71.9205120000 85.6069280000 99.5163480000 113.6875200000 128.5488700000, one could see the nan

orbeckst commented 2 years ago

@mrshirts , after looking at the data https://github.com/alchemistry/alchemlyb/pull/184#issuecomment-1052074007 do you think we can simply decide to ignore this line in the data file?

mrshirts commented 2 years ago

For now, yes, I think ignoring is best. I haven't been able to figure out why it is there, but we've used this data for other purposes, so I think it is fine for demonstration. We can eventually generate another data set as necessary.

On Sun, Apr 3, 2022 at 4:21 PM Oliver Beckstein @.***> wrote:

@mrshirts https://github.com/mrshirts , after looking at the data #184 (comment) https://github.com/alchemistry/alchemlyb/pull/184#issuecomment-1052074007 do you think we can simply decide to ignore this line in the data file?

— Reply to this email directly, view it on GitHub https://github.com/alchemistry/alchemlyb/pull/184#issuecomment-1086961336, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABATPVBSZ3CMG3ZQWBKWZTLVDIKVRANCNFSM5K74USEQ . You are receiving this because you were mentioned.Message ID: @.***>

xiki-tempula commented 2 years ago

@orbeckst Sorry, I opened this PR to show that there is NaN in the test set. It is not that I wish to merge this PR.