Open frannerin opened 3 months ago
Hello @frannerin! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
openfecli/commands/gather.py
:Line 182:45: E261 at least two spaces before inline comment Line 185:48: E261 at least two spaces before inline comment Line 195:50: E261 at least two spaces before inline comment Line 196:62: W291 trailing whitespace Line 222:48: E261 at least two spaces before inline comment Line 268:48: E261 at least two spaces before inline comment Line 340:5: E266 too many leading '#' for block comment Line 351:1: W293 blank line contains whitespace Line 363:9: E266 too many leading '#' for block comment Line 366:1: W293 blank line contains whitespace Line 380:80: E501 line too long (86 > 79 characters) Line 381:1: W293 blank line contains whitespace Line 391:46: E261 at least two spaces before inline comment
@frannerin - I think this requires more discussion. I don't believe this is the right way to do this, as you end up with a case where the return can mean two vastly different things statistically.
The programmatic approach to getting the mbar errors is via get_individual_estimates
, although I recognise that it isn't the most intuitive thing nor does it play very well with the CLI. What is your use case?
I'm running the rbfe tutorial and wanted to do everything, up to using the "gather" cli command and plotting with cinnabar, but making only 1 protocol repeat and getting !=0 errors in the plots. After, I will try my own system and ligand set but that's a different beast
Thanks, I think the solution here might be to fix the CLI gather command to fetch mbar errors instead if there is insufficient data to do a standard deviation (rather than changing the behaviour of the results object itself).
So for gather the column should still be named "uncertainty (kcal/mol)" even if MBAR error est. is used (and e.g. print a warning)? I'm thinking in case there are transformations with different number of protocol repeats that are gathered together, for example if gather is run before all calculations finished, or some other weird extreme cases (or put a check/assert so that all transformations must have the same number of protocolunits)
So for gather the column should still be named "uncertainty (kcal/mol)" even if MBAR error est. is used (and e.g. print a warning)? I'm thinking in case there are transformations with different number of protocol repeats that are gathered together, for example if gather is run before all calculations finished, or some other weird extreme cases (or put a check/assert so that all transformations must have the same number of protocolunits)
That's a good question - and where it gets really messy.
This definitely would require a longer chat (would you be interested in joining a call about it at some point?).
A take (and not guaranteed to be the right one - definitely needs more thinking through!):
sqrt(err1^2 + err2^2.. + errN^2)/ N
)?Note: we're in a 1.x release, so renaming "uncertainty" might be a tad bit problematic - that might need some consideration (nothing stops us from starting a 2.0 branch though!).
@frannerin may I please suggest that we have that chat before you spend a lot of time working on this? I would hate to see you spend a lot of efforts on a solution just for us to turn around and claim we want something else.
Hi @IAlibay yes I'm open for a chat, but I'm no expert I have to warn you, and I don't think I could contribute much. The last pushed commits where honestly for me and forgot they would show up here hehe 🐋 What you propose in your previous comment seems reasonable to me
Hi @hannahbaumann, first of all sorry for the delay in replying.
@IAlibay raised some very reasonable concerns about this PR in https://github.com/OpenFreeEnergy/openfe/pull/883#issuecomment-2209570108. I would be happy to give a go to the implementation of whatever final decision is taken in terms of column names, error propagation, etc, but I don't think there is one yet? I have to say, I'm already using this code for my own system and it's going ok.
On the other hand, my other PR https://github.com/OpenFreeEnergy/openfe/pull/884 (imo) does solve a bug that does not have as many conceptual implications as this one, and actually the fix that's in that PR is present already in this one, could someone take a look at that one?
I would also be happy to merge the updated main into any or both of these branches to see if everything is still working ok (and cleaning comments etc)
When only a single repeat of the RBFE ProtocolUnit is used in a project (which is the approach in e.g., FEP+), the MBAR error estimate should be used as the uncertainty of the free energy estimate. If not, the uncertainty is 0 (applying np.std() to a list of one number returns 0).
Checklist
news
entryDevelopers certificate of origin