alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
189 stars 49 forks source link

The output of the function `statistical_inefficiency` doesn't include statistical inefficiency value #295

Closed VOD555 closed 1 year ago

VOD555 commented 1 year ago

The current statistical_inefficiency function only returns the subsampled data, but I also want the value of statistical inefficiency. And I have to use the statistical_inefficiency in 'pymbar', and do a similar calculation again.

It'd help if statistical_inefficiency returns the subsampled data as well as the value of statistical inefficiency or has an option to control whether to return the value.

xiki-tempula commented 1 year ago

I think this is a valid request but I'm not sure of what is the best way to supply it without breaking the existing interface. If you just want an integer g, you could already get it by computing the number of decorrelated sample over the total number of samples?

orbeckst commented 1 year ago

I have two ideas

Add an additional function that returns (df, g)

Instead of having alchemlyb.preprocessing.subsampling.statistical_inefficiency() return (df, g) and break the API, we could add a function subsampling.statistical_inefficiency_with_g() (or with a better name) that returns both, or uses the pymbar approach of returning dicts. Then we could effectively make subsampling.statistical_inefficiency = lambda *args, **kwargs: statistical_inefficiency_with_g(*args, **kwargs)[0]. It's not quite as pretty and functional-programming style but useful.

@VOD555 would that help? Remind me how statistical_inefficiency() is used in your application. Is it used directly and could be replaced with a statistical_inefficiency_with_g() or is it called indirectly?

Add an optional "logging" dict to our functions

The other alternative is to give all functions an optional argument other_return_values where you pass in a dict and the function stores anything noteworthy in the dict: e.g.

run_data = {}
df =  statistical_inefficiency(..., other_return_values=run_data)    # adds run_data['g'] = g 
print(f"statistical inefficiency g = {run_data['g']}")

It's a little bit messy and one has to document clearly what is stored with this mechanism. But it can be implemented in a backward-compatible manner and we can pass this other_return_values through all our functions. It would act as a workflow log, i.e., instead of just writing interesting values to the log file, add it to the dict as well so that it's programmatically accessible. If the user does not care (the default) then do nothing.

xiki-tempula commented 1 year ago

@VOD555 Do you mind explaining a bit more about what you actually need?

If you would like to via the g in terms of interactive analysis, I could print it into the log, which is easy. Or you need the g but the integer format of g is enough, this could easily be done by dividing the length of the original dataset by the length of the new dataset, which is also kind of easy. Or if you need the actual float value of the g, do you mind providing more background, so I could see what is the best way forward?

VOD555 commented 1 year ago

@xiki-tempula I'm ok to get the integer from dividing the length, as we just want a record for the g used in the analysis. I was thinking whether it is necessary to do extra (though very simple) calculations to get a value that has been calculated in statistical_inefficiency, and printing it into the log is ok.

orbeckst commented 1 year ago

@VOD555 do you have any follow-up or can we close the issue?

VOD555 commented 1 year ago

@orbeckst See my previous comment, I think it's necessary to at least print g in the log file.

xiki-tempula commented 1 year ago

Sorry, I have been slow on this. I think one problem for these functions is where to create the logger object. My solution is to use loguru as mentioned in https://github.com/alchemistry/alchemlyb/issues/301, one doesn't need to create the logger and the function name be the title of the logger. I will change to loguru first, and then it will be easier to log this information.

orbeckst commented 1 year ago

ok, I have no objections