Defining sampling efficiency: normalized by total steps or number of saved samples?

kamronald commented 11 months ago

Email (Optional)

kamronald@berkeley.edu

Version

v0.5.3

Which OS(es) are you using?

[X] MacOS
[ ] Windows
[X] Linux

What happened?

Have noticed recently that my sampling efficiency values obtained from my SampleContainer instances (ones without bug #441) using samples.sampling_efficiency() is quite small. The sampling_efficiency (in smol.moca.sampler.container.SampleContainer) is defined as "total_accepted" divided by ("total MC steps" - "discarded samples"). Would it be more accurate to define sampling efficiency as "total_accepted" divided by ("number of samples" - "discarded samples") instead? The difference is that "total MC steps" includes the samples that were not saved from sample thinning during the MC run, so we do not know the acceptance rate of those discarded samples. What we only know for certain is the number of acceptances from the number of saved samples.

Code snippet

samples.sampling_efficiency()

Log output

No response

Code of Conduct

[X] I agree to follow this project's Code of Conduct

lbluque commented 11 months ago

Yes, you are right! The current implementation is not correct when you thin the sampling by > 1. What you are proposing makes more sense, however, it is an "estimated" sampling efficiency for thin > 1.

If you get a chance can you please update this in a PR? and perhaps make a note of the above comment on estimation in the docstring.

Thanks a lot @kamronald !

lbluque commented 10 months ago

addressed in #443

CederGroupHub / smol