ExcitedStates / qfit-3.0

qFit: Automated and unbiased multi-conformer models from X-ray and EM maps.
MIT License
34 stars 11 forks source link

BIC looping values #329

Closed stephaniewankowicz closed 1 year ago

stephaniewankowicz commented 1 year ago

Saulo's initial commit of this line (https://github.com/ExcitedStates/qfit-3.0/commit/32e70b53f3e06ee12a703308fc948678f6f2c162) specifically described this as "Loopy BIC implementation for qfit segment". qfit-3.0/qfit/qfit.py

Line 941 in 32e70b5

loop_range=[0.3, 0.25, 0.2, 0.16, 0.14, 0.12]) As you can see, first number was tweaked, and then the last number was dropped in https://github.com/ExcitedStates/qfit-3.0/commit/ec95fa98c31a5d1cc7678b6b4430e66ff726ac64 (presumably memory considerations?), but it hasn't changed since then. (Note also that this commit has 'fixed inconsistencies' in its title!)

a) Does anyone (probs best to ask @sauloho himself!) know why these numbers were chosen different in the loop implementation ? b) Is there a reason that we need consistency here?

Presently, I'm in favour of leaving it as it is, until either (i) we know why it is, or (ii) we can show that this change improves things.

Additionally, it does not make sense to have 0.4 or some other numbers in the looping. It should be thresholds that would return 1, 2, 3, 4, or 5 conformers (ie 1.0, 0.5, 0.33, 0.25, 0.2).

blake-riley commented 1 year ago

The threshold constraint in MIQP ($t{d{min}}$) is a lower-bound threshold on the occupancy of all conformers, I just want to add my 2ȼ here, and highlight the term "at most" from the paper. $n{conf} \leq 1 / {t{d_{min}}}$. i.e. The thresholds 1.0, 0.5, 0.33, 0.25, 0.2 would return at most 1, 2, 3, 4, or 5 conformers.

In reality, when $t{d{min}}$ = 0.5 (for instance), we tend to get solutions containing 1 conformer with 0.5<ω≤1.0 .

Contention In this light, I think the $t{d{min}}$ = 0.4 (and other, non-1/integer values) are quite reasonable. (For reference, this would return at most 2.5, so still 2 conformers).

Example (albeit contrived) Imagine a Ser which can occupy two rotamers pointing in opposite directions: one of the directions has the O quite well localized, the other has the O slightly more 'ellipsoidal'. A threshold of 0.5 might give a suboptimal solution, nailing only one of the rotamers with ω={0.6}. A threshold of 0.4 would enable a solution of ω={0.6, 0.4} for two conformers. The threshold of 0.33, could do the {0.6, 0.4} solution, but it might settle on a 'better' solution (improved overall fit to density) of {0.33, 0.33, 0.34}, having split the the 0.6 conformer into two pretty close conformers to cover the 'ellipsoidal' density.

For the context of BIC --- we'd like to have access to the {0.6, 0.4} solution, because it's the parsimonious one. (And phenix can refine an aniso model to that 0.6 rotamer). This might only happen if we scan thresholds that aren't clearly 1/integer.

Perhaps a deeper problem? For this BIC-wrapped MIQP, we're trying to co-optimize fit to density and BIC (for parsimony). Is there a better way of doing this than we currently are (a naive optimization of BIC over loop_range)? If not, what's the best values for loop_range?

stephaniewankowicz commented 1 year ago

Closing this issue as we have changed an implemented BIC changes in #325