Test & improve metrics for removing low density conformations

stephaniewankowicz commented 1 year ago

Currently, qFit removes low-density conformers by the following rule: if any voxel in the conformer has a density below 0.3 e⁻ Å⁻³, the conformer is removed.

This is turned off by default.

We should include a filter like this, but it should remove a conformer only when roughly 5 or more atoms lack density support.

This should be tested with different atom-count cutoff values.
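A rough sketch of the atom-count rule described above, so the cutoff is easy to vary. This is not qFit's code: the function names, the per-atom density input, and the example values are all assumptions made for illustration (the current rule works on voxels, not atoms).

```python
from typing import Sequence


def count_unsupported_atoms(atom_densities: Sequence[float],
                            density_cutoff: float = 0.3) -> int:
    """Count atoms whose density value is below the cutoff.

    atom_densities is a hypothetical input: one density value per atom.
    """
    return sum(1 for density in atom_densities if density < density_cutoff)


def keep_conformer(atom_densities: Sequence[float],
                   density_cutoff: float = 0.3,
                   min_unsupported_atoms: int = 5) -> bool:
    """Keep a conformer unless at least min_unsupported_atoms atoms lack support."""
    unsupported = count_unsupported_atoms(atom_densities, density_cutoff)
    return unsupported < min_unsupported_atoms


# Made-up example conformers, six atoms each (illustrative values only).
conformers = {
    "well_supported": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    "two_weak_atoms": [0.1, 0.1, 0.5, 0.5, 0.5, 0.5],
    "five_weak_atoms": [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
}

# Sweep the atom-count cutoff to see which conformers survive at each setting.
for n_atoms in (1, 2, 3, 5):
    kept = [name for name, densities in conformers.items()
            if keep_conformer(densities, min_unsupported_atoms=n_atoms)]
    print(n_atoms, kept)
```

Setting min_unsupported_atoms=1 gives an "any atom below the cutoff" rule, which is the strictest setting in the sweep.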

stephaniewankowicz commented 7 months ago

I tested removing low-density conformers and adding a QP test in the angle sampling. Both removed almost all of the 'too many conformers, reverting' messages. However, the QP in the angle sampling kinda blew up the R-free, while removing a conformer when even one atom is below the threshold tended to increase the number of residues for which we could not find a solution. I tested removal when 1, 2, or 3 atoms are below the cutoff value. All of these removed the angle issue but increased the number of residues for which we could not find a good conformer. I am going to try 4 or 5 atoms, as this should still eliminate many aromatic conformations.

blake-riley commented 7 months ago

OK, perhaps this comment is a little broader than this specific issue --- lmk if I should create a new issue to track it?

Problem: overfitting / trying to fit against an "oversampled" model

I've seen a bunch of "too many conformers" cases in which there are over 2000 conformers (and sometimes over 10000!). In these circumstances, we know in advance that we will be trying to find a best fit for 2000 conformers to ~1500 voxels (or so).

To my ears: that's at best an overfit QP solution (more parameters than datapoints), at worst an unsolvable QP. As you highlighted in #378, and as you're trying to address here in this issue (#346), the change to the angle sampling will make this yet more common.

Suggestions

An interim measure: qFit should emit a logger.warning from qfit._BaseQFit._solve_qp() if it notices that it has been asked to solve an "over-fit" situation (more conformers, so more ω than voxel datapoints --- i.e. self._models.shape[0] > self._target.shape[0]).
Ultimately, I think that if qFit notices it's attempting to solve an "over-fit" situation (more conformers than voxel datapoints), it would be a good idea to reduce the sampling and sample more coarsely than the user requested. We already try to backtrack, and I see messages like:

Too many conformers generated (29720). Reverting to a previous iteration of degrees of freedom: item 0. n_coords: [29720]

That's ... not really reverting, tbh. Is this another bug? This feels like the sampling code might need a pretty deep rewrite, but I think it would be good to have this in place if you're gonna get backbone sampling working. (I'm excited for this!)
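For the interim measure, a minimal sketch of the check might look like the following. It is not qFit's actual code: the function name warn_if_overfit and the logger name are hypothetical, and it takes the two counts directly rather than reading them from the solver object.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("qfit.overfit_sketch")


def warn_if_overfit(n_conformers: int, n_voxels: int,
                    log: logging.Logger = logger) -> bool:
    """Return True (and log a warning) when conformers outnumber voxels.

    Each conformer contributes one weight to the QP, so more conformers
    than voxel datapoints means more parameters than observations.
    """
    overfit = n_conformers > n_voxels
    if overfit:
        log.warning(
            "QP is over-fit: %d conformers (weights) for only %d voxel datapoints.",
            n_conformers,
            n_voxels,
        )
    return overfit


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warn_if_overfit(2000, 1500)  # logs a warning
    warn_if_overfit(100, 1500)   # stays silent
```

Inside _solve_qp(), the two counts would presumably be self._models.shape[0] and self._target.shape[0], as suggested above. The same boolean could later be used to trigger coarser sampling instead of only warning.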

ExcitedStates / qfit-3.0
