choderalab / ensembler-manuscripts

Manuscript for Ensembler v1

Generate figure to show distribution of MolProbity scores #47

Open jchodera opened 8 years ago

jchodera commented 8 years ago

This was requested by both reviewers.

danielparton commented 8 years ago

@jchodera Here are the distributions of MolProbity scores. Lower scores are supposed to represent better-quality models. The first plot shows the distribution built from all scores. The second shows the distributions stratified by sequence identity.

As you can see, the distributions are roughly bimodal. I'm not sure why this is, but maybe there is some single input feature with a very strong effect on the MolProbity score which is separating the two peaks.
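One quick way to probe the "single input feature" hypothesis would be to split the scores on a candidate feature and compare the two subpopulations. A minimal sketch with synthetic numbers (the scores and the loop-modeling flag here are entirely hypothetical, not Ensembler output):

```python
# Sketch: check whether a candidate binary feature explains the bimodality
# by splitting the scores on that feature and comparing subpopulation means.
# All values below are synthetic placeholders.

scores = [
    (1.8, False), (2.0, False), (2.1, False), (1.9, False),  # low-score peak
    (3.4, True), (3.6, True), (3.5, True), (3.3, True),      # high-score peak
]

with_feature = [s for s, flag in scores if flag]
without_feature = [s for s, flag in scores if not flag]

mean_with = sum(with_feature) / len(with_feature)
mean_without = sum(without_feature) / len(without_feature)

# If the gap between the subpopulation means is large relative to their
# spread, the feature is a plausible explanation for the two peaks.
print(mean_without, mean_with)
```

If no single feature cleanly separates the peaks, that would itself be informative, since it would point to a combination of effects rather than one dominant cause.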

The scores do increase with decreasing sequence identity, which is the expected behavior, so this does suggest some utility of MolProbity for filtering models. However, those shifts are much less distinct than the separation between the two main probability densities.

To be certain of the utility of MolProbity, I think we would need to look into the results in more detail. However, I imagine this would involve a non-trivial amount of time and effort.

For the paper, perhaps we should add these plots as they are, make the above points in the text, and note that a full validation of MolProbity is outside the scope of this paper and that we plan to explore model validation in general in future work? What do you think?

jchodera commented 8 years ago

Thanks so much for putting this together!

I think you're right about the fact that it may take significant time to understand the bimodal nature of the MolProbity score. I could come up with a few hypotheses---for example, the poorly-scoring models may be those where significant loop modeling is required---but it might still take a few guesses to really explain what is going on here.

Without an understanding of the bimodal nature, however, I'm not sure if these plots add much to the paper. It may be that we're better off not including them---or perhaps only including them as CDFs rather than PDFs.
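One practical advantage of the CDF presentation is that it needs no binning, so the bimodality shows up as a plateau between the two modes rather than depending on histogram bin width. A minimal sketch of an empirical CDF over hypothetical scores (the values are synthetic placeholders):

```python
# Sketch: empirical CDF of MolProbity scores (synthetic values).
# F(x) = fraction of models with score <= x; a bimodal PDF appears
# as a flat plateau in the CDF between the two modes.

def empirical_cdf(values):
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

scores = [1.8, 1.9, 2.0, 2.1, 3.3, 3.4, 3.5, 3.6]  # two synthetic modes
cdf = empirical_cdf(scores)

# Half of the synthetic models fall at or below the first mode:
print(cdf[3])  # (2.1, 0.5)
```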

@sonyahanson and @pgrinaway: What do you think?

jchodera commented 8 years ago

Anyone else have input on this?

sonyahanson commented 8 years ago

I agree; I don't necessarily think these plots need to be included in the paper. The most important thing is that MolProbity offers another way of assessing the integrity of the homology models, which we are arguing is not that important at the initial model stage: since we are making so many models, we don't need all of them to survive. Maybe we could instead show the improvement in MolProbity score between implicit_refine and explicit_refine models? Unless I am misunderstanding something.
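The implicit_refine vs explicit_refine comparison could be summarized as a per-model score change. A minimal sketch with synthetic paired scores (the model names and values are hypothetical, not real Ensembler output):

```python
# Sketch: per-model MolProbity score change between refinement stages.
# Paired scores are synthetic; negative deltas would indicate improvement,
# since lower MolProbity scores represent better-quality models.

implicit_scores = {"modelA": 2.4, "modelB": 3.1, "modelC": 2.8}
explicit_scores = {"modelA": 2.2, "modelB": 2.9, "modelC": 2.9}

deltas = {m: explicit_scores[m] - implicit_scores[m] for m in implicit_scores}
improved = sum(1 for d in deltas.values() if d < 0)

print(deltas, improved)
```

A histogram of these per-model deltas would directly show whether explicit-solvent refinement systematically improves the scores.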

Also, I guess there is now this molprobity-refine_implicit_md.yaml file generated in the model folders? Is this the only output from MolProbity? I suppose it could be useful for tracking why a model is behaving strangely, if that model is of specific interest to the user. In this light it could also be especially interesting to include the 'clashlist' that MolProbity can generate.
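Collecting the per-model scores from those files would be a simple walk over the model folders. A stdlib-only sketch, assuming a flat `molprobity_score: <value>` line in each file (the file's actual contents and the model folder names are assumptions here; in practice one would parse the YAML with PyYAML):

```python
# Sketch: gather MolProbity scores from each model folder's
# molprobity-refine_implicit_md.yaml. The flat "molprobity_score: <value>"
# layout and the folder names are hypothetical; real files would be parsed
# with yaml.safe_load instead of this line-splitting.
import os
import tempfile

root = tempfile.mkdtemp()

# Create two hypothetical model folders with toy score files.
for model, score in [("ABL1_HUMAN_D0_2HYY_A", 2.1), ("SRC_HUMAN_D0_1Y57_A", 3.4)]:
    folder = os.path.join(root, model)
    os.makedirs(folder)
    with open(os.path.join(folder, "molprobity-refine_implicit_md.yaml"), "w") as f:
        f.write("molprobity_score: %.1f\n" % score)

scores = {}
for model in sorted(os.listdir(root)):
    path = os.path.join(root, model, "molprobity-refine_implicit_md.yaml")
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key.strip() == "molprobity_score":
                scores[model] = float(value)

print(scores)
```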

My impression is that it is mostly important to say to the reviewers 'look we added it' and write something about why it's useful, and what it tells us about our models. Probably a new figure is not necessary for this, but maybe it could go in the supplementary information?

jchodera commented 8 years ago

> My impression is that it is mostly important to say to the reviewers 'look we added it' and write something about why it's useful, and what it tells us about our models. Probably a new figure is not necessary for this, but maybe it could go in the supplementary information?

Agreed!