GarmanGroup / RABDAM

Identification of specific radiation damage in MX structures using the BDamage and Bnet metrics
GNU Lesser General Public License v3.0

PDBCUR #10

Closed kls93 closed 8 years ago

kls93 commented 8 years ago

@JonnyCBB Hi Jonny, I think I've found a bug - the unit cell atom list is generated from the pdb file after being put through PDBCUR (so that alternative conformations etc. are removed), but the asymmetric unit cell list is generated from the original pdb file (where alternative conformations etc. are still present). I just wanted to check whether you and Tom had a reason for doing this, before I change the code so both lists are generated from the PDBCUR output. Thanks :)
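For reference, a minimal sketch of the kind of filtering being discussed, so that both atom lists could be built from the same cleaned records. This is not how PDBCUR or RABDAM does it: `keep_single_conformers` is a hypothetical helper, and keeping the highest-occupancy conformer is an assumption. It relies only on the fixed-column layout of PDB ATOM/HETATM records.

```python
def keep_single_conformers(pdb_lines):
    """Keep one record per atom: the conformer with the highest occupancy.

    Hypothetical helper for illustration; zero-occupancy atoms are dropped.
    """
    best = {}   # atom key -> (occupancy, line)
    order = []  # atom keys in first-seen order
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        occupancy = float(line[54:60])  # columns 55-60
        if occupancy == 0.0:
            continue
        # chain ID, residue number + insertion code, residue name, atom name
        key = (line[21], line[22:27], line[17:20], line[12:16])
        if key not in best:
            order.append(key)
            best[key] = (occupancy, line)
        elif occupancy > best[key][0]:
            best[key] = (occupancy, line)
    # Blank the altLoc flag (column 17) on the surviving records.
    return [best[k][1][:16] + " " + best[k][1][17:] for k in order]
```

Feeding the output of a function like this to both the asymmetric unit and unit cell steps would keep the two lists consistent.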

td93 commented 8 years ago

Hi Kathryn, I think we did this deliberately, since atoms in different conformations have different B factors, and their packing densities will be different etc. It might have yielded some interesting results on, say, one conformer being damaged more than another (but obviously that relies on having really well refined structures...)! I can't say it was a particularly thought-out process, but if you think it would make more sense to take it out, I would say feel free to do so. Tom

JonnyCBB commented 8 years ago

I remember @td93 proposing this (regarding the different conformations). It wasn't something I had considered, but there was no reason why he shouldn't/couldn't investigate whether it would be more insightful.
If there is no information gained from calculating the Bdamage values of atoms in different conformations then I think we should change it back, i.e. only calculate for the single conformer! I think this should be the case because it would be in line with what everyone is told that the algorithm does, i.e. it would be more consistent with what has been communicated.

td93 commented 8 years ago

@kls93 I wonder if it would be better perhaps to calculate the values independently but average them in the output for that atom type (e.g. OE1 and OE2 are still two different values, but OE1A and OE1B values would be averaged into one OE1). Perhaps a step could also be added here to chuck out an additional output if the two values are 'significantly' different (but how would significance be evaluated...). Just some thoughts for the mixing pot!
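A rough sketch of that averaging idea, under stated assumptions: `average_over_conformers` and the `flag_ratio` cut-off are hypothetical, and the ratio test is only a placeholder for whatever 'significance' would end up meaning, not a statistical test.

```python
from collections import defaultdict

def average_over_conformers(records, flag_ratio=1.5):
    """Average Bdamage over the alternative conformers of each atom.

    records: iterable of (atom_id, alt_loc, bdamage), where atom_id
    identifies the atom ignoring its conformer label.
    flag_ratio is an arbitrary placeholder threshold (an assumption).
    """
    grouped = defaultdict(list)
    for atom_id, _alt_loc, bdamage in records:
        grouped[atom_id].append(bdamage)

    averaged, flagged = {}, []
    for atom_id, values in grouped.items():
        averaged[atom_id] = sum(values) / len(values)
        # Flag atoms whose conformers disagree by more than flag_ratio.
        if min(values) > 0 and max(values) / min(values) > flag_ratio:
            flagged.append(atom_id)
    return averaged, flagged
```

Here OE1 and OE2 stay separate because they have different `atom_id` values, while the A and B conformers of OE1 collapse into one number.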

JonnyCBB commented 8 years ago

Yes the significance would be an issue and I'm pretty sure that there wouldn't be enough statistical power within a single structure to actually get a confident estimate (that's even before we've chosen the statistical test).

But I think this complicates the whole method as it stands. When packing density calculations are performed, do you now take into account the weighted average of all atom conformations or something similar? The implementation may (or may not) be too complicated to do, but ultimately does it give significantly more valuable information for the time it will take? Perhaps the answer is yes, but I don't think Kathryn has the time left to do it considering the other things that she needs to do.

@td93 did/ @kls93 do you see anything interesting regarding this when you did/do the analysis on B damage? If the answer is no, then it's probably not worth the time implementing, whereas if the answer is yes, then it's up to @kls93 as to whether she wants to do it.

(Sorry for the waffle)

td93 commented 8 years ago

Good points Jonny. I never actually looked into conformational differences at any point. My suspicion is that there won't be a difference usually since B factors are often pretty similar between conformations, and PD differences won't be much. Differences are likely to arise just from an atom in one conformation simply being in a different bin compared to the other! Obviously it relies on B factors being independently refined for both conformations (I have no idea if this is usually the case) but more importantly also on the occupancy being refined for each rather than the usual 50/50 split, since incorrect occupancy can artificially raise/lower B factors due to the difference in average electron density between, say an atom at 50% occupancy relative to an atom at 30% occupancy... I don't think that most structures are refined so independently (due to there being more degrees of freedom) so perhaps for the typical user information derived from this would be entirely artefactual!
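To make the binning point concrete, here is a toy sketch. It assumes Bdamage is an atom's B factor divided by the mean B factor of the atoms sharing its packing density bin, with fixed-width bins; the `bdamage` function, the `bin_width` value and the numbers are all made up for illustration and are not RABDAM's code.

```python
from collections import defaultdict

def bdamage(atoms, bin_width=10):
    """Toy Bdamage: B factor / mean B factor of the atom's packing density bin.

    atoms: list of (label, packing_density, b_factor), with packing_density
    an integer atom count. bin_width is an illustrative assumption.
    """
    bins = defaultdict(list)
    for _label, pd, b in atoms:
        bins[pd // bin_width].append(b)
    means = {k: sum(v) / len(v) for k, v in bins.items()}
    return {label: b / means[pd // bin_width] for label, pd, b in atoms}

# Two conformers of the same atom with identical B factors, but packing
# densities that fall either side of a bin boundary (19 vs 20).
atoms = [
    ("OE1_A", 19, 30.0),
    ("OE1_B", 20, 30.0),
    ("other_low", 15, 20.0),
    ("other_high", 25, 40.0),
]
result = bdamage(atoms)
```

In this made-up case the two conformers get different Bdamage values purely because their bin averages differ, even though their B factors are the same.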

JonnyCBB commented 8 years ago

Agreed. Also, if I remember rightly, Markus explicitly mentioned something about the fact that people typically don't refine occupancy and that Bdamage was meant to be used in this case. In which case modelling the conformations is something that was intentionally ignored....... but then again I'm not sure if I'm just making this up.

kls93 commented 8 years ago

Hi both,

Thanks for all the replies!

I'm glad this wasn't a bug - the reasons I thought it was were, firstly, that (as Jonny said) in Markus' thesis and paper the programme is described as dealing only with single conformations, and secondly, that the asymmetric unit cell values are taken directly from the original PDB file (as opposed to that file first being put through PDBCUR to remove hydrogens, 0 occupancy atoms etc. - even if we make no other changes I think that this should be the case).

My initial problem with considering alternative conformations in the asymmetric unit is that we only consider single conformations in the unit cell, so the packing density of a lower occupancy conformation is calculated by comparing it to the higher occupancy conformation in the unit cell rather than to itself - Tom is right though that the distances between alternative conformations are sufficiently small that when you use a packing density threshold of 14 Angstroms this shouldn't have any effect.

However, to me it feels inconsistent that we consider alternative conformations in the asymmetric unit but not in the unit cell - I feel it implies to the user that the programme takes alternative conformations into account in a way that it currently does not. Also, if equivalent atoms in different conformations are placed into different bins this can result in differences in Bdamage even if there are no differences in Bfactor (I think it is fair to assume in the case of alternative conformations that they are close enough in space that their packing densities will be fairly similar, and so differences in damage will be reflected in the relative Bfactor values). I know that this is also the case for 'non-equivalent' atoms pushed into different bins, but I feel that the implication to the user of the difference in Bdamage between such non-equivalent atoms is much less than between equivalent atoms in alternative conformations. Also, unless the structure has refined the occupancies of the alternative conformations as opposed to just setting them equal to e.g. 0.5, then the Bfactors of these atoms will contain error. Although we can't correct for these errors when using the Bfactor values to calculate Bdamage, my feeling is that we should not draw undue attention to these potential errors by providing Bdamage values for alternative conformations.

Sometimes though you do see quite large differences in Bdamage values between alternative conformations (with correctly refined occupancies), reflecting quite large differences in Bfactor, so in some respects it would be nice to continue to consider alternative conformations as we will lose some potentially useful (plus some artefactual) information by taking this feature out.

One idea I had was that if we are to consider alternative conformations in the asymmetric unit, I could weight the value that an atom in the unit cell contributes to the packing density calculation by its occupancy. This to me seems fairer than the current calculation (therefore if two atoms are pushed into different packing density bins, it will not be owing to only considering a single conformation of a multi-conformation residue). However, the problem with this is that you are assuming that occupancy values of alternative conformations have been refined (and refined correctly), which is not necessarily the case (I have found some cases of refining the occupancy of non-ligand single conformation atoms in the radiation damage series structures I have been looking at, and you would think that these structures would be correct!). Therefore I think it best that for now the occupancy value is left well alone.
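For the record, a minimal sketch of what occupancy weighting could look like, assuming packing density is a count of atoms within the 14 Angstrom threshold. The `packing_density` function and its `weight_by_occupancy` flag are hypothetical and not part of the current code.

```python
import math

def packing_density(atom_xyz, neighbours, cutoff=14.0, weight_by_occupancy=False):
    """Count atoms within `cutoff` Angstroms of atom_xyz.

    neighbours: list of ((x, y, z), occupancy) for the surrounding unit cell
    atoms; the caller is assumed to exclude the atom itself.
    With weight_by_occupancy=True each neighbour contributes its occupancy
    instead of 1, so a 0.7/0.3 pair of conformers counts as one atom in total.
    """
    total = 0.0
    for xyz, occupancy in neighbours:
        if math.dist(atom_xyz, xyz) <= cutoff:
            total += occupancy if weight_by_occupancy else 1.0
    return total
```

The weighted version is only as good as the occupancies in the file, which is exactly the concern raised above.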

So, in summary, I think that we should keep the packing density calculation as it is, and only consider single conformations in the asymmetric unit. However, I can see the argument for keeping the alternative conformations in the asymmetric unit, so let me know if you disagree with me :)

Sorry for the length of my reply! Please let me know if you disagree with my assumptions / ideas. Kathryn

JonnyCBB commented 8 years ago

Thank you. That's very comprehensive. I agree, we should only consider single conformations. I think the assumption that occupancy is correctly modelled will be false in the majority of cases. Therefore I think that a Bdamage calculation that takes alternative conformations into account will be misleading for the majority of cases in practice.

So let's go with the single conformation calculation for now. I'll close the issue and maybe it will be picked up by someone later. That will depend on how informative and widely used Bdamage turns out to be ;)