Amber-MD / pytraj

Python interface of cpptraj
https://amber-md.github.io/pytraj
162 stars 38 forks source link

wrong major groove calculation #1626

Open cloudella48 opened 1 year ago

cloudella48 commented 1 year ago

Hi there! I have a problem calculating the major/minor groove distances of my DNA. My DNA is 70 bp long, but I do get as amount of major grooves around 64, which is way too much compared to my DNA strand. Do I understand something wrong or what exactly does the nastruct.major calculate? How are the distances calculated? Can someone help me with that, because I have no clue how to calculate the groove distances otherwise? Thanks a lot!

drroe commented 1 year ago

. My DNA is 70 bp long, but I do get as amount of major grooves around 64, which is way too much compared to my DNA strand.

The major/minor groove width calculations are with respect to each base pair, so you will get a value for every base pair (if using the simple phosphate distance calculation) or every non-terminal base pair (if using the El Hassan/Calladine calculation), so out of 70 potential base pairs, getting 64 major groove values seems reasonable (there's likely some end-terminal fraying). The cpptraj manual entry for nastruct has more details (and references) on how the calculations are performed. Does this help clear things up?

cloudella48 commented 1 year ago

yeah! thanks a lot !

cloudella48 commented 1 year ago

Hi there :) as you`ve helped me so nicely last time.. I have another question: I am calculating major grooves for my DNA but I have notices that the program calculates the distances in a wrong way. I have one basepair missing and one kind of doubled, but not really: (433 is missing) '437A502T',) ('438C500T',), ('438C501G',), ('439A500T',), ('440G499C',

I have notived this, because in the arrays of these basepairs, the value is often equal to zero, but not always. image There is another line, that seems wired to me, but I have not figured out why this happens so far, it is a random line It would be great to know how this wrong calculation is produced and how one can fix it.. Can someone help me?

drroe commented 1 year ago

The CPPTRAJ (the code that pytraj uses on the back-end) manual describes how the nastruct command determines base pairing:

Base pairs are determined either once from the rst frame or from a reference
structure, or can be determined each frame if allframes is specied. Base
pairing is determined rst by base reference axis origin distance, then by stagger,
then by angle between base Z axes, then nally by hydrogen bonding (at least
one hydrogen bond must be present). Base pair parameters will only be written
for determined base pairs.

So if you're using the default base pairing mode (which uses the first frame to determine base pairs) and a potential base pair does not meet the base pairing criteria, the parameters for that will not be calculated. I'm not sure if pytraj has the equivalent of the cpptraj allframes option yet to determine base pairing at every frame (@hainm would know better).

hainm commented 1 year ago

I'm not sure if pytraj has the equivalent of the cpptraj allframes option yet to determine base pairing at every frame (@hainm would know better).

pytraj does not support that yet. But user can trick pytraj by specifying it in pucker_method argument

pytraj.nastruc(..., pucker_method='altona allframes')

@cloudella48 I have not tested the code yet so please give it a try.

cloudella48 commented 1 year ago

When using the pucker_method flag I get more basepairs than I actually have in my DNA (verses the end there are completely random combinations and the two basepairs are still missing (they are always the same). Regarding the calculations I have another question because the values for major groove calculation are always around 19-20 (in the example in the documentation, too) but I know that major groove width in DNA is normally around 12-13 Angstrom or the length of a groove around 22 Angstrom. I know it is the distance in between the P atoms which is calculated but anyhow, then it is not exactly the width, is it? And do you know in what units it is calculated? Something, which might be good for you to know: I do not see the random values = 0.00 anymore, when using the pucker_method flag (only the random basepairs)

image

cloudella48 commented 1 year ago

Hi there... another issue I am facing now.. when I use the allframes version and delete the columns which are randomly attached in the end I will get different output values for both versions. SO the calculation is random and does not match one version with the other. I guess this is not normal ?

hainm commented 1 year ago

Dear @cloudella48,

to make sure, please use cpptraj directly to see if the results from pytraj is reasonable or not: https://amberhub.chpc.utah.edu/analisis-of-nucleic-acid-simulation/

(please search for nastruct)