Amber-MD / cpptraj

Biomolecular simulation trajectory/data analysis.
Other
138 stars 64 forks source link

NAStruct feature request: output axes info for bases and basepairs #1036

Open tcbishop opened 1 year ago

tcbishop commented 1 year ago

Maybe cpptraj can already do this... ?

I would like to have NAStruct local axis information for both bases and base pairs made available alongside the other pairing and step parameter information using cpptraj keywords.

The goal is to write out vector data in a pseudo-trajectory format that can be overlaid on the all atom trajectory for easy visualization.

In debug mode NAStruct can output axis information for bases (baseaxes.pdb line 316 ) and basepairs (basepairaxes.pdb line 1087) so this appears to be mostly an I/O issue.

I'm proposing something like the following (using text adopted from El Hassan JMB 1995 Figure 1 for "base pair reference frame" and "base reference frame" and the existing descriptors in section "35.11.54. nastruct" of Amber 22 manual)

Base pair Reference Frames: 1 frame for each base pair

[BpOrg]: x,y,z coordinates of the base pair axis origin . [BpX-axis]: unit vector for base pair x-axis relative to BpOrg [BpY-axis]: unit vector for base pair y-axis relative to BpOrg [BpZ-axis]: unit vector for base pair y-axis relative to BPorg Base Reference Frames: 1 frame for each base [BOrg]: x,y,z coordinates of the base axis origin [BX-axis]: unit vector for base x-axis relative to BOrg [BY-axis]: unit vector for base y-axis relative to BOrg [BZ-axis]: unit vector for base y-axis relative to BOrg For comparisons In 3DNA Base Pair axes information is reported in the "ref_frames.dat" file as follows ... 1 T-A ... 109.6665 103.8816 167.4595 # origin 0.7276 -0.6703 -0.1460 # x-axis 0.6559 0.6175 0.4341 # y-axis -0.2008 -0.4117 0.8889 # z-axis ... 2 A-T ... 108.6866 101.8792 169.7685 # origin 0.9787 -0.2047 -0.0159 # x-axis 0.1891 0.8686 0.4581 # y-axis -0.0799 -0.4513 0.8888 # z-axis
drroe commented 1 year ago

The goal is to write out vector data in a pseudo-trajectory format that can be overlaid on the all atom trajectory for easy visualization.

This is a good idea, and as you point out the functionality is mostly there, just hidden behind #ifdefs. I envision it as having a similar format to the writedata vectraj command, which lets you write vector DataSets in a pseudo trajectory format (see help Formats writedata vectraj). So maybe keywords like:

nastruct NA axesout MyAxes.pdb axesfmt pdb \
  bpaxesout MyBpAxes.nc bpaxesfmt netcdf bpaxesparmout MyBpAxes.parm7 ...

How does that seem? I'll start working on this and try to get this implemented as soon as I can. Thanks for the suggestion.

tcbishop commented 1 year ago

Thanks! I think you get the idea. see a private email w/ the "bigger picture" that is beyond the scope of this request.

I"m no expert w/r/to cpptray/pytraj... so I trust you know best way to implement and what you propose seems to do the trick. I presume the vectors when loaded in say VMD will be s.t. they align w/ the all atom structure. If the parm7 file has bonds between the director frame "Origin Atom" and the individual X, Y, Z axes then that makes for proper connectivity.

FYI: when I generate PDBs with basepair director frames I use "CA" for the base pair "Center Atom" and "H1", "H2", "H3" for the "helical axis" atoms... This way VMD automatically connects the axis (unit vectors 1A from the CA) and does not connect the CA atoms b/c they are 3.4A apart. Of course other names work (e.g. CA, CC, CG, CT would preserve sequence info ) but a heavy atom and hydrogens seem to do the trick for making quick visuals. IDK what will happen for default viz if apply same idea to the bases rather than basepairs... CA1, CC1, CG1 CT1 for watson strand and CT2, CG2, CC2, CA2 for crick strands?

drroe commented 1 year ago

Should be addressed once #1037 is merged in.

I presume the vectors when loaded in say VMD will be s.t. they align w/ the all atom structure.

Yes, they will align on however the coordinates are when they get to nastruct (so e.g. if you've RMS fit the coordinates beforehand, the axes will align on the RMS-fit coordinates).

If the parm7 file has bonds between the director frame "Origin Atom" and the individual X, Y, Z axes then that makes for proper connectivity.

Yes - I usually like the mol2 format for this (since everything is in one file), but for large trajectories it's best to write the axes coordinates to a NetCDF or DCD trajectory and write the separate topology.

FYI: when I generate PDBs with basepair director frames I use "CA" for the base pair "Center Atom" and "H1", "H2", "H3" for the "helical axis" atoms... This way VMD automatically connects the axis (unit vectors 1A from the CA) and does not connect the CA atoms b/c they are 3.4A apart.

Very clever! I've kept the names used in the debug output by default (Orig, X, Y, Z) but have added some keywords that let you change this (e.g. axisnameo CA axisnamex H1 axisnamey H2 axisnamez H3).

drroe commented 1 year ago

This has been merged and the new functionality is in version 6.19.6.

drroe commented 1 year ago

@tcbishop I just wanted to follow up and see if this functionality is working for you.

tcbishop commented 1 year ago

Finally getting back to this.

For pytraj: how does one enter the list of base pairs?

For cpptraj: There are two odd things to report and a usage issue when seeking to analyze a system that happens to have 358bp and using a list of specified pairs.

In terms of usage the "specified pairs" line is very long. Is there a shorthand that can be used and possibly coupled with the now deprecated byptype ( para & anti ) key word?

1) w/r/to the following axes options... [axesout [axesoutarg ...] [axesparmout ]] [bpaxesout [bpaxesoutarg ...] [bpaxesparmout ]] [stepaxesout [stepaxesoutarg ...] [stepaxesparmout ]] [axisnameo ] [axisnamex ] [axisnamey ] [axisnamez ] All seems to work as advertised FOR ONE FRAME. The ORIGin for the base "axes" seems to be much closer to the ORIGin for the "bpaxes" than I expected. SEE ATTACHED image... shouldn't the red and blue licorice be farther separated in the image on right? This may just be my ignorance.

2) It seems that the base pair step axes pairing calc is not fully compliant with the "specified pairs" option. The steps are not determined from the list of basepairs provided but from some other method. Is this a Feature or bug?

thus when a trajectory rather than just single frame is analyzed it seems the number of STEPS changes. Specifically the command line

stepaxesout stepAxes.pdb stepaxesoutarg pdb stepaxesparmout stepAxes.parm7

causes the steps to be evaluated or re-evaluated s.t. the number of base pair steps changes and cpptraj dies.

cyt 207% cpptraj sys.parm7 < ../bin/dcd2rod.cpptraj > some.log2 Error: Number of base pair steps has changed from 354 to 358. Error: Base pair step axes pseudo-topology is already set up for 354 bases.

w/out the stepaxesout calc line the analysis all seems to work but even for one this on frame the number of steps is not compatible w/ the number of base pairs.

Careful inspection of the image (did it actually get attached?) reveals atom counts of StepAxes = 1416 -> 1416/4 = 354 bp BPAxes = 1432 -> 1432/4 = 358 bp Axes = 2864 -> 2864/4 = 716 bases -> 358bp

The error when you run cpptraj w/ a trajectory is

cyt 190% cpptraj sys.parm7 < ../bin/dcd2rod.cpptraj > some.log2 Error: Number of base pair steps has changed from 354 to 358. Error: Base pair step axes pseudo-topology is already set up for 354 bases.

hainm commented 1 year ago

For pytraj: how does one enter the list of base pairs?

I don't think pytraj supports the new feature yet. But one can hack by adding more cpptraj option to pucker_method option here: https://github.com/Amber-MD/pytraj/blob/f3cbb051f71e15d3f2b4b0d3422d4a4bfcab68e0/pytraj/analysis/nucleic_acid_analysis.py#L31

pucker_method='altona [and some cpptraj options here]',. But I am not so sure about return type (if getting any error or not).

drroe commented 1 year ago

First, sorry that you're still encountering issues! I'll try to resolve them as soon as I can.

In terms of usage the "specified pairs" line is very long. Is there a shorthand that can be used and possibly coupled with the now deprecated byptype ( para & anti ) key word?

Maybe something where you can specify strands via ranges. So instead of:

pairs 1-16,2-15,3-14,4-13

You could do something like:

pairs [1-4][16-13]

Do you think that would be useful? If you have other suggestions please let me know.

The ORIGin for the base "axes" seems to be much closer to the ORIGin for the "bpaxes" than I expected. SEE ATTACHED image.

There was no attached image, but what you describe is not unexpected and is just a consequence of how the reference axes origins are placed on each individual base (see Olson et al. J. Mol. Biol. (2001) 313, 229-237).

It seems that the base pair step axes pairing calc is not fully compliant with the "specified pairs" option. The steps are not determined from the list of basepairs provided but from some other method. Is this a Feature or bug?

The steps should be determined by connectivity within strands. So if strand 1 is A-B and strand 2 is C-D and A is bonded to C and B is bonded to D or A is bonded to D and B is bonded to C then that will count as a step (AC/BD or AD BC respectively). I would need more details of your system (actual topology/coordinates and a description of which step is expected but missing) in order to debug your specific case.

causes the steps to be evaluated or re-evaluated s.t. the number of base pair steps changes and cpptraj dies.

Unfortunately, I'm not surprised you're hitting issues - the code is still fairly new and relatively untested. Best route here is for you to provide me with a small (10-20 frames tops) test trajectory (and corresponding topology) so I can reproduce the issue and try to address it. You can extract the part of the trajectory where the calculation dies using trajin <start> <stop>. Let me know if you need more details or if any of this seems unclear.