Closed anastasiiaNG closed 4 years ago
Is ECBLAST_R01492_AAM.txt the output? That looks correct: it's a fixed-width format, not a TSV, so you can't open it in Excel, which it looks like you've done.
Please see bottom of Page 40: "The Atom Block" http://help.accelrysonline.com/ulm/onelab/1.0/content/ulm_pdfs/direct/reference/ctfileformats2016.pdf
Ok, I see, thank you.
Could you advise me whether there is any way to automatically extract these atom blocks as separate R/Python objects, with the atom mapping numbers always in the same column?
I tried to read ECBLAST_R01492_AAM.txt automatically with the R package ChemmineR, and even this specialized library fails to read such files correctly. Below is how I read the file with ChemmineR:
ttt <- ChemmineR::read.SDFindex(
  "ECBLAST_R01492_AAM.txt",
  index = data.frame(A = 1, B = length(readr::read_lines("ECBLAST_R01492_AAM.txt")))
)
ChemmineR::atomblock(ttt)
Then this looks like it might be a bug in ChemmineR... RDKit/CDK will handle it correctly, but you will have to extract the mapping numbers manually by iterating over the atoms.
Because it's a fixed-width format (it dates back to the Fortran days), you can just grab the characters you need using simple command-line utilities, for example characters 1-49 plus the atom-atom mapping number in characters 61-63:
$ grep -F '.' ~/Downloads/ECBLAST_R01492_AAM.txt | cut -c1-49,61-63
...
  -18.8182   -5.0468    0.0000 C   0  0  0  0  0  61
  -17.7257   -4.0189    0.0000 C   0  0  0  0  0  62
  -18.0704   -2.5582    0.0000 C   0  0  0  0  0  63
  -19.5074   -2.1254    0.0000 C   0  0  0  0  0  64
  -16.2893   -4.4509    0.0000 C   0  0  0  0  0  65
  -18.4745   -6.5069    0.0000 C   0  0  0  0  0  66
  -12.6166    0.2957    0.0000 C   0  0  0  0  0  67
   -3.6837    0.0740    0.0000 C   0  0  1  0  0  68
   -3.9578   -1.4007    0.0000 C   0  0  0  0  0  69
   -2.8177   -2.3755    0.0000 C   0  0  0  0  0  70
   -1.4035   -1.8755    0.0000 O   0  0  0  0  0  71
   -3.0918   -3.8502    0.0000 N   0  0  0  0  0  72
    1.6146   -0.0598    0.0000 C   0  0  1  0  0  73
    6.0587    1.6342    0.0000 O   0  0  0  0  0  92
    5.3087    0.3351    0.0000 P   0  0  0  0  0  93
    6.7183   -0.1779    0.0000 O   0  0  0  0  0  94
    5.0483   -1.1421    0.0000 O   0  0  0  0  0  95
    3.8087    0.3351    0.0000 O   0  0  0  0  0  96
    3.0587    1.6342    0.0000 P   0  0  0  0  0  97
    4.4683    2.1472    0.0000 O   0  0  0  0  0  98
    2.7983    3.1114    0.0000 O   0  0  0  0  0  99
    1.5587    1.6342    0.0000 O   0  0  0  0  0 100
    0.8087    0.3351    0.0000 P   0  0  0  0  0 101
    2.2183   -0.1779    0.0000 O   0  0  0  0  0 102
   -0.6913    0.3351    0.0000 O   0  0  0  0  0 103
    0.5483   -1.1421    0.0000 O   0  0  0  0  0 104
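For anyone who would rather do the same slicing in Python, here is a minimal sketch using only the standard library. It assumes the file follows the V2000 atom block layout (x, y, z in characters 1-30, the element symbol in 32-34, the atom-atom mapping number in 61-63). The function names `is_atom_line`, `parse_atom_line` and `read_atoms` are made up for this illustration, and the "decimal points at fixed offsets" check is only a heuristic standing in for the `grep -F '.'` step.

```python
def is_atom_line(line):
    """Heuristic: atom lines carry x, y, z as three 10-character fields with
    4 decimals, so the decimal points sit at 0-based offsets 5, 15 and 25."""
    return len(line) >= 34 and line[5] == line[15] == line[25] == "."


def parse_atom_line(line):
    """Slice one atom line by fixed character positions (never by whitespace)."""
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        # Characters 61-63 (1-based) hold the atom-atom mapping number;
        # a missing or blank field is treated as 0 (unmapped).
        "map": int(line[60:63].strip() or 0),
    }


def read_atoms(lines):
    """Return one dict per atom line found in an iterable of text lines."""
    return [
        parse_atom_line(line.rstrip("\r\n"))
        for line in lines
        if is_atom_line(line)
    ]


# A full 69-character atom line assembled field by field, with mapping number 100.
sample = "    1.5587    1.6342    0.0000 O  " + " 0" + "  0" * 8 + "100" + "  0" * 2
atoms = read_atoms(["$MOL", sample])
print(atoms[0]["symbol"], atoms[0]["map"])
```

To run it on the real file, pass an open file handle, e.g. `read_atoms(open("ECBLAST_R01492_AAM.txt"))`. Note that this returns the atoms of all molecules in the reaction as one flat list; splitting them per molecule would need the `$MOL` markers and counts lines as well.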
Let's step back though: what are you actually trying to do? See http://xyproblem.info/
Thanks for the help, I have no further questions.
Hi, when using RDT to map reactions with compounds that have more than 99 atoms, the output .rxn file has a corrupted column structure. Specifically, it places three-digit mapping numbers in the preceding column, so these rows have 15 columns instead of 16. This makes it impossible to process such output .rxn files automatically.
ECBLAST_R01492_AAM.txt
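The 15-versus-16 column count can be reproduced with a short Python sketch, assuming the rows follow the fixed-width V2000 atom line layout (three 10-character coordinates, a 3-character symbol, a 2-character field, then 3-character fields). The helper `format_atom_line` is hypothetical and sets every field other than the coordinates, symbol and mapping number to 0. It suggests the columns only appear to merge when a row is split on whitespace; the mapping number is still in characters 61-63.

```python
def format_atom_line(x, y, z, symbol, mapping):
    """Build an illustrative V2000-style atom line with all other fields set to 0."""
    head = f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{0:2d}"
    middle = "".join(f"{0:3d}" for _ in range(8))  # eight 3-character fields
    tail = f"{mapping:3d}{0:3d}{0:3d}"             # mapping number, then two more fields
    return head + middle + tail


two_digit = format_atom_line(2.7983, 3.1114, 0.0, "O", 99)
three_digit = format_atom_line(1.5587, 1.6342, 0.0, "O", 100)

# Whitespace splitting: the 3-digit number touches the preceding "  0" field.
print(len(two_digit.split()))    # 16
print(len(three_digit.split()))  # 15

# Fixed-width slicing: the mapping number is always in characters 61-63.
print(int(two_digit[60:63]), int(three_digit[60:63]))  # 99 100
```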