asad / ReactionDecoder

Reaction Decoder Tool (RDT) - Atom Atom Mapping Tool
GNU Lesser General Public License v3.0

three-digit mapping numbers disrupt the column structure in the output #13

Closed: anastasiiaNG closed this issue 4 years ago.

anastasiiaNG commented 4 years ago

Hi, when I use RDT to map reactions with compounds that have more than 99 atoms, the output .rxn file has a corrupted column structure. Specifically, three-digit mapping numbers merge with the preceding column, so these rows have 15 columns instead of 16. This makes it impossible to process such output .rxn files automatically.

[Screenshot from 2020-02-20 16-31-23]

ECBLAST_R01492_AAM.txt

johnmay commented 4 years ago

Is ECBLAST_R01492_AAM.txt the output? That looks correct. It's a fixed-width format, not a TSV, so you can't open it in Excel, which looks like what you've done.

johnmay commented 4 years ago

Please see the bottom of page 40, "The Atom Block": http://help.accelrysonline.com/ulm/onelab/1.0/content/ulm_pdfs/direct/reference/ctfileformats2016.pdf
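To illustrate the point, here is a minimal Python sketch contrasting a whitespace-splitting reader (which is roughly what a TSV/Excel import does) with reading by fixed character position. It assumes the V2000 atom-line layout from the CTfile spec, where the atom-atom mapping number occupies columns 61-63; `atom_line` and `parse_atom_line` are hypothetical helper names, and all flag fields are written as 0 purely for illustration. The unused field just before the mapping number is written as "  0", so a mapping of 100 produces "  0100", and splitting on whitespace yields a single token "0100" instead of two.

```python
def atom_line(x, y, z, symbol, mapping):
    """Build an illustrative V2000 atom line (all flag fields set to 0)."""
    return (f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}"
            + " 0"              # dd: mass difference, 2 chars
            + "  0" * 8         # ccc sss hhh bbb vvv HHH rrr iii, 3 chars each
            + f"{mapping:3d}"   # mmm: atom-atom mapping number, columns 61-63
            + "  0" * 2)        # nnn eee


def parse_atom_line(line):
    """Read fields by fixed character position (1-based columns -> 0-based slices)."""
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        "mapping": int(line[60:63]),
    }


for m in (99, 100):
    line = atom_line(1.5587, 1.6342, 0.0, "O", m)
    # whitespace token count vs. fixed-width mapping number
    print(m, len(line.split()), parse_atom_line(line)["mapping"])
# 99 16 99
# 100 15 100
```

The token count drops from 16 to 15 exactly as described in the issue, while the fixed-width slice still returns the correct mapping number.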

anastasiiaNG commented 4 years ago

Ok, I see, thank you.

Could you advise whether there is any way to automatically extract these atom blocks as separate R/Python objects, with the atom mapping numbers always in the same column?

I tried to read ECBLAST_R01492_AAM.txt automatically with the R package ChemmineR, and even this specialized library fails to read such files correctly. Below is how ChemmineR interprets this file:

ttt <- ChemmineR::read.SDFindex("ECBLAST_R01492_AAM.txt", 
                                index=data.frame(A=1, 
                                                 B=length(readr::read_lines("ECBLAST_R01492_AAM.txt"))))
ChemmineR::atomblock(ttt)

[Screenshot from 2020-02-21 17-44-58]

johnmay commented 4 years ago

Then this looks like it might be a bug in ChemmineR... RDKit and CDK will handle it correctly, but you will have to extract the mapping numbers manually by iterating over the atoms.

Because it's a fixed-width format (it dates back to the Fortran days), you can just grab the characters you need using simple command-line utilities.

 $ grep -F '.' ~/Downloads/ECBLAST_R01492_AAM.txt | cut -c1-49,61-63
...
  -18.8182   -5.0468    0.0000 C   0  0  0  0  0  61
  -17.7257   -4.0189    0.0000 C   0  0  0  0  0  62
  -18.0704   -2.5582    0.0000 C   0  0  0  0  0  63
  -19.5074   -2.1254    0.0000 C   0  0  0  0  0  64
  -16.2893   -4.4509    0.0000 C   0  0  0  0  0  65
  -18.4745   -6.5069    0.0000 C   0  0  0  0  0  66
  -12.6166    0.2957    0.0000 C   0  0  0  0  0  67
   -3.6837    0.0740    0.0000 C   0  0  1  0  0  68
   -3.9578   -1.4007    0.0000 C   0  0  0  0  0  69
   -2.8177   -2.3755    0.0000 C   0  0  0  0  0  70
   -1.4035   -1.8755    0.0000 O   0  0  0  0  0  71
   -3.0918   -3.8502    0.0000 N   0  0  0  0  0  72
    1.6146   -0.0598    0.0000 C   0  0  1  0  0  73
    6.0587    1.6342    0.0000 O   0  0  0  0  0  92
    5.3087    0.3351    0.0000 P   0  0  0  0  0  93
    6.7183   -0.1779    0.0000 O   0  0  0  0  0  94
    5.0483   -1.1421    0.0000 O   0  0  0  0  0  95
    3.8087    0.3351    0.0000 O   0  0  0  0  0  96
    3.0587    1.6342    0.0000 P   0  0  0  0  0  97
    4.4683    2.1472    0.0000 O   0  0  0  0  0  98
    2.7983    3.1114    0.0000 O   0  0  0  0  0  99
    1.5587    1.6342    0.0000 O   0  0  0  0  0 100
    0.8087    0.3351    0.0000 P   0  0  0  0  0 101
    2.2183   -0.1779    0.0000 O   0  0  0  0  0 102
   -0.6913    0.3351    0.0000 O   0  0  0  0  0 103
    0.5483   -1.1421    0.0000 O   0  0  0  0  0 104
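For getting the atom blocks into separate Python objects, the same fixed-width idea can be applied per molecule. The sketch below assumes a V2000 RXN file in which every molecule starts with a `$MOL` line followed by three molfile header lines, the counts line (atom count in columns 1-3) and then the atom block; rather than filtering on `.` as the `grep` above does, it follows that structure so each molecule stays separate. `read_rxn_mappings`, `Atom` and the small synthetic reaction are hypothetical and illustrative only (not RDT output); V3000 files are not handled.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    symbol: str
    mapping: int  # 0 means "not mapped"


def read_rxn_mappings(text: str) -> List[List[Atom]]:
    """Return one list of atoms per molecule, in file order (reactants, then products)."""
    lines = text.splitlines()
    molecules = []
    for i, line in enumerate(lines):
        if line.strip() != "$MOL":
            continue
        # $MOL, 3 header lines, then the counts line; aaa = atom count in columns 1-3
        n_atoms = int(lines[i + 4][0:3])
        atoms = []
        for atom_line in lines[i + 5 : i + 5 + n_atoms]:
            mmm = atom_line[60:63].strip()  # mapping number, columns 61-63 (may be absent)
            atoms.append(Atom(symbol=atom_line[31:34].strip(),
                              mapping=int(mmm) if mmm else 0))
        molecules.append(atoms)
    return molecules


# --- Tiny synthetic example: one reactant and one product ---
def _atom(symbol: str, mapping: int) -> str:
    # 3 coordinates x 10 chars, symbol, dd, eight 3-char fields, mmm, nnn, eee
    return (f"{0.0:10.4f}" * 3 + f" {symbol:<3}" + " 0" + "  0" * 8
            + f"{mapping:3d}" + "  0  0")


def _mol(atom_lines: List[str]) -> List[str]:
    counts = f"{len(atom_lines):3d}  0" + "  0" * 8 + "999 V2000"
    return ["$MOL", "", "  example", "", counts, *atom_lines, "M  END"]


rxn_text = "\n".join(
    ["$RXN", "", "  example", "", "  1  1"]
    + _mol([_atom("O", 99), _atom("P", 100)])
    + _mol([_atom("P", 100), _atom("O", 99)])
)

for mol in read_rxn_mappings(rxn_text):
    print([(a.symbol, a.mapping) for a in mol])
# [('O', 99), ('P', 100)]
# [('P', 100), ('O', 99)]
```

Reading by position rather than by token means two- and three-digit mapping numbers are handled identically.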

Let's step back, though: what are you actually trying to do? See: http://xyproblem.info/

anastasiiaNG commented 4 years ago

Thanks for the help; I have no further questions.