PistoiaHELM / monomer.org

A repository to track the development of monomer.org site to host universal HELM interpreter and related monomer libraries.
MIT License

Monomers downloaded from monomer.org are not in the correct format #30

Open ClairePA opened 3 years ago

ClairePA commented 3 years ago

Here is an example of the agreed JSON format for monomers.

{ "monomerType": "Backbone", "symbol": "12ddR", "rgroups": [ { "alternateId": "R1-H", "id": 0, "label": "R1", "capGroupSMILES": "[*:1][H]", "capGroupName": "H" }, { "alternateId": "R2-H", "id": 0, "label": "R2", "capGroupSMILES": "[*:2][H]", "capGroupName": "H" } ], "molfile": "\n Marvin 09110915502D \n\n 10 10 0 0 0 0 999 V2000\n -1.4258 10.5012 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.1396 10.0877 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0\n -0.7107 10.0897 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9250 9.2912 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0\n -0.9231 9.2926 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -1.9238 8.4662 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0\n -2.2881 10.9422 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n -2.7596 11.2958 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0\n -3.5846 11.4503 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0\n -1.2225 7.9123 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0\n 1 2 1 0 0 0 0\n 1 3 1 0 0 0 0\n 2 4 1 0 0 0 0\n 2 7 1 1 0 0 0\n 3 5 1 0 0 0 0\n 4 5 1 0 0 0 0\n 4 6 1 6 0 0 0\n 6 10 1 0 0 0 0\n 7 8 1 0 0 0 0\n 8 9 1 0 0 0 0\nM RGP 2 9 1 10 2\nM END\n\n$$$$\n", "smiles": "[H:1]OC[C@H]1OCC[C@@H]1O[H:2]", "author": "Pistoia Alliance", "name": "1',2'-Di-Deoxy-Ribose", "naturalAnalog": "R", "polymerType": "RNA", "id": 131, "createDate": "Tue Sep 05 17:43:09 CEST 2017" }

The download from monomer.org does not include the agreed array for R groups. Instead it flattens the R groups into separate fields (r1 to r5) and omits some of the information, such as the cap group SMILES.

{ "monomerversionid": 518, "libraryid": 4, "librarykey": "Nucleotides", "libraryname": "Core Nucleotides", "molfile": "/n ChemDraw11272016482D/n/n 9 9 0 0 0 0 0 0 0 0999 V2000/n -0.9959 0.0314 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n -0.9959 -0.7936 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n -0.4125 -1.3770 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n 0.4125 -1.3770 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n 0.9959 -0.7936 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n 0.9959 0.0314 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n 0.4125 0.6148 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n -0.4125 0.6148 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0/n 0.7282 1.3770 0.0000 R1 0 0 0 0 0 0 0 0 0 0 0 0/n 1 2 1 0 /n 2 3 1 0 /n 3 4 1 0 /n 4 5 1 0 /n 5 6 1 0 /n 6 7 1 0 /n 7 8 1 0 /n 8 1 1 0 /n 7 9 1 0 /nM END/n", "smiles": null, "symbol": "ac4C", "naturalanalog": "C", "name": "4-Acetylcytosine", "polymertype": "RNA", "monomertype": "Branch", "status": "draft", "r1": "H", "r2": null, "r3": null, "r4": null, "r5": null, "author": "Bellamy, Claire", "userid": 4, "createddate": 1606251619480, "modifieddate": 1606251619480 }
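The difference between the two records above can be checked mechanically. The following is a minimal sketch, not part of monomer.org or the HELM toolkit: the function name `has_nested_rgroups` is hypothetical, and the set of required keys is taken only from the agreed example in this issue, not from the full schema.

```python
# Keys that each R group entry carries in the agreed example above.
# Assumption: the full schema may require more or fewer keys than these.
REQUIRED_RGROUP_KEYS = {"alternateId", "id", "label", "capGroupSMILES", "capGroupName"}


def has_nested_rgroups(record):
    """Return True if `record` carries its R groups as a nested "rgroups" array."""
    rgroups = record.get("rgroups")
    if not isinstance(rgroups, list):
        return False
    # Every entry must be an object that has all of the expected keys.
    return all(
        isinstance(group, dict) and REQUIRED_RGROUP_KEYS.issubset(group)
        for group in rgroups
    )


# Trimmed versions of the two records quoted above.
agreed = {
    "symbol": "12ddR",
    "rgroups": [
        {"alternateId": "R1-H", "id": 0, "label": "R1",
         "capGroupSMILES": "[*:1][H]", "capGroupName": "H"},
        {"alternateId": "R2-H", "id": 0, "label": "R2",
         "capGroupSMILES": "[*:2][H]", "capGroupName": "H"},
    ],
}
downloaded = {"symbol": "ac4C", "r1": "H", "r2": None, "r3": None, "r4": None, "r5": None}

print(has_nested_rgroups(agreed))      # True
print(has_nested_rgroups(downloaded))  # False
```

A check like this only looks at the R group structure. It does not cover the other differences between the two records, such as the lowercase key names in the download.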

Please correct the monomer JSON download.

ClairePA commented 3 years ago

Now nothing is downloaded at all!

ClairePA commented 3 years ago

Apologies, it is downloaded, but it is still in the old format:

[{"monomerversionid":7270,"libraryid":23,"libraryname":"Test3","molfile":"Unnamed\nMolEngine04272115472D\n\n 12 12 0 0 1 0 0 0 0 0999 V2000\n 2.3610 2.4680 0.0000 C 0 0 1\n 3.8450 1.9860 0.0000 C 0 0 2\n 4.7620 3.2480 0.0000 C 0 0 1\n 3.8450 4.5100 0.0000 O \n 2.3610 4.0280 0.0000 C 0 0 2\n 1.0990 4.9450 0.0000 C \n 6.3220 3.2480 0.0000 R \n 1.0990 1.5510 0.0000 O \n 4.3270 0.5020 0.0000 O \n 1.2620 6.4960 0.0000 O \n 0.0000 7.4130 0.0000 R \n 1.2620 0.0000 0.0000 C \n 1 2 1\n 2 3 1\n 3 4 1\n 4 5 1\n 5 1 1\n 5 6 1 1\n 3 7 1 1\n 1 8 1 6\n 2 9 1 6\n 6 10 1\n 10 11 1\n 8 12 1\nA 7\nR3\nA 11\nR1\nM END\n","smiles":"[H]OC[C@@H]1[C@@H](OC)[C@@H](O)[C@H]([2OH])O1","symbol":"35mo3r","naturalanalog":"r","name":"3-O-Methylribose (3,5 connectivity)","polymertype":"RNA","monomertype":"Backbone","status":"Active","r1":"H","r2":null,"r3":"OH","r4":null,"r5":null,"author":"Bellamy, Claire","userid":2,"createddate":1619481600000,"modifieddate":1619481600000}]

ClairePA commented 3 years ago

You might want to look at the schema definition on GitHub: https://github.com/PistoiaHELM/HELMMonomerSets/blob/master/HELMmonomerSchema.json

The issue is the R group information, which should be nested like this:

"rgroups": [ { "alternateId": "R1-H", "id": 0, "label": "R1", "capGroupSMILES": "[*:1][H]", "capGroupName": "H" }, { "alternateId": "R2-H", "id": 0, "label": "R2", "capGroupSMILES": "[*:2][H]", "capGroupName": "H" } ]

and not listed as flat fields, as the current download does:

"r1":"H","r2":null,"r3":"OH","r4":null,"r5":null,

I can talk you through it if you let me know your availability.
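To make the requested change concrete, here is a rough sketch of how the flat r1 to r5 fields could be folded into the nested array. It is an illustration under stated assumptions, not the monomer.org implementation: the function name `nest_rgroups` and the `CAP_SMILES_TEMPLATES` table are hypothetical, the "H" SMILES pattern follows the agreed example, the "OH" pattern is an assumption, and `"id": 0` is simply copied from the example because its meaning is not stated in this thread.

```python
# Cap group SMILES templates keyed by cap group name.
# "H" follows the agreed example ("[*:1][H]"); "OH" is an assumption.
CAP_SMILES_TEMPLATES = {"H": "[*:{n}][H]", "OH": "O[*:{n}]"}


def nest_rgroups(record, max_rgroups=5):
    """Return a copy of `record` with r1..rN replaced by an "rgroups" array."""
    flat_keys = {f"r{n}" for n in range(1, max_rgroups + 1)}
    # Copy everything except the flat R group fields.
    nested = {key: value for key, value in record.items() if key not in flat_keys}

    rgroups = []
    for n in range(1, max_rgroups + 1):
        cap = record.get(f"r{n}")
        if cap is None:
            continue  # unused attachment point
        template = CAP_SMILES_TEMPLATES.get(cap)
        rgroups.append({
            "alternateId": f"R{n}-{cap}",
            "id": 0,  # copied from the agreed example; meaning not stated here
            "label": f"R{n}",
            # Unknown cap groups get None rather than a guessed SMILES.
            "capGroupSMILES": template.format(n=n) if template else None,
            "capGroupName": cap,
        })
    nested["rgroups"] = rgroups
    return nested


# Trimmed version of the flat record from the latest download.
flat = {"symbol": "35mo3r", "r1": "H", "r2": None, "r3": "OH", "r4": None, "r5": None}
converted = nest_rgroups(flat)
print([group["alternateId"] for group in converted["rgroups"]])  # ['R1-H', 'R3-OH']
```

The sketch only restructures the R groups. It leaves the other keys, such as the lowercase `monomertype` and `polymertype`, exactly as they appear in the download.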

ClairePA commented 3 years ago

I would strongly recommend addressing this, as the limitations of the current flattened approach will become apparent if other polymer types are implemented in HELM.