PistoiaHELM / HELMMonomerSets

MIT License

Inconsistent case on rgroup SMILES #6

Open cing opened 1 year ago

cing commented 1 year ago

The field name "capGroupSmiles" is spelled inconsistently in the Core Library: "SMILES" is in all-caps in 797 of the 1391 matching lines (about 57%) and mixed-case in the other 594. This makes parsing problematic, and even adds extra columns to your XLSX convenience file:

> grep "capGroupSmiles" HELMCoreLibrary.json | head -n 5
                "capGroupSmiles": "[*:1][H]", 
                "capGroupSmiles": "O[*:2]", 
                "capGroupSmiles": "[*:1][H]", 
                "capGroupSmiles": "O[*:2]", 
                "capGroupSmiles": "[*:3][H]", 
> grep "capGroupSMILES" HELMCoreLibrary.json | head -n 5        
                "capGroupSMILES": "[*:1][H]", 
                "capGroupSMILES": "[*:2][H]", 
                "capGroupSMILES": "[*:1][H]", 
                "capGroupSMILES": "[*:2][H]", 
                "capGroupSMILES": "[*:1][H]", 
> grep "capGroupSmiles" HELMCoreLibrary.json | wc -l
594
> grep "capGroupSMILES" HELMCoreLibrary.json | wc -l
797

This can be fixed by settling on a single convention with a search and replace. Presumably it should be capGroupSMILES, since that is used consistently in monomerLib2.0 and Ionis.
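Until the file itself is fixed, the same normalization can be done on the parsed JSON. Below is a minimal sketch, assuming capGroupSMILES is the spelling to keep. The helper name `normalize_keys` and the sample record (including the "rgroups" and "label" fields) are made up for illustration and are not taken from the library.

```python
import json

# Assumed target spelling, following the suggestion above.
CANONICAL = "capGroupSMILES"


def normalize_keys(node):
    """Recursively rename any case variant of the key to CANONICAL."""
    if isinstance(node, dict):
        return {
            (CANONICAL if key.lower() == CANONICAL.lower() else key): normalize_keys(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    return node


# Illustrative sample only: one record using each spelling.
sample = json.loads("""
[{"rgroups": [
    {"label": "R1", "capGroupSmiles": "[*:1][H]"},
    {"label": "R2", "capGroupSMILES": "O[*:2]"}
]}]
""")

fixed = normalize_keys(sample)
print(json.dumps(fixed, indent=2))
```

For the real file, the same function could be applied to the result of `json.load` on HELMCoreLibrary.json and written back with `json.dump`. Unlike a plain text search and replace, that round trip may change the file's whitespace.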

sellersb commented 1 week ago

I ran into this too. The JSON schema in this repo does not validate the provided JSON.