MobleyLab / FreeSolv

Experimental and calculated small molecule hydration free energies
http://www.escholarship.org/uc/item/6sd403pz

Remove duplicate from database, update DOIs and other info #41

Closed: davidlmobley closed this 7 years ago

davidlmobley commented 7 years ago

This resolves #40 by removing a duplicate molecule from the database. To prevent related issues in the future, functionality is added to allow easy checking of duplicates.

(It turns out that when the primary data are SMILES strings, a reliable duplicate check requires canonicalizing every entry the same way: go SMILES -> OEMol -> canonical isomeric SMILES, then cross-check the canonical isomeric SMILES in all cases. This procedure hadn't previously been done in exactly this way, which is how this one duplicate slipped through.)
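The canonicalize-then-compare check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code added in this PR: the function names `openeye_canonical_isomeric_smiles` and `find_duplicates` are hypothetical, and the database is assumed to be a simple mapping from compound ID to SMILES. The canonicalizer follows the SMILES -> OEMol -> canonical isomeric SMILES route via OpenEye's `OESmilesToMol` and `OEMolToSmiles`, and it is passed in as a parameter so a different toolkit could be swapped in.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def openeye_canonical_isomeric_smiles(smiles: str) -> str:
    """SMILES -> OEMol -> canonical isomeric SMILES (needs a licensed OpenEye toolkit)."""
    # Imported lazily so the rest of this sketch runs without OpenEye installed.
    from openeye import oechem

    mol = oechem.OEGraphMol()
    if not oechem.OESmilesToMol(mol, smiles):
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # OEMolToSmiles emits canonical isomeric SMILES by default.
    return oechem.OEMolToSmiles(mol)


def find_duplicates(
    entries: Dict[str, str],
    canonicalize: Callable[[str], str] = openeye_canonical_isomeric_smiles,
) -> Dict[str, List[str]]:
    """Group entry IDs by canonical SMILES; return only groups holding more than one ID.

    `entries` is a hypothetical {compound_id: smiles} mapping. Raw SMILES strings
    are never compared directly, since one molecule can have many valid SMILES.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry_id, smiles in entries.items():
        groups[canonicalize(smiles)].append(entry_id)
    return {can: ids for can, ids in groups.items() if len(ids) > 1}
```

For example, with hypothetical IDs, `find_duplicates({"mobley_a": "CCO", "mobley_b": "OCC"})` should report both IDs under a single canonical SMILES, because both strings describe ethanol even though a plain string comparison would treat them as different.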

This also makes a few other minor changes:

davidlmobley commented 7 years ago

(I'll also add a separate issue: I think it's time we set up automated testing, and one of the things it should do -- aside from checking that the database can be extracted, etc. -- is check for duplicates.)