MobleyLab / benchmarksets

Benchmark sets for binding free energy calculations: Perpetual review paper, discussion, datasets, and standards
BSD 3-Clause "New" or "Revised" License

Add machine-readable tables for all the sets #67

Open slochower opened 6 years ago

slochower commented 6 years ago

Can we add Markdown tables for the CB7 and GDCC sets like we have for the CD sets? Those are really helpful.

I realize this information is in the manuscript itself, but when setting up calculations on the entire set of systems, it's much easier to use Markdown (or just CSV) tables than the PDF. For example, for each file I'm processing, I can fairly easily write a function that parses the tables, returns the host and guest, and stores the experimental binding affinity for later analysis. Even better would be to list host and guest residue names along with the charge. I should be able to get the charge from the SMILES without too much difficulty, but listing it directly would avoid dependencies on OpenEye or other chemistry-parsing code and help ensure everyone starts from exactly the same state.
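To make the parsing idea concrete, here is a minimal sketch of such a function, using only the standard library. The column names (`Host`, `Guest`, `SMILES`, `dG (kcal/mol)`) and the rows in `EXAMPLE` are placeholders I made up for illustration; they are not the repository's actual headers or measured values, so adjust them to whatever the README tables end up using.

```python
def parse_markdown_table(text):
    """Return one dict per data row of the first pipe table found in text."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the first table has ended
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row is the header
        elif all(c and set(c) <= set("-:") for c in cells):
            continue  # separator row such as |---|:--:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Placeholder table: hypothetical column names and made-up values.
EXAMPLE = """\
| Host | Guest | SMILES | dG (kcal/mol) |
|------|-------|--------|---------------|
| host-a | guest-1 | C[NH3+] | -1.0 |
| host-a | guest-2 | CC(=O)[O-] | -2.5 |
"""

rows = parse_markdown_table(EXAMPLE)

# Store the experimental binding affinity keyed by (host, guest).
affinities = {
    (row["Host"], row["Guest"]): float(row["dG (kcal/mol)"]) for row in rows
}
```

Keying on `(host, guest)` keeps the experimental values easy to look up once the calculations finish. For CSV tables, the standard `csv.DictReader` would do the same job.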

I also realize I could submit a PR myself -- and it's on my to-do list -- but by listing it here, someone might take a stab at it before it surfaces to the top for me.

slochower commented 6 years ago

Related to (but maybe more specific than) #49 and #1.

davidlmobley commented 6 years ago

Agreed, yes, we should do this.

I think there is a tool that could do the conversion, possibly pandoc?

Otherwise, CB7 may fall within @nhenriksen's interests; GDCC would probably fall to me.

slochower commented 6 years ago

I think you are right, although there are differences between the manuscript table and the ones in the READMEs. I understand this may be because you (?) did not want to bias the calculations by suggesting a particular charge state (I think...). In any case, I appreciate the READMEs in the CD set because the charge states and SMILES strings listed there conform exactly to what I see in the files (at least so far), and I can use that as a sanity check or additional filter while doing the force field conversions. Anyway, this is not a sticking point for me, just an observation!
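As an illustration of that kind of sanity check, here is a rough, dependency-free sketch that sums the explicit formal charges written in a SMILES string and compares the total with a listed charge. It assumes charges appear only inside bracket atoms (as in `[NH3+]` or `[O-]`), which is how SMILES encodes them. It does not validate the SMILES or assign protonation states, so it is not a substitute for a real toolkit. The function names and example strings are hypothetical.

```python
import re


def net_formal_charge(smiles):
    """Sum explicit formal charges written inside SMILES bracket atoms."""
    total = 0
    for atom in re.findall(r"\[([^\]]*)\]", smiles):
        match = re.search(r"(\++|-+)(\d*)", atom)
        if match is None:
            continue  # bracket atom with no charge, e.g. [nH]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        # "+2" style carries the magnitude in digits; "++" style repeats the sign.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


def charge_matches(smiles, listed_charge):
    """True if the charge listed in a table agrees with the SMILES string."""
    return net_formal_charge(smiles) == int(listed_charge)
```

A row whose listed charge disagrees with its SMILES could then be flagged before the force field conversion, for example with `charge_matches("C[NH3+]", "1")`.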