OpenBioML / chemnlp

ChemNLP project
MIT License

complete train/test split routine #502

Closed kjappelbaum closed 10 months ago

kjappelbaum commented 10 months ago

Updates:

Datasets that are more difficult to parse now also produce meaningful output


zinc, which had been causing issues for MicPie, also works now


the LocalCluster from one of the commits here is not strictly needed, but it helps with debugging.

I couldn't test on the HPC, as I still get quota errors even after deleting many files.

We still need to verify that all dependencies are installed: openpyxl, pymatgen, givemeconformer, dask
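
A quick way to do that check could look like the sketch below. It only tests whether each module can be located in the current environment and does not import it. It assumes the import names match the package names listed above, and `missing_modules` is a hypothetical helper, not something in the repo.

```python
from importlib.util import find_spec

# Assumption: each import name is identical to the package name listed above.
REQUIRED = ("openpyxl", "pymatgen", "givemeconformer", "dask")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all dependencies found")
```

Running it in the environment used for the split (e.g. on the HPC) should list whatever still has to be installed.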

kjappelbaum commented 10 months ago

The remaining datasets ran through

kjappelbaum commented 10 months ago

Still waiting for rdkit in the SMILES split, but otherwise this seems to run

kjappelbaum commented 10 months ago

One dataset we need to check more carefully is odd_one_out.

kjappelbaum commented 10 months ago

Since I need to redownload, I'll call it a day now while it continues running

kjappelbaum commented 10 months ago

odd_one_out also ran successfully. IUPAC names is another large and slow dataset