Homogenise chemical sources

afermg commented 7 months ago

Branching off #23, @shntnu and I are thinking about how to make jump-compound-annotator and smiles (see previous issue) play along. The idea is to have a single source for all chemical compounds (@johnarevalo), recording both their URL (GitHub) and their data (Zenodo?), so that the full processing pipeline is reproducible (using a default seed). The main goal is to get one list of compounds that we can filter downstream, for both chemical annotations and SMILES standardisation.
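One way to record each source and merge everything into one list is sketched below. It assumes that compounds are keyed by InChIKey and that each source can be reduced to (InChIKey, name) pairs. `CompoundSource` and `merge_sources` are hypothetical names, not existing code in either repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundSource:
    """Hypothetical record of one upstream compound table and its provenance."""
    name: str
    code_url: str  # placeholder: repository that produced the table (e.g. GitHub)
    data_url: str  # placeholder: archived snapshot of the table (e.g. Zenodo)
    compounds: tuple[tuple[str, str], ...]  # (InChIKey, compound name) pairs


def merge_sources(sources):
    """Merge all sources into one list keyed by InChIKey, keeping provenance."""
    merged = {}
    for src in sources:
        for inchikey, name in src.compounds:
            entry = merged.setdefault(
                inchikey, {"inchikey": inchikey, "names": set(), "sources": set()}
            )
            entry["names"].add(name)
            entry["sources"].add(src.name)
    # Sort by key so the merged list does not depend on the order of the sources.
    return [merged[key] for key in sorted(merged)]
```

Keeping the URLs next to the compounds means every row of the merged list can be traced back to the code and data snapshot it came from; downstream filtering then works on a single list.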

My general proposal is to take our "canonical" compound list, use John's code to get as many annotations as possible for it, and finally process these compounds with Srijit's standardisation code. The pieces are more or less done (see links above); they only need to be assembled into a single pipeline.
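A minimal sketch of how the stages could be chained, assuming each one can be wrapped as a function over a list of compound records. `make_annotate`, `standardise` and `run_pipeline` are hypothetical names, and both stages are placeholders rather than John's or Srijit's actual code.

```python
import random
from typing import Callable

# A step takes the current compound list plus a shared RNG and returns a new list.
Step = Callable[[list[dict], random.Random], list[dict]]


def make_annotate(table: dict[str, list[str]]) -> Step:
    """Placeholder for the annotation stage: look up annotations by InChIKey."""
    def annotate(compounds, rng):
        return [{**c, "annotations": table.get(c["inchikey"], [])} for c in compounds]
    return annotate


def standardise(compounds, rng):
    """Placeholder for SMILES standardisation: here it only trims whitespace."""
    return [{**c, "smiles": c["smiles"].strip()} for c in compounds]


def run_pipeline(compounds: list[dict], steps: list[Step], seed: int = 0) -> list[dict]:
    """Apply the steps in order to the canonical list.

    The stubs above ignore `rng`; it is threaded through so that any real step
    needing randomness draws from one generator seeded with `seed`, keeping
    reruns reproducible.
    """
    rng = random.Random(seed)
    for step in steps:
        compounds = step(compounds, rng)
    return compounds
```

Each step returns new records instead of mutating its input, so the canonical list stays untouched and can be reused as the starting point of other runs.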

afermg commented 2 months ago

This still depends on #23

afermg commented 2 days ago

Do we have a canonical set of chemical names/InChIKeys? @shntnu @srijitseal

broadinstitute/monorepo#24