broadinstitute / monorepo

Compendium of tools for the Imaging Platform

Relationship to jump-cellpainting/compound-annotator #22

Closed shntnu closed 7 months ago

shntnu commented 8 months ago

@afermg How is https://github.com/jump-cellpainting/compound-annotator related to https://github.com/broadinstitute/monorepo/tree/main/libs/jump_compound_annotator?

If we remove StandardizeMolecule.py from jump-cellpainting/compound-annotator, would the repo be completely superseded by jump_compound_annotator? If so, we should pull out StandardizeMolecule.py into a new repo and archive compound-annotator ASAP.

Here's why I think it is superseded:

  1. drugrep.py is probably doing what I attempted in repurposing-annotations.ipynb
  2. I am guessing some scripts in jump_compound_annotator are pulling from ChEMBL, and in that case, it would supersede the process for getting annotations from ChEMBL documented in the README

@johnarevalo – also just making sure that jump_compound_annotator is the only code base you are using for generating your annotations, correct?

afermg commented 7 months ago

You are right that this will supersede https://github.com/jump-cellpainting/compound-annotator, namely @johnarevalo's branch containing the resource fetching. I still think there will be work to do to integrate John's approach to fetching all resources, the decision-making in those scripts, and @srijitseal's SMILES standardisation (which is being integrated in PR #23).

afermg commented 7 months ago

I'll close the issue because issues on the monorepo are supposed to be software-based, not project-based :)

shntnu commented 7 months ago

> You are right in that this will supersede https://github.com/jump-cellpainting/compound-annotator, namely @johnarevalo's branch containing the resource fetching.

OK, I will go ahead and archive that repo now.

> I still think there will be work to do to integrate John's approach to fetching all resources and that scripts' decision-making

I didn't understand. Can you elaborate?

afermg commented 7 months ago

Ideally, we want to reuse John's code to fetch data from the original sources, and then use Srijit's to process it to get the final names. At the moment they are two independent entities and I am unsure as to where @srijitseal got his raw data from. That's worth having in code somewhere.

shntnu commented 7 months ago

> Ideally, we want to reuse John's code to fetch data from the original sources, and then use Srijit's to process it to get the final names. At the moment they are two independent entities and I am unsure as to where @srijitseal got his raw data from. That's worth having in code somewhere.

You are referring to annotations, not SMILES, right? I am not sure what you mean by "final names".

Could you please create a new issue to capture what needs to be done? If you are not sure about details, please tag John or Srijit to fill that in.

Overall, I 100% agree that we should have this captured in code.

afermg commented 7 months ago

Yes, but John has to do some processing to merge the tables. By final names I mean the "standardized" SMILES. I just mean to say that while these are two different goals, they should share the same input data.
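To make the "same input data" point concrete, here is a minimal sketch of the shape being described: one fetching step whose output feeds both the annotation merge and the SMILES standardisation. Everything in it is hypothetical and for illustration only: the `RawRecord` schema, the function names, and the toy rows are assumptions, not code from jump_compound_annotator or StandardizeMolecule.py. The standardisation step is a whitespace-stripping placeholder, not real chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    """One row as fetched from an upstream source (hypothetical schema)."""
    source: str
    compound_id: str
    smiles: str
    target: str


def fetch_raw_records() -> list[RawRecord]:
    """Stand-in for the resource-fetching step; returns hard-coded toy rows."""
    return [
        RawRecord("source_a", "cpd1", " CCO ", "T1"),
        RawRecord("source_b", "cpd1", "CCO", "T2"),
        RawRecord("source_b", "cpd2", "c1ccccc1", "T1"),
    ]


def merge_annotations(records: list[RawRecord]) -> dict[str, list[str]]:
    """Goal 1: merge per-source rows into compound -> sorted list of targets."""
    merged: dict[str, set[str]] = {}
    for record in records:
        merged.setdefault(record.compound_id, set()).add(record.target)
    return {cid: sorted(targets) for cid, targets in merged.items()}


def standardize_smiles(records: list[RawRecord]) -> dict[str, str]:
    """Goal 2: placeholder that only strips whitespace.

    A real implementation would call a cheminformatics toolkit here.
    """
    return {record.compound_id: record.smiles.strip() for record in records}


# Both goals consume the same fetched records, so their inputs cannot drift apart.
raw = fetch_raw_records()
annotations = merge_annotations(raw)
standardized = standardize_smiles(raw)
```

The design choice illustrated is only that both downstream steps take the output of a single fetch, so the provenance of the raw data lives in one place in code.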

afermg commented 7 months ago

Created issue #24 to get that done.