geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

First step: Import Reactome Pathways for MOD species #298

[Open] ukemi opened this issue 11 months ago

ukemi commented 11 months ago

This step provides a baseline assessment of what the orthology-projected pathways imported from Reactome look like. Decisions to be made:

  • [ ] Results of the import

    ◦ Pathways will be imported as development models.
    ◦ ShEx and logical checks will be run on the pathways.
    ◦ Evidence on pathways will be left blank and filled in during the second step where possible.
    ◦ We will explore other ways to capture evidence.

ukemi commented 11 months ago
  1. Because we lost information when importing individual pathways at the NYC meeting, we would like to import the whole pathway space for the organism.
  2. Projected models for all Alliance species. We will closely review fly, worm and mouse.
  3. @dustine32 @kltm can the underlying database handle all of these new models? There are 1910 pathways imported from Reactome. If we import them for all of the Alliance organisms, the maximum would be 1910 x 8 = 15,280.
  4. Set up a time frame for the first run. This initial import will give a high-level view of the way things stand right now.
  5. How will these models be used? Cloned or edited? We need to put together some SOPs for the use of these models. What will people use these models for? Duplication will affect over-representation results.
kltm commented 11 months ago

@ukemi Re "3". It can be tested, but a ~1/3 increase would be expected to slow things down a bit. With ongoing infrastructure work and other imports going on, I think we need to clearly spell out what the overall growth is expected to be over the next year or so, before giving commitments. As well, I'd like to understand the timeframe that this conversation is occurring in.

deustp01 commented 11 months ago

> 1910 x 8 = 15,280

... is an upper bound. For now, 3 species (worm, mouse, fly) give an upper bound of 1910 x 3 = 5,730. And if we can find a way of building GO-CAMs for selected pathways with no information loss (rather than having to build a complete GO-CAM set; @dustine32 can explain this better), that may reduce the bound further and slow the growth in the number of GO-CAMs.
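The two ceilings discussed in this thread can be reproduced with a minimal Python sketch. It assumes, as the estimates above do, that every one of the 1910 imported Reactome pathways is projected to every target species; the names `REACTOME_PATHWAYS` and `upper_bound` are hypothetical and are not part of the pathways2GO code.

```python
# Hypothetical sketch: upper bounds on the number of projected GO-CAM models.
# Assumes every imported Reactome pathway is projected to every species,
# so real counts can only be equal or lower.
REACTOME_PATHWAYS = 1910  # pathways imported from Reactome (figure from this thread)


def upper_bound(n_species: int, n_pathways: int = REACTOME_PATHWAYS) -> int:
    """Return the maximum number of projected models for n_species species."""
    return n_pathways * n_species


all_alliance = upper_bound(8)  # all Alliance organisms
pilot = upper_bound(3)         # worm, mouse, fly

print(f"All Alliance species: {all_alliance:,}")
print(f"Worm, mouse, fly:     {pilot:,}")
```

Both figures are ceilings only; selecting a subset of pathways, as suggested above, would lower them.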

At the same time, if the Alliance wants integrated (i.e., GO-CAM / Noctua rather than classic atomic GO) annotation and wants to exploit shared biology between model organisms, something like this project with something like this demand on resources seems necessary.

thomaspd commented 11 months ago

I think this should not be a priority right now. It would make much more sense to finish the conversion of the native (i.e. human) Reactome pathways first: we know there's a lot of work still to be done there, so we know all of the remaining issues will be found in the non-human projected pathways as well.

If this ticket is driven by a specific request from one or more MODs, we should have some larger meetings (not just the Reactome2GO team) to discuss how and when it would make sense to work on it.

deustp01 commented 11 months ago

> I think this should not be a priority right now.

For a large-scale systematic effort, I agree. For pilot work to figure out procedures and strategies, including careful thinking about the resources to do this on a large scale, we need to be at work now. In the specific case of resources / infrastructure, with budgeting done in 5-year chunks, we are stuck planning now for usage several years from now. My own view of tactics is that it makes sense to start with a subgroup of people who are committed to pathway annotation and comfortable, or willing to become comfortable, with the GO-CAM / Noctua curation environment. That work, at that scale, is already underway and looks promising.

It makes no sense to me to tell a potential user like Steven Marygold / FlyBase that he should wait for an indefinite period before starting to use the fairly complete models we can already generate for metabolism, because the whole project must proceed as a monolith, so nothing moves until complexities of signaling and cell cycle progression are definitively sorted out.

A key goal of this initial phase is the development of templates and annotation strategies that others will be comfortable using. It has not escaped our notice that such templates might be a useful way to overcome general reluctance to adopt GO-CAM / Noctua as a human-friendly curation environment.

I'm speaking in part as the PI of the pathways2GO U24 project that will end (really end, not just pause pending possible renewal) on June 30, 2026, at which time work will be continued by the participating organizations per the language of the grant application.

We can discuss this further on Wednesday.

thomaspd commented 11 months ago

Yes, we can discuss on Wednesday.

If Steven's needs are driving this prioritization decision, then I think we should have that discussion with him and include a larger group. My understanding after talking to him recently was that they are currently looking for a curator for fly metabolic pathway GO-CAMs, and he will reach out when they have hired someone.

I agree that this does not have to be monolithic or indefinite in time frame, but there are remaining items to address even for metabolic pathway conversion to GO-CAM that we should finish first, to make sure we are maximally efficient with curator time, even for a pilot.