geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Display of modified forms #880

Open ValWood opened 2 months ago

ValWood commented 2 months ago

In this model

rec8 protection Screenshot 2024-04-23 at 16 45 55

The phosphorylation status of rec8 is important (in this case, Phos:(S450),UnPhos(S412)). In PRO this form has a short label: PRO-Short-label: EXACT: Spom-rec8/Phos:2

Because most of our models will likely have modified forms, and often multiple different modified forms (i.e. acting as regulatory switches with different outcomes), I wonder if it would be possible to display the "Phos:2" part of the PRO label on the Noctua entity (it currently displays only "rec8 Spom", whichever form is selected). This will help us to navigate the different forms in the model.

I was really impressed that I could select and use modified forms!

pgaudet commented 2 months ago

I think this is driven by the information provided in your GPI file -- @kltm Is this right?

kltm commented 2 months ago

@pgaudet If I'm following and am guessing right about what's happening, the answer is "yes".

pgaudet commented 2 months ago

@ValWood

The PRO information is NOT in your GPI file (https://www.pombase.org/data/annotations/Gene_ontology/pombase.gpi.gz). The line for that ID (PR:000050512) is:

PR:000050512 rec8 meiotic cohesin complex kleisin subunit Rec8 SO:0001217 NCBITaxon:4896 PomBase:SPBC29A10.14 PomBase:SPBC29A10.14.1.pep UniProtKB:P36626 go-annotation-summary=meiotic cohesin complex kleisin subunit Rec8

However, based on the PRO entry, it looks like you should put the PRO-Short-label: | EXACT in column 2 of your GPI:


https://proconsortium.org/cgi-bin/entry_pro?id=PR%3A000050512&retrieve.x=11&retrieve.y=14

image

Is this possible?

ValWood commented 1 month ago

We can do that. But we also have a display label which makes more sense biologically. Can we include this label for display in Noctua? The PRO labels are numbered in numerical order: rec8/Phos:1, rec8/Phos:2, rec8/Phos:3, etc. For this modified form we display Phos:(S450),UnPhos(S412).

The numbered labels can get very confusing and aren't very meaningful to users, especially if there are many modified forms, as for rpb1: https://www.pombase.org/gene/SPBC28F2.12

@kimrutherford can you look into getting these 2 labels into the GPI?

pgaudet commented 1 month ago

AFAIK what is in column 2 of the GPI gets displayed, so you'd need to make sure the information you want is there.
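To make the "column 2" point concrete, here is a minimal sketch that splits the rec8 line quoted above into named fields. The column order is an assumption based on the GPI 2.0 layout, the tab positions are reconstructed (the thread shows the line space-separated), and `parse_gpi2_line` is a hypothetical helper, not part of Noctua or any GO tooling.

```python
# Column names are illustrative; the order is assumed from the GPI 2.0 layout.
GPI2_COLUMNS = [
    "id", "symbol", "name", "synonyms", "type", "taxon",
    "encoded_by", "parent_protein", "complex_members", "xrefs", "properties",
]


def parse_gpi2_line(line: str) -> dict:
    """Split one tab-separated GPI line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad short lines so every column name is present in the result.
    fields += [""] * (len(GPI2_COLUMNS) - len(fields))
    return dict(zip(GPI2_COLUMNS, fields))


# The line from the thread, with tab positions reconstructed (assumption).
current_line = "\t".join([
    "PR:000050512",
    "rec8",
    "meiotic cohesin complex kleisin subunit Rec8",
    "",
    "SO:0001217",
    "NCBITaxon:4896",
    "PomBase:SPBC29A10.14",
    "PomBase:SPBC29A10.14.1.pep",
    "",
    "UniProtKB:P36626",
    "go-annotation-summary=meiotic cohesin complex kleisin subunit Rec8",
])

record = parse_gpi2_line(current_line)
print(record["symbol"])  # column 2 today: rec8

# What the record would carry if the PRO short label were placed in column 2.
revised = dict(record, symbol="Spom-rec8/Phos:2")
print(revised["symbol"])  # Spom-rec8/Phos:2
```

If column 2 is what gets displayed, the second record is the one that would let the modified forms be told apart in a model.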

ValWood commented 1 month ago

Great thanks. Are Complex Portal entities in Noctua read from the GPI file? We were discussing this morning and decided they probably were, but I have a note to check...

vanaukenk commented 1 month ago

@ValWood Yes, Complex Portal entities are also read from the GPI file. SGD has some of those in their file, for example. Basically, any entity that you want to have available for annotation in Noctua needs to be in the PomBase gpi file and, as Pascale said, what you want for a display label needs to be in Column 2, at least for now. One thing we might want to consider for a future gpi format, though, is having a specific column for 'Display name' in case that is different from any of the other systematic names that exist which I think we would still want to capture.

ValWood commented 1 month ago

Hi @vanaukenk - So will we see these IDs in Noctua after the next GO update, or is this on a different cycle?

vanaukenk commented 1 month ago

@ValWood New entities in your gpi file are available for curation with each Noctua maintenance outage. The next one is tomorrow, May 9th, if you want to check.

ValWood commented 1 month ago

great!

ValWood commented 1 month ago

I can't locate a complex that is in our GPI file: CPX-555

kltm commented 1 month ago

@ValWood how long ago did you add that to your file? If it does not show up after today's update, let's look into it.

kltm commented 1 month ago

@ValWood I'm guessing that it was added during this last period. It now seems to be available: http://noctua-amigo.berkeleybop.org/amigo/term/ComplexPortal:CPX-555 .

vanaukenk commented 1 month ago

@ValWood I've checked on the three Noctua workbenches and can autocomplete on CPX-555 after this maintenance outage. Please let us know if you can't find it.

Note that the maintenance outages happen from ~4-6pm PDT on the Thursdays when we have them. We put the outages on the GO calendar, too, (and send the email notice out), just in case you were unclear about the timing of the updates.

Thx.

ValWood commented 1 month ago

Oddly I can't find it:

Screenshot 2024-05-10 at 09 38 54

The entire list had only SGD IDs. If I remove digits to autocomplete I get other entities, but not this one.

ValWood commented 1 month ago

ignore me, I get it, I will check on Friday.

ValWood commented 1 month ago

...It is Friday.......

ValWood commented 1 month ago

We had another look. We can find the complex when searching for entities in the activity unit, but not in the "protein complex" tool. Maybe we have something wrong in our GPI file?

ValWood commented 1 month ago

Maybe we used the incorrect term for "complex" and it is defaulting to gene? (GPI v2)

vanaukenk commented 1 month ago

@ValWood - let me do some systematic testing/investigating to see if I can understand why it's not showing up in the Protein Complex Form part of the VPE.

I'll probably also move this issue to a separate ticket in the VPE tracker, but will link it here.

vanaukenk commented 1 month ago

@ValWood - this does indeed seem to be an issue with how the S. pombe protein-containing complexes are being typed in Noctua, although you've used the correct type (GO:0032991) for the gpi2.0 file format.

Looking at the entries for an S. pombe vs S. cerevisiae protein-containing complex in noctua-amigo I see the parentage differences:

image

image

The difference between the S. pombe and S. cerevisiae gpi files is the format; SGD is still using gpi1.2.

@kltm @balhoff - could the incorrect typing of S. pombe complexes in NEO be the result of a different input file format?

ValWood commented 1 month ago

Note that this isn't critical for us right now if you are busy. We can manage without complexes in the short term, but we want to keep moving with this so we can power through later. Of course, if it is at our end we will prioritise a fix.

vanaukenk commented 1 month ago

@ValWood - can you still make the annotations to complexes that you need using the gene product field in the Activity Unit interface?

Let me know if you want to conference.

kltm commented 1 month ago

The logic for GPI seems to be:

default:  'CHEBI:33695 ! information biomacromolecule';
if 'protein': 'CHEBI:36080 ! protein';
if 'transcript': 'CHEBI:33697 ! ribonucleic acid';
if 'protein_complex': 'GO:0032991 ! macromolecular complex';

The same is mostly true in the GAF processor, except the if 'protein': 'CHEBI:36080 ! protein'; bit is removed with the comment: # note some groups incorrectly classify their genes as proteins
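As a rough illustration of the logic outlined above, the sketch below reads it as a literal string match on the type column. That reading is an assumption, the table and function names are hypothetical, and this is not the actual loader code; the labels are copied verbatim from the listing.

```python
DEFAULT_PARENT = "CHEBI:33695 ! information biomacromolecule"

# GPI mapping, as listed in the comment above.
GPI_PARENTS = {
    "protein": "CHEBI:36080 ! protein",
    "transcript": "CHEBI:33697 ! ribonucleic acid",
    "protein_complex": "GO:0032991 ! macromolecular complex",
}

# GAF mapping: the same table with the 'protein' case removed, so
# 'protein' falls back to the default parent.
GAF_PARENTS = {k: v for k, v in GPI_PARENTS.items() if k != "protein"}


def parent_class(type_value: str, table: dict) -> str:
    """Return the parent class for a type value, or the default if unmatched."""
    return table.get(type_value, DEFAULT_PARENT)


for value in ("protein", "protein_complex", "GO:0032991"):
    print(f"{value} -> GPI: {parent_class(value, GPI_PARENTS)}")
    print(f"{value} -> GAF: {parent_class(value, GAF_PARENTS)}")
```

Under this literal-match reading, a type column holding the label `protein_complex` resolves to GO:0032991, while a type column holding the identifier `GO:0032991` itself matches nothing and falls through to the default.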

vanaukenk commented 1 month ago

@kltm - so, if a group's input file is gpi2.0 and they're using GO:0032991 for protein-containing complex type, does this mean the type field is ignored and everything defaults to 'information biomacromolecule'?

kltm commented 1 month ago

@vanaukenk Well, assuming there are no bugs elsewhere, the logic is basically to pull the "type" info from a column in the line and match as outlined above. I'm not quite sure what you mean by "using GO:0032991". If you want to hop on a Zoom, we can sort this out real quick.

kltm commented 1 month ago

In discussion with @vanaukenk we have modified the parser to follow the GPI 2.0 spec a little better. Re-running load to see if there are improvements.
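The thread does not show the actual parser change. One way a parser could accept GPI 2.0 style identifiers in the type column is to normalise them to the legacy labels first and then reuse the existing label-based lookup. Everything in this sketch is an assumption for illustration: the function name, the normalisation table, and the choice of PR:000000001 and SO:0000673 as the identifiers for protein and transcript.

```python
DEFAULT_PARENT = "CHEBI:33695 ! information biomacromolecule"

# Legacy label-based mapping, as quoted earlier in the thread.
LABEL_TO_PARENT = {
    "protein": "CHEBI:36080 ! protein",
    "transcript": "CHEBI:33697 ! ribonucleic acid",
    "protein_complex": "GO:0032991 ! macromolecular complex",
}

# Hypothetical normalisation table from GPI 2.0 type identifiers to the
# legacy labels; the real parser may use a different or larger set.
CURIE_TO_LABEL = {
    "PR:000000001": "protein",
    "SO:0000673": "transcript",
    "GO:0032991": "protein_complex",
}


def resolve_parent(type_value: str) -> str:
    """Accept either a legacy label or a GPI 2.0 identifier in the type column."""
    label = CURIE_TO_LABEL.get(type_value, type_value)
    return LABEL_TO_PARENT.get(label, DEFAULT_PARENT)


print(resolve_parent("protein_complex"))  # legacy label
print(resolve_parent("GO:0032991"))       # GPI 2.0 identifier, same result
print(resolve_parent("SO:0001217"))       # unmapped type, falls to the default
```

Normalising first keeps a single lookup table, so files in either format resolve to the same parent class.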

vanaukenk commented 1 month ago

@ValWood Following up on testing for S. pombe complexes in the VPE (and elsewhere). I checked all of the different workbenches on production Noctua and things look okay. CPX-555 (and other pombe complexes) are now available in the Protein Complex widget of the VPE and in the other gene product autocompletes, so I think we're good. Please double-check and let us know if this is working as expected for you now.

@kltm

ValWood commented 1 month ago

Yes that works, thanks!