Open dustine32 opened 4 years ago
Rats! Still over 100. Some of these should be cleaned up with the next MGI GPAD release and some still look like they should be passing to me.
@ukemi Here's this week's MGI report from the 2019-11-13 upstream GPAD with the validation rule changes from above: https://docs.google.com/spreadsheets/d/1Iw_9gFZTRvKyJ3RU8Q6X51wSE6FQGZ8B2Sb_mvs9DC4/edit#gid=0
@ukemi Here's this week's MGI report from the 2019-11-19 upstream GPAD with the validation rule changes from above: https://docs.google.com/spreadsheets/d/1reWzNHOb4E3rs2QWrVqDxOqyEb7OYB-H-TCCQanKnnc/edit#gid=0
Happy dance! Less than 100. Let's look at the remainders tomorrow. @vanaukenk, woo hooo!
@ukemi @vanaukenk Just realized something with the adjacent_to rule for extracellular region (GO:0005576): https://github.com/geneontology/gocamgen/blob/c1d5724e52cc0efdcbea742bc4317ed6822581fd/resources/formatted_ext_patterns.tsv#L48
Regarding the MGI annotations to extracellular space (GO:0005615) and extracellular matrix (GO:0031012), both of these terms are descendants of extracellular region via the part_of relation. As it turns out, I'm only checking is_a descendants.
I don't know if we specifically discussed this, but should I include the part_of relation when checking primary term descendants? I could see this causing issues with the MF-part_of->BP bridge, though I would need to do some testing to confirm. Does ShEx follow part_of paths?
My gut feeling is that even if it works for the cases we have enumerated, it is not universally true and will open a can of worms. For example, let's hypothetically say that there is a cellular component that is a membrane-bound cytosolic vesicle and consists of a membrane that completely surrounds a lumen. Both the membrane and the lumen would be parts of the vesicle, and it would be true to say that the vesicle and its membrane are adjacent to the cytosol, but it would be false to say that the lumen is adjacent to the cytosol. I am very uncomfortable making rules that might not always be true. I'd rather be safe and assert only what we know. @vanaukenk ?
It's early, but thinking about this more. It seems like these types of issues would best be considered by thinking about rigid property chains. In this case part_of-o-adjacent_to -> adjacent_to is not valid so we wouldn't propagate.
Good catch, @dustine32. I agree with @ukemi: for now, we need to be conservative and just use the is_a hierarchy. The ShEx is only following part_of in BP, i.e. only a BP can be part of another BP. I'll take a look at the existing MGI and WB annotations to see what terms we need to add for 'adjacent to' for now, but we will need to flesh this out more in the future.
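To make the is_a-only behavior concrete, here is a minimal sketch of a descendant lookup that follows only the relations it is told to follow. The edge list is a toy example, not gocamgen's actual data structures: the two part_of edges come from the discussion above, and the EX: identifiers are made-up placeholders for is_a children.

```python
from collections import defaultdict

# Toy edges as (child, relation, parent). The part_of edges are the ones
# discussed above; the EX: terms are hypothetical placeholders.
EDGES = [
    ("GO:0005615", "part_of", "GO:0005576"),  # extracellular space
    ("GO:0031012", "part_of", "GO:0005576"),  # extracellular matrix
    ("EX:0000001", "is_a", "GO:0005576"),     # hypothetical is_a child
    ("EX:0000002", "is_a", "EX:0000001"),     # hypothetical is_a grandchild
]


def descendants(term, edges, relations=("is_a",)):
    """Return all descendants of `term` reachable via the given relations."""
    children = defaultdict(set)
    for child, relation, parent in edges:
        if relation in relations:
            children[parent].add(child)

    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


is_a_only = descendants("GO:0005576", EDGES)
with_part_of = descendants("GO:0005576", EDGES, relations=("is_a", "part_of"))
```

With the default relations, GO:0005615 and GO:0031012 are not returned, which is why those annotations fail the adjacent_to check unless the terms are listed explicitly.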
I just fixed all the annotations that I think were problematic at the annotation-level. Will the next round yield a blank spreadsheet?
@dustine32 @ukemi
For 'adjacent to', the only CC terms for which WB and MGI have direct annotations or annotations to an is_a child (according to AmiGO) are the following, so let's go with this for now:
extracellular region (GO:0005576)
extracellular space (GO:0005615)
extracellular matrix (GO:0031012)
I am switching the last three WB annotations that have 'part of' extensions with these terms (or is_a children) to 'adjacent to'.
Actually, I just realized that one of these 'part of' extensions is coming from a GO-CAM model and the enables_o_occurs_in -> part_of property chain.
I will leave that annotation alone for now but we will want to make sure we have a rule in place to get the desired 'adjacent to' extensions back out of our models for the appropriate CC terms.
@ukemi Working on the ShEx shapes I have a question about the second 'results in specification of' entry in the tsv. The term associated with 'results in specification of' EMAPA,UBERON,WBbt is 'regulation of cell maturation' which I think might be a mistake. For this pair, I propose using 'pattern specification process' (GO:0007389). What do you think?
Looks like in the ontology we have used it for 'specification of x organ identity' as well. I think we should revisit this. It was originally intended for cell fate.
Cell maturation is definitely incorrect.
Here's the definition of 'results in specification of':
"The relationship linking a cell and its participation in a process that results in the fate of the cell being specified. Once specification has taken place, a cell will be committed to differentiate down a specific pathway if left in its normal environment."
So, yes, we would either want to update the relation def or come up with a new relation for the 'specification of x organ identity' terms.
I think we should keep the definition consistent with its original intent.
Sounds good.
Looking through the MGI and WB annotations again, though, I'm not convinced we need that second line for 'results in specification of'.
And maybe we do want a check that this relation was only used with cell?
@ukemi - if you agree, I'll delete that line from the tsv
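A check like the one suggested above could look roughly like this. It is only a sketch: the underscored relation name, the ALLOWED_FILLERS table, and the function name are all assumptions for illustration, not the existing gocamgen or ShEx rules.

```python
# Hypothetical table: relation -> ontology prefixes allowed for the extension filler.
ALLOWED_FILLERS = {
    "results_in_specification_of": {"CL"},  # cell types only, per the original intent
}


def filler_allowed(relation, filler_id, allowed=ALLOWED_FILLERS):
    """Return True if the filler's ontology prefix is allowed for this relation."""
    prefix = filler_id.split(":", 1)[0]
    return prefix in allowed.get(relation, set())


cl_ok = filler_allowed("results_in_specification_of", "CL:0000540")
uberon_ok = filler_allowed("results_in_specification_of", "UBERON:0000955")
```

Relations missing from the table are rejected here; whether unknown relations should pass or fail is a design choice we would need to settle.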
Maybe my spreadsheet won't be blank.
@vanaukenk @ukemi OK, I agree to just explicitly list the part_of-related terms in the adjacent_to rule rather than open up the code to globally traverse the part_of paths.
extracellular region (GO:0005576)
extracellular space (GO:0005615)
extracellular matrix (GO:0031012)
I can add the missing GO:0005615 and GO:0031012 to the TSV under branch issue-68-valid-exts: https://github.com/geneontology/gocamgen/blob/c1d5724e52cc0efdcbea742bc4317ed6822581fd/resources/formatted_ext_patterns.tsv#L48
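For illustration, the explicit-list approach amounts to something like the sketch below. The set and function names are hypothetical and the real rule lives in the TSV; the caller is assumed to supply the term's is_a ancestors.

```python
# Primary terms explicitly listed for the adjacent_to rule (from this thread).
ADJACENT_TO_PRIMARY_TERMS = {
    "GO:0005576",  # extracellular region
    "GO:0005615",  # extracellular space
    "GO:0031012",  # extracellular matrix
}


def adjacent_to_applies(term, is_a_ancestors):
    """True if the term, or any of its is_a ancestors, is a listed primary term.

    No part_of traversal happens here: part_of-related terms match only
    because they are listed explicitly.
    """
    if term in ADJACENT_TO_PRIMARY_TERMS:
        return True
    return any(ancestor in ADJACENT_TO_PRIMARY_TERMS for ancestor in is_a_ancestors)


direct_hit = adjacent_to_applies("GO:0005615", [])
child_hit = adjacent_to_applies("EX:0000001", ["GO:0031012"])  # EX: id is a placeholder
no_hit = adjacent_to_applies("EX:0000002", ["EX:0000003"])
```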
@vanaukenk @ukemi I made the above change for adjacent_to and merged this issue's branch into master. Since I have a new batch of rule changes to make from yesterday's call, I'm going to close this issue and make the changes under https://github.com/geneontology/gocamgen/issues/73.
But feel free to re-open this and/or continue the conversation here!
Some more rule changes from the 2019-11-07 whole genome imports call:
has_part(geneID) for primary term protein-containing complex (GO:0032991) and descendants
CHEBI for regulates_level_of
regulates_o_acts_on_population_of(CL) for primary terms homeostasis of number of cells (GO:0048872), maintenance of cell number (GO:0098727) and descendants
adjacent_to: change primary term to extracellular region (GO:0005576)
causally_upstream_of relations according to ShEx