Closed cmungall closed 7 years ago
Any relation to #48?
Are "addition templates" the names of the widgets on the left of the graph view?
Yes, there are certainly relationships between the ATs and the forms in the phenote-style view. A few differences
Nevertheless it would be great to reuse. The AT and the form could be driven by the same yaml
"Addition templates", in that formulation, would be the things on the right, pop-ups, and separate pages/entry points.
For the first point, true enough, but writing or not writing values should be easily done. For the second, there is an AT of sorts in there for adding evidence on individuals and models (annotation template).
I think the big difference from this end for me is that the ATs were more envisioned to work by adding parts to a model (your point one), while the phenote-style is more meant to embody the whole of a model, which can make a fair amount of difference in some of the plumbing.
Let's do whiteboard
cc @DoctorBud
Regarding the structures built for G2P associations (genotype/variant to phenotype/disease), it seems that a key difference here is that the subject a user specifies is a variant or genotype individual, not a type/class for which the system generates the individual IRI automatically. For example, in DIPper models alleles such as shha
A possible example:
The current WebPhenote only implements a simple disease-phenotype-onset (DPO) relationship. This ticket is about generalizing that, and I have a question about encoding some of the HPO fields not currently present in WebPhenote. After talking to @pnrobinson , I'm looking at some of the HPO annotation data (e.g., https://github.com/monarch-initiative/hpo-annotation-data/blob/75d3e390f795a022c106cfa24cd27e669e371aba/rare-diseases/annotated/OMIM-614455.tab) and am wondering how all of the extra data should be associated.
Looking at the columns in the above file, and marking those that webphenote already handles with a DPO:
and then eliminating the (DPO) entries leaves us with:
Should these additional fields be encoded as SubjExt or ObjExt? I don't think @pnrobinson has access to this repo because his name doesn't autocomplete when I start typing.
I believe many of these are not required. Can you cut -fN on OMIM-*tab in the repo above and see which are used?
I wrote the following script to analyze all the *.tab files in a directory and applied it to rare-diseases/annotated/
and it showed me the following results, indicating all columns are used:
~/MI/hpo-annotation-data/rare-diseases/annotated master$ ./test.sh
83854 /tmp/col_DiseaseID.txt
83854 /tmp/col_DiseaseName.txt
5566 /tmp/col_GeneID.txt
5566 /tmp/col_GeneName.txt
2503 /tmp/col_Genotype.txt
9354 /tmp/col_GeneSymbols.txt
83854 /tmp/col_PhenotypeID.txt
83842 /tmp/col_PhenotypeName.txt
481 /tmp/col_AgeofOnsetID.txt
481 /tmp/col_AgeofOnsetName.txt
83677 /tmp/col_EvidenceID.txt
83648 /tmp/col_EvidenceName.txt
6419 /tmp/col_Frequency.txt
69 /tmp/col_SexID.txt
69 /tmp/col_SexName.txt
725 /tmp/col_NegationID.txt
770 /tmp/col_NegationName.txt
25446 /tmp/col_Description.txt
79307 /tmp/col_Pub.txt
79473 /tmp/col_Assignedby.txt
83733 /tmp/col_DateCreated.txt
Here is the script:
#!/bin/bash
cols=( \
"Disease ID" \
"Disease Name" \
"Gene ID" \
"Gene Name" \
"Genotype" \
"Gene Symbol(s)" \
"Phenotype ID" \
"Phenotype Name" \
"Age of Onset ID" \
"Age of Onset Name" \
"Evidence ID" \
"Evidence Name" \
"Frequency" \
"Sex ID" \
"Sex Name" \
"Negation ID" \
"Negation Name" \
"Description" \
"Pub" \
"Assigned by" \
"Date Created")
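for i in "${!cols[@]}"
do
    # cut numbers fields from 1, the array from 0
    col=$((i+1))
    # Strip spaces and parentheses so the header becomes a safe file name
    name=$(echo "${cols[$i]}" | tr -d " ()")
    fname="/tmp/col_${name}.txt"
    # Skip each file's header row, take this column, keep only non-empty values
    tail -q -n+2 *.tab | cut -f"$col" | awk NF > "$fname"
    wc -l "$fname"
done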
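For comparison, here is a rough single-pass variant of the same check, a sketch rather than a drop-in replacement. It assumes every file starts with a header row and reads the column names from the first file's header instead of hard-coding them; the function name `count_columns` is made up for illustration.

```shell
#!/bin/bash
# Count non-empty values per column across tab-separated files in one pass.
# Assumption: every file has a header row; names come from the first file.
count_columns() {
    awk -F'\t' '
        FNR == 1 {                      # header row of each file
            if (NR == 1) {              # remember names from the first file only
                ncols = NF
                for (c = 1; c <= NF; c++) name[c] = $c
            }
            next
        }
        {
            # A value counts as used if it has any non-whitespace character
            for (c = 1; c <= ncols; c++)
                if ($c ~ /[^[:space:]]/) used[c]++
        }
        END {
            for (c = 1; c <= ncols; c++)
                printf "%d\t%s\n", used[c], name[c]
        }
    ' "$@"
}
```

It would be called as `count_columns *.tab` from inside rare-diseases/annotated/, and prints one "count, column name" line per column without writing temporary files.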
So from the list:
0) Assume we will be supporting an HPO configuration that is a variant of the current DPO (DiseasePhenotypeOnset) configuration.
1) I think we can assume the Gene{ID | Name | Symbol(s)} can be replaced with a Gene Autocomplete field (Noctua probably uses this all the time).
2) Does Genotype need to be entered separately? Or can the Gene Autocomplete provide this?
3) Sex ID/Name can be replaced with an autocomplete or a selector from a finite list (ideally, keyboard navigable)
4) Negation ID/Name can be handled similarly to (3) above
So basically, 3 new columns need to be added at the input/display end. If we wanted to export .tsv files similar to the stuff in hpo-annotation-data, then we'd generate the redundant columns Gene{ID | Name | Symbol(s)} from the selected GeneID.
How does this sound, @cmungall @kltm @jmcmurry @pnrobinson?
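To make the export idea concrete, here is a minimal sketch of regenerating the redundant gene columns from a selected gene ID. Everything about the layout is an assumption: a hypothetical lookup file with three tab-separated columns (gene ID, gene name, gene symbols) and annotation rows that carry the gene ID in their first column. The function name `expand_gene_columns` is also invented here.

```shell
#!/bin/bash
# Expand a selected gene ID into Gene ID / Gene Name / Gene Symbol(s) on export.
# Assumptions (hypothetical layouts):
#   lookup file: gene ID <TAB> gene name <TAB> gene symbols
#   rows file:   gene ID <TAB> remaining annotation columns
expand_gene_columns() {
    local lookup=$1 rows=$2
    awk -F'\t' -v OFS='\t' '
        # First file: load the lookup table keyed by gene ID
        NR == FNR { name[$1] = $2; symbols[$1] = $3; next }
        {
            # Second file: keep the remaining columns unchanged after the gene columns
            rest = ""
            for (c = 2; c <= NF; c++) rest = rest OFS $c
            print $1, name[$1], symbols[$1] rest
        }
    ' "$lookup" "$rows"
}
```

Gene IDs missing from the lookup file come out with empty name and symbol columns, so gaps stay visible in the exported file.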
Hi Dan, thanks! I think we also need an onset modifier. We are also starting to move towards a more expressive use of modifiers in general (i.e., laterality, severity, triggered-by). The GUI does not need to show the Negation or the Sex ID values, as these are obviously redundant given the terms. But in general, yes, this is going in the right direction.
This has been implemented as a workbench and has its own repo: https://github.com/geneontology/simple-annoton-editor/
This subsumes #9 and is a simplified version of #128, and takes priority over it for the time being.
Background: we need a variety of form-based interfaces; see for example https://github.com/monarch-initiative/monarch-phenote/issues?q=is%3Aopen+is%3Aissue+label%3Aforms plus the basic GO-style forms.
Issue #128 describes a very flexible and powerful way of declaratively specifying a mapping between a form (i.e., a set of variable bindings, or a denormalized row) and a graph (which forms a subgraph of the overall model).
This may be overkill for some purposes, as the majority of forms we need will fit into a common structure, with different template configurations. The monarch disease-phenotype form already fits this structure. It can be genericized by allowing the form to be configured for the classes and relations whilst retaining identical structure.
The core structure is an annotated edge between a subject and object; for example, disease to phenotype, allele to phenotype, gene to function. Optionally the object instance can be adorned with an edge of a different type; for example, the phenotype instance can be extended with an onset instance, or the function instance can be extended with a location (occurs in).
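As a rough illustration of that core structure, the sketch below expands one denormalized form row into edges: the annotated subject-to-object edge, plus an optional second edge hanging off the object. The function name `row_to_edges`, the argument order, and the tab-separated output are all assumptions made for the example; nothing here reflects the actual Noctua data model.

```shell
#!/bin/bash
# Expand one denormalized form row into the edges of a small subgraph.
# Arguments: subject, relation, object, evidence, then an optional
# extension relation and extension filler for the object.
# Hypothetical sketch: identifiers are whatever the caller passes in.
row_to_edges() {
    local subj=$1 rel=$2 obj=$3 evidence=$4 ext_rel=${5:-} ext_obj=${6:-}

    # Core edge between subject and object, annotated with its evidence
    printf '%s\t%s\t%s\tevidence=%s\n' "$subj" "$rel" "$obj" "$evidence"

    # Optional extension: a second edge of a different type off the object
    if [ -n "$ext_rel" ] && [ -n "$ext_obj" ]; then
        printf '%s\t%s\t%s\n' "$obj" "$ext_rel" "$ext_obj"
    fi
}
```

A disease-phenotype-onset row would then be something like `row_to_edges D1 has_phenotype P1 E1 has_onset O1` (placeholder identifiers), while a gene-to-function row without a location would simply omit the last two arguments.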
For example, the current monarch d2p form has the following variables for each row:
This could easily be genericized by not hardcoding the class 'category'. E.g. if the form could be configured by a YAML with the following config params:
(possibly also SubjExt)
As in here:
A GO example:
The wrinkle here is that we conventionally assert the inverse, so perhaps an additional param for this.
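One way such a param might behave, purely as a sketch: if the configuration supplies an inverse relation, the edge is written from object to subject using that relation; otherwise it is written as entered. The function name `emit_edge` and the optional fourth argument are hypothetical.

```shell
#!/bin/bash
# Emit an edge, optionally asserting the inverse direction.
# Arguments: subject, relation, object, and an optional inverse relation.
# Hypothetical sketch of an "inverse" config param.
emit_edge() {
    local subj=$1 rel=$2 obj=$3 inverse_rel=${4:-}

    if [ -n "$inverse_rel" ]; then
        # Assert object -> subject using the configured inverse relation
        printf '%s\t%s\t%s\n' "$obj" "$inverse_rel" "$subj"
    else
        # No inverse configured: assert the edge as entered in the form
        printf '%s\t%s\t%s\n' "$subj" "$rel" "$obj"
    fi
}
```

With placeholder identifiers, `emit_edge G1 enables F1 enabled_by` would write the edge from F1 to G1, which keeps the form row reading subject-first while the stored edge follows the convention.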