Create a configurable statically structured association editing form

cmungall commented 9 years ago

This subsumes #9 and is a simplified version of #128, and takes priority over it for the time being.

Background: we need a variety of form-based interfaces; see, for example, https://github.com/monarch-initiative/monarch-phenote/issues?q=is%3Aopen+is%3Aissue+label%3Aforms plus the basic GO-style forms.

Issue #128 describes a very flexible and powerful way of declaratively specifying a mapping between a form (i.e., a set of variable bindings, or a denormalized row) and a graph (which forms a subgraph of the overall model).

This may be overkill for some purposes, as the majority of forms we need will fit into a common structure, with different template configurations. The monarch disease-phenotype form already fits this structure. It can be genericized by allowing the form to be configured for the classes and relations whilst retaining identical structure.

The core structure is an annotated edge between a subject and object; for example, disease to phenotype, allele to phenotype, gene to function. Optionally the object instance can be adorned with an edge of a different type; for example, the phenotype instance can be extended with an onset instance, or the function instance can be extended with a location (occurs in).

For example, the current monarch d2p form has the following variables for each row:

?DiseaseClass
?PhenotypeClass
?OnsetClass -- optional
(?EvidenceClass, ?PubInstance)*

[image: noc1]

This could easily be genericized by not hardcoding the class 'category'. For example, the form could be configured by a YAML file with the following config params:

SubjCategory
RelationIRI
ObjCategory
ExtRelationIRI
ObjExtCategory

(possibly also SubjExt)

As in here:

[image: noc2]

A GO example:

SubjCategory = GeneProduct
RelationIRI = enables
ObjCategory = MF
ExtRelationIRI = occurs_in
ObjExtCategory = MaterialEntity

The wrinkle here is that we conventionally assert the inverse, so perhaps an additional param for this
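As a rough sketch, the GO example could be written as a YAML config along these lines. Only the five parameter names and their values come from the lists above; the top-level key `go_function_form` and the `AssertInverse` flag are hypothetical names made up for illustration, the latter standing in for the "additional param" that would handle the inverse assertion.

```yaml
# Hypothetical layout: only the five parameter names are taken from this issue.
go_function_form:
  SubjCategory: GeneProduct
  RelationIRI: enables
  ObjCategory: MF
  ExtRelationIRI: occurs_in
  ObjExtCategory: MaterialEntity
  # Hypothetical flag: store the inverse of RelationIRI instead of the relation itself.
  AssertInverse: true
```

A second form, such as disease to phenotype, would then be another block with the same keys and different values.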

kltm commented 9 years ago

Any relation to #48?

cmungall commented 9 years ago

Are "addition templates" the names of the widgets on the left of the graph view?

Yes, there are certainly relationships between the ATs and the forms in the phenote-style view. A few differences:

Thus far the ATs have been creation-only. In contrast, in the phenote-style view it's possible to select a row, see the values appear in the form, and edit them. Not to say we should never do this in the ATs, but thus far we have had no need, as there have been other editing modalities post-creation of objects.
The ATs have been more minimal in that they do not have evidence fields. We could consider adding these, but the form view allows adding multiple evidences per object property assertion, and this may clutter the AT view.

Nevertheless, it would be great to reuse. The AT and the form could be driven by the same YAML.

kltm commented 9 years ago

"Addition templates", in that formulation, would be the things on the right, pop-ups, and separate pages/entry points.

For the first point, true enough, but writing or not writing values should be easily done. For the second, there is an AT of sorts in there for adding evidence on individuals and models (annotation template).

I think the big difference from this end for me is that the ATs were more envisioned to work by adding parts to a model (your point one), while the phenote-style is more meant to embody the whole of a model, which can make a fair amount of difference in some of the plumbing.

cmungall commented 9 years ago

Let's do whiteboard

cmungall commented 8 years ago

cc @DoctorBud

mbrush commented 8 years ago

Regarding the structures built for G2P associations, (genotype/variant to phenotype/disease), it seems that a key difference here is that the subject that a user specifies is a variant or genotype individual - not a type/class where the system generates the individual IRI automatically. For example, in DIPper models alleles such as shha (http://zfin.org/action/feature/view/ZDB-ALT-980203-1091) are represented as instances of the 'variant allele' (GENO:0000002) class, and it is these instances that would be selected directly as the subject of a G2P association in WebPhenote. Is this in accordance with current plans here?

A possible example:

SubjIndividual = genotype or genotype part instance (from Scigraph?)
SubjCategory = corresponding genotype or genotype part class from GENO or SO (can be automatically generated from selection of SubjIndividual)
RelationIRI = has_phenotype - RO:0002200 (or causes_condition - RO:0003303?)
ObjCategory = Condition (phenotype or disease class)
ExtRelationIRI and ObjExtCategory = there may be several possible extensions relevant for G2P (e.g. developmental stage, environment, qualifiers for molecular phenotypes, . . . )

DoctorBud commented 8 years ago

The current WebPhenote only implements a simple disease-phenotype-onset (DPO) relationship. This ticket is about generalizing that, and I have a question about encoding some of the HPO fields not currently present in WebPhenote. After talking to @pnrobinson , I'm looking at some of the HPO annotation data (e.g., https://github.com/monarch-initiative/hpo-annotation-data/blob/75d3e390f795a022c106cfa24cd27e669e371aba/rare-diseases/annotated/OMIM-614455.tab) and am wondering how all of the extra data should be associated.

Looking at the columns in the above file, and marking those that webphenote already handles with a DPO:

(DPO) Disease ID
(DPO) Disease Name
Gene ID
Gene Name
Genotype
Gene Symbol(s)
(DPO) Phenotype ID
(DPO) Phenotype Name
(DPO) Age of Onset ID
(DPO) Age of Onset Name
(DPO) Evidence ID
(DPO) Evidence Name
Frequency
Sex ID
Sex Name
Negation ID
Negation Name
(DPO) Description
(DPO) Pub
(DPO) Assigned by
(DPO) Date Created

and then eliminating the (DPO) entries leaves us with:

Gene ID
Gene Name
Genotype
Gene Symbol(s)
Frequency
Sex ID
Sex Name
Negation ID
Negation Name

Should these additional fields be encoded as SubjExt or ObjExt? I don't think @pnrobinson has access to this repo because his name doesn't autocomplete when I start typing.

cmungall commented 8 years ago

I believe many of these are not required. Can you cut -fN on OMIM-*tab in the repo above and see which are used?

DoctorBud commented 8 years ago

I wrote the following script to analyze all the *.tab files in a directory and applied it to rare-diseases/annotated/ and it showed me the following results, indicating all columns are used:

~/MI/hpo-annotation-data/rare-diseases/annotated master$ ./test.sh
   83854 /tmp/col_DiseaseID.txt
   83854 /tmp/col_DiseaseName.txt
    5566 /tmp/col_GeneID.txt
    5566 /tmp/col_GeneName.txt
    2503 /tmp/col_Genotype.txt
    9354 /tmp/col_GeneSymbols.txt
   83854 /tmp/col_PhenotypeID.txt
   83842 /tmp/col_PhenotypeName.txt
     481 /tmp/col_AgeofOnsetID.txt
     481 /tmp/col_AgeofOnsetName.txt
   83677 /tmp/col_EvidenceID.txt
   83648 /tmp/col_EvidenceName.txt
    6419 /tmp/col_Frequency.txt
      69 /tmp/col_SexID.txt
      69 /tmp/col_SexName.txt
     725 /tmp/col_NegationID.txt
     770 /tmp/col_NegationName.txt
   25446 /tmp/col_Description.txt
   79307 /tmp/col_Pub.txt
   79473 /tmp/col_Assignedby.txt
   83733 /tmp/col_DateCreated.txt

Here is the script:

#!/bin/bash

cols=( \
    "Disease ID" \
    "Disease Name" \
    "Gene ID" \
    "Gene Name" \
    "Genotype" \
    "Gene Symbol(s)" \
    "Phenotype ID" \
    "Phenotype Name" \
    "Age of Onset ID" \
    "Age of Onset Name" \
    "Evidence ID" \
    "Evidence Name" \
    "Frequency" \
    "Sex ID" \
    "Sex Name" \
    "Negation ID" \
    "Negation Name" \
    "Description" \
    "Pub" \
    "Assigned by" \
    "Date Created")

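# Write the non-empty values of each column to its own file and count them.
for i in "${!cols[@]}"
do
    col=$((i + 1))    # cut numbers fields from 1
    name=$(echo "${cols[$i]}" | tr -d " ()")    # drop spaces and parentheses
    fname="/tmp/col_${name}.txt"
    tail -q -n+2 *.tab | cut -f"$col" | awk NF > "$fname"    # skip each header row, keep non-blank values
    wc -l "$fname"
done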
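
For comparison, the same per-column count can be done in a single awk pass without temporary files. This is only a sketch, not the script that produced the numbers above: the function name `count_columns` is made up, and it assumes every file starts with one header row and that all files share the same columns. It takes the column names from the first file's header.

```shell
#!/bin/bash
# Count non-blank cells per column across tab-separated files in one awk pass.
# Assumption: each file has exactly one header row and the same column order.
count_columns() {
    awk -F'\t' '
        FNR == 1 {
            # Header row: remember the names from the first file only.
            if (NR == 1) { ncols = NF; for (c = 1; c <= NF; c++) name[c] = $c }
            next
        }
        {
            # A cell counts as used if it contains any non-whitespace character.
            for (c = 1; c <= NF; c++)
                if ($c ~ /[^[:space:]]/) count[c]++
        }
        END {
            for (c = 1; c <= ncols; c++) printf "%d\t%s\n", count[c], name[c]
        }
    ' "$@"
}
```

Run it from the directory that holds the files, for example `count_columns *.tab`. Each output line is a count followed by the column name.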

DoctorBud commented 8 years ago

So from the list:

Gene ID
Gene Name
Genotype
Gene Symbol(s)
Frequency
Sex ID
Sex Name
Negation ID
Negation Name

Advice needed below

0) Assume we will be supporting an HPO configuration that is a variant of the current DPO (DiseasePhenotypeOnset) configuration.

1) I think we can assume the Gene{ID | Name | Symbol(s)} can be replaced with a Gene Autocomplete field (Noctua probably uses this all the time).

2) Does Genotype need to be entered separately? Or can the Gene Autocomplete provide this?

3) Sex ID/Name can be replaced with an autocomplete or a selector from a finite list (ideally, keyboard navigable)

4) Negation ID/Name can be handled similarly to (3) above

So basically, 3 new columns need to be added at the input/display end. If we wanted to export .tsv files similar to the stuff in hpo-annotation-data, then we'd generate the redundant columns Gene{ID | Name | Symbol(s)} from the selected GeneID.
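
A minimal sketch of that export step, assuming a lookup table is available: both file layouts are hypothetical, with the lookup file holding gene ID, gene name and gene symbols as three tab-separated columns and the annotation rows carrying the selected gene ID in their first column. The function name `expand_gene_columns` is also made up.

```shell
#!/bin/bash
# Expand the gene ID in column 1 of each row into ID, name and symbols columns.
# Hypothetical inputs: $1 = lookup file (ID<TAB>name<TAB>symbols), $2 = rows file.
expand_gene_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { name[$1] = $2; symbols[$1] = $3; next }   # first file: load the lookup table
        {
            id = $1
            $1 = id OFS name[id] OFS symbols[id]               # unknown IDs get empty cells
            print
        }
    ' "$1" "$2"
}
```

With this shape the form only has to store the gene ID, and the redundant columns are rebuilt at export time.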

How does this sound, @cmungall @kltm @jmcmurry @pnrobinson?

pnrobinson commented 8 years ago

Hi Dan, thanks! I think we also need an onset modifier. We are also starting to move towards a more expressive use of modifiers in general (i.e., laterality, severity, triggered-by). The GUI does not need to show the Negation or the Sex ID values, as these are obviously redundant given the terms. But in general, yes, this is going in the right direction.

cmungall commented 7 years ago

This has been implemented as a workbench and has its own repo: https://github.com/geneontology/simple-annoton-editor/
