WormBase / genedesc_generator

Automated gene descriptions generator for model organism databases

Tissue expression module: how to include data #20

Closed: rankishore closed this issue 5 years ago

rankishore commented 6 years ago

Source files:

Annotations: ftp://ftp.ebi.ac.uk/pub/databases/wormbase/releases/WS267/ONTOLOGY/anatomy_association.WS267.wb
Ontology: ftp://ftp.ebi.ac.uk/pub/databases/wormbase/releases/WS267/

Data rule

  1. Use only annotation rows that have no qualifier or have the qualifier 'Certain' or 'Partial'.
  2. Ignore all rows with the qualifier 'Uncertain', 'Enriched', or 'NOT'. Note: rows with the qualifier 'Enriched' are Expression Cluster data from large-scale data sets; these will be included separately, and only for information-poor genes. Do not include them here.
  3. Try the analysis with and without the Murray paper, WBPaper00040986, to see how the results look.
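The data rule above can be sketched as a simple row filter. This is only an illustration: the row layout (a dict with `qualifier` and `reference` keys), the function names, and the `exclude_murray` switch are assumptions made for the example, not the actual column layout of the `.wb` file or the generator's code.

```python
# Qualifiers that are kept; an empty string stands for "no qualifier".
ALLOWED_QUALIFIERS = {"", "Certain", "Partial"}
# The Murray paper named in the data rule.
MURRAY_PAPER = "WBPaper00040986"


def keep_row(row, exclude_murray=False):
    """Return True if an annotation row passes the data rule.

    `row` is a hypothetical dict with 'qualifier' and 'reference' keys.
    """
    qualifier = (row.get("qualifier") or "").strip()
    if qualifier not in ALLOWED_QUALIFIERS:
        # Drops 'Uncertain', 'Enriched', 'NOT' and anything else unexpected.
        return False
    if exclude_murray and MURRAY_PAPER in (row.get("reference") or ""):
        return False
    return True


def filter_rows(rows, exclude_murray=False):
    """Keep only the rows that pass keep_row, preserving order."""
    return [row for row in rows if keep_row(row, exclude_murray)]
```

Running `filter_rows` once with `exclude_murray=False` and once with `exclude_murray=True` gives the two result sets to compare for rule 3.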

Apply set covering trimming. Threshold = ??

Templates

  1. [gene] is expressed in/in the [anatomy term], [anatomy term], and [anatomy term] (use 'in' or 'in the' depending on which terms come first).
  2. [gene] is expressed widely (see replacement rules below).

Granularity rule

When several anatomy terms are present, check whether any of their parent terms are also present, and keep the highest common parent term available.
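One possible reading of the granularity rule is: drop any term that has an ancestor elsewhere in the same list, so that only the highest terms remain. The sketch below follows that reading. The `parents` mapping (term to list of direct parent terms) and the function names are hypothetical, and the toy ontology in the usage note is made up for illustration rather than taken from the WormBase anatomy ontology.

```python
def ancestors(term, parents):
    """Collect all ancestors of `term` by walking the parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def keep_highest_terms(terms, parents):
    """Drop every term that has an ancestor in the same list."""
    present = set(terms)
    return [t for t in terms if not (ancestors(t, parents) & present)]
```

With a toy mapping such as `{"amphid neuron": ["head neuron"], "head neuron": ["neuron"]}`, the list `["amphid neuron", "head neuron", "intestine"]` is reduced to `["head neuron", "intestine"]`.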

Template Special Cases

  1. If the anatomy term 'cell' occurs by itself, use the words 'widely expressed' instead of 'cell'.
     Sentence: col-178 is expressed in the Cell;
     Becomes: col-178 is expressed widely.

  2. If the anatomy term 'cell' is present with other anatomy terms, use the words 'expressed in several tissues including the' instead.
     Sentence: frm-1 is expressed in the intestine, pharynx, and the Cell;
     Becomes: frm-1 is expressed in several tissues including the intestine and pharynx.
     Sentence: [gene] is expressed in several tissues and in the hermaphrodite.
     Sentence: [gene] is expressed in several tissues and in the male.
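The two 'cell' special cases can be sketched as below. The function names are hypothetical, and the sketch simplifies in two ways: it always writes 'in the' rather than choosing between 'in' and 'in the', and it does not cover the hermaphrodite and male sentences.

```python
def join_terms(terms):
    """Join anatomy terms in the style of the example sentences."""
    if len(terms) == 1:
        return terms[0]
    if len(terms) == 2:
        return f"{terms[0]} and {terms[1]}"
    return ", ".join(terms[:-1]) + ", and the " + terms[-1]


def expression_sentence(gene, terms):
    """Build an expression sentence, applying the 'cell' special cases."""
    if not terms:
        raise ValueError("at least one anatomy term is required")
    others = [t for t in terms if t.lower() != "cell"]
    has_cell = len(others) != len(terms)
    if has_cell and not others:
        # Case 1: 'cell' occurs by itself.
        return f"{gene} is expressed widely."
    # Case 2: 'cell' occurs together with other terms.
    prefix = "several tissues including the" if has_cell else "the"
    return f"{gene} is expressed in {prefix} {join_terms(others)}."
```

For example, `expression_sentence("frm-1", ["intestine", "pharynx", "Cell"])` reproduces the frm-1 sentence given above.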

Replacements

  1. The anatomy term 'neuron' becomes 'nervous system'.
     Sentence: ceh-82 is expressed in the neuron;
     Becomes: ceh-82 is expressed in the nervous system;

  2. If the word 'neuron' occurs as part of an anatomy term, e.g. 'head neuron', pluralize it, with the following exceptions.
     Exceptions: I3 neuron, I4 neuron, I5 neuron, I6 neuron, M1 neuron, M4 neuron, M5 neuron, MI neuron
     Sentence: nhr-194 is expressed in the amphid neuron, ciliated neuron, head neuron, and the sensory neuron;
     Becomes: nhr-194 is expressed in the amphid neurons, ciliated neurons, head neurons, and the sensory neurons;
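The two replacement rules could be applied per term roughly as follows. The function name is hypothetical, and the sketch assumes 'neuron' is always the last word of a compound term, as it is in every example above; terms with 'neuron' in another position would need extra handling.

```python
# Terms from the exceptions list, which stay singular.
NO_PLURAL = {
    "I3 neuron", "I4 neuron", "I5 neuron", "I6 neuron",
    "M1 neuron", "M4 neuron", "M5 neuron", "MI neuron",
}


def rewrite_neuron_term(term):
    """Apply the 'neuron' replacement rules to a single anatomy term."""
    if term == "neuron":
        # Rule 1: the bare term becomes 'nervous system'.
        return "nervous system"
    if term in NO_PLURAL:
        # Listed exceptions are left unchanged.
        return term
    if term.endswith(" neuron"):
        # Rule 2: pluralize compound terms such as 'head neuron'.
        return term + "s"
    return term
```

Applying `rewrite_neuron_term` to each term before building the sentence turns 'head neuron' into 'head neurons' and leaves 'M4 neuron' as it is.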