WormBase / wormbase-pipeline

Wormbase Build Pipeline
http://www.wormbase.org

Molecule Model Changes - Karen #18

Closed: Paul-Davis closed this issue 8 years ago

Paul-Davis commented 9 years ago

Summary and proposed model:

http://wiki.wormbase.org/index.php/Molecule#Changes_for_WS251

As proposed following discussion and edits (Sept 4th)

?Molecule   Name    ?Text //WBMoleculeID
    Public_name ?Text  //Source = CTD, ChEBI, research articles
    Formula ?Text //Source chebi.obo RELATED FORMULA [KEGG COMPOUND:] and RELATED FORMULA [ChEBI:], can be multiple
    Monoisotopic_mass ?Float //Source?
    IUPAC ?Text //Source chebi.obo EXACT IUPAC_NAME [IUPAC:], can be multiple
    SMILES ?Text //Source chebi.obo RELATED SMILES [ChEBI:], unique
    InChi ?Text //Source chebi.obo RELATED InChI [ChEBI:], unique
        InChiKey ?Text //Source chebi.obo RELATED InChIKey [ChEBI:], unique
    Synonym ?Text //CTD
    DB_info Database  ?Database ?Database_field ?Text  //for kegg/reactome pathways- also perhaps for all chebi.obo tags (may replace chebi source tags above)
    Status  Detected  #Evidence
        Predicted #Evidence
    Origin  Plant     //from HMDB
                Microbial  ?Species
                Cosmetic
                Toxin_Pollutant
                Food_for_human
                Drug
                Exogenous
                Endogenous   ?Species #Evidence
                Drug_metabolite
                Pharmaceutical
        Biofunction //from HMDB or ChEBI Role ontology - do we want to take in the whole ontology? Can we just create the .ace and extract chebi.obo ID-to-public-name mappings?
        Metabolite //will be in ChEBI or HMDB ontology
        Regulatory //may be in ChEBI or HMDB ontology
        Structural //may be in ChEBI or HMDB ontology
            Ligand_for Gene_product_receptor ?Gene Receptor_for_molecule #Evidence 
            Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
                Receptor_for_molecule ?Molecule XREF Molecule_receptor
            Regulate_expr_cluster   ?Expression_cluster XREF    Regulated_by_molecule
    Requirement Essential ?Species #Evidence
            Nonessential ?Species #Evidence
    WBProcess   ?WBProcess  XREF    Molecule
    Affects_phenotype_of    Variation   ?Variation  ?Phenotype  #Evidence
        Strain  ?Strain ?Phenotype  #Evidence
        Transgene   ?Transgene  ?Phenotype  #Evidence
        RNAi    ?RNAi   ?Phenotype  #Evidence
        Rearrangement   ?Rearrangement  ?Phenotype  #Evidence
        Interaction ?Interaction    XREF    Molecule_interaction
    Molecule_use    ?Text   #Evidence  //manual extraction from papers - should sync with ChEBI roles??
    Reference   ?Paper  XREF    Molecule
    Remark  ?Text   #Evidence
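The comments above propose filling several tags from chebi.obo synonym lines and ask whether the .ace could simply be generated from chebi.obo. Below is a minimal Python sketch of that extraction. It assumes the synonym-line layout quoted in the comments (synonym: "..." RELATED SMILES [ChEBI:]). The synonym-to-tag mapping, the use of the ChEBI ID as a stand-in object name (real objects would get a WBMoleculeID), and all helper names are hypothetical. OBO string escapes are not unescaped.

```python
import re

# Hypothetical mapping from chebi.obo synonym types (as named in the model
# comments) to the proposed ?Molecule tags; adjust once sources are settled.
SYNONYM_TAGS = {
    "FORMULA": "Formula",
    "IUPAC_NAME": "IUPAC",
    "SMILES": "SMILES",
    "InChI": "InChi",
    "InChIKey": "InChiKey",
}

# Matches e.g.: synonym: "H2O" RELATED FORMULA [KEGG COMPOUND:]
SYNONYM_RE = re.compile(r'^synonym: "(.*)" (?:EXACT|RELATED) (\S+) \[')


def parse_obo_terms(text):
    """Yield a dict per [Term] stanza: id, name and (tag, value) pairs."""
    term = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # New stanza: flush the current [Term]; ignore [Typedef] etc.
            if term is not None:
                yield term
            term = {"id": None, "name": None, "values": []} if line == "[Term]" else None
        elif term is None:
            continue
        elif line.startswith("id: "):
            term["id"] = line[len("id: "):]
        elif line.startswith("name: "):
            term["name"] = line[len("name: "):]
        else:
            match = SYNONYM_RE.match(line)
            if match and match.group(2) in SYNONYM_TAGS:
                term["values"].append((SYNONYM_TAGS[match.group(2)], match.group(1)))
    if term is not None:
        yield term


def ace_quote(value):
    """Quote a value for .ace output, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def term_to_ace(term):
    """Render one term as a ?Molecule object; the ChEBI id stands in for a WBMoleculeID."""
    lines = [f"Molecule : {ace_quote(term['id'])}"]
    if term["name"]:
        lines.append(f"Public_name {ace_quote(term['name'])}")
    lines.extend(f"{tag} {ace_quote(value)}" for tag, value in term["values"])
    return "\n".join(lines) + "\n"


SAMPLE = """\
[Term]
id: CHEBI:15377
name: water
synonym: "H2O" RELATED FORMULA [KEGG COMPOUND:]
synonym: "[H]O[H]" RELATED SMILES [ChEBI:]
synonym: "oxidane" EXACT IUPAC_NAME [IUPAC:]
"""

for term in parse_obo_terms(SAMPLE):
    print(term_to_ace(term))
```

The sample stanza only illustrates the assumed layout. Each recognised synonym becomes one tag line, in file order, under a single Molecule object.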
Paul-Davis commented 9 years ago

Error in model

Proposal:

            Ligand_for Gene_product_receptor ?Gene Receptor_for_molecule #Evidence 

"Receptor_for_molecule" is a duplicate tag within the following XREF pair in the Molecule model:

            Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
            Receptor_for_molecule ?Molecule XREF Molecule_receptor

Presumably this is a cut-and-paste error. Fix:

            Ligand_for Gene_product_receptor ?Gene #Evidence 

Also, is "Ligand_for" supposed to set a hierarchy for some of the following tags? Currently it doesn't:

                      Ligand_for Gene_product_receptor ?Gene #Evidence 
                      Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
                      Receptor_for_molecule ?Molecule XREF Molecule_receptor
                      Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule
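A duplicate like this can be caught mechanically before a build. The Python sketch below is a simplified check, not an AceDB tool. It strips // comments and treats every bare word as a tag, except keywords, ?Class references, #hashes and the word after XREF (which names a tag in the target class). It then reports any tag declared on more than one line. The keyword list and function names are assumptions.

```python
from collections import defaultdict

# Simplified set of AceDB model keywords and basic types (an assumption, not exhaustive).
KEYWORDS = {"XREF", "UNIQUE", "REPEAT", "Int", "Float", "Text", "DateType"}


def tag_occurrences(model_text):
    """Map each bare tag name to the line numbers where it is declared."""
    seen = defaultdict(list)
    for lineno, line in enumerate(model_text.splitlines(), start=1):
        tokens = line.split("//", 1)[0].split()  # drop // comments
        for i, token in enumerate(tokens):
            if token.startswith(("?", "#")) or token in KEYWORDS:
                continue  # class reference, #Evidence-style hash, or keyword
            if i > 0 and tokens[i - 1] == "XREF":
                continue  # names a tag in the target class, not a new tag here
            seen[token].append(lineno)
    return seen


def duplicate_tags(model_text):
    """Return {tag: [line numbers]} for tags declared more than once."""
    return {tag: lines for tag, lines in tag_occurrences(model_text).items() if len(lines) > 1}


PROPOSAL = """\
?Molecule Name ?Text
          Ligand_for Gene_product_receptor ?Gene Receptor_for_molecule #Evidence
                     Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
                     Receptor_for_molecule ?Molecule XREF Molecule_receptor
"""

print(duplicate_tags(PROPOSAL))  # {'Receptor_for_molecule': [2, 4]}
```

With the fix applied (dropping Receptor_for_molecule after ?Gene), the same check returns an empty dict.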
Paul-Davis commented 9 years ago

Is this the desired structure of the model? Also, do you need an #Evidence on Molecule_receptor?

Lines marked with "-" are the ones in question.

          Biofunction Metabolite // will be in ChEBI or HMDB ontology
                      Regulatory // may be in ChEBI or HMDB ontology
                      Structural // may be in ChEBI or HMDB ontology
-         Ligand_for Gene_product_receptor ?Gene #Evidence 
-                    Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
-                    Receptor_for_molecule ?Molecule XREF Molecule_receptor #Evidence
-         Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule
          Requirement Essential ?Species #Evidence
                      Nonessential ?Species #Evidence
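Whether Ligand_for groups the tags below it depends only on column alignment. The sketch below assumes the usual AceDB reading of models.wrm: a tag further right on the same line is a child, and a tag starting in the same column as the nearest open tag is its sibling. It derives each tag's path from those columns, so a proposed layout can be checked before it goes into models.wrm. Tabs are expanded at an assumed width of 8, and the function name is hypothetical.

```python
import re

# Simplified AceDB keywords/basic types; tokens from here on are not tags.
KEYWORDS = {"XREF", "UNIQUE", "REPEAT", "Int", "Float", "Text", "DateType"}


def tag_paths(model_text, tabsize=8):
    """Return {tag: (ancestor, ..., tag)} derived from the column each tag starts in."""
    paths = {}
    stack = []  # (column, tag) pairs from the outermost open tag to the current one
    for line in model_text.expandtabs(tabsize).splitlines():
        code = line.split("//", 1)[0]
        for match in re.finditer(r"\S+", code):
            token = match.group()
            if token.startswith(("?", "#")) or token in KEYWORDS:
                break  # data/evidence part of the line: stop looking for tags
            column = match.start()
            # Open tags starting at or right of this column are now closed.
            while stack and stack[-1][0] >= column:
                stack.pop()
            stack.append((column, token))
            paths[token] = tuple(tag for _, tag in stack)
    return paths


MODEL = """\
Biofunction Metabolite
            Regulatory
Ligand_for  Gene_product_receptor ?Gene #Evidence
            Molecule_receptor ?Molecule XREF Receptor_for_molecule #Evidence
Requirement Essential ?Species #Evidence
            Nonessential ?Species #Evidence
"""

for path in tag_paths(MODEL).values():
    print(" > ".join(path))
```

With the alignment shown, Molecule_receptor comes out as Ligand_for > Molecule_receptor. Starting it in column 0 would make it a top-level tag instead.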
kyook commented 9 years ago

Hi Paul, here is the model with proper indentation, as far as I can tell. One outstanding question I have is about Taxon versus Species. In many cases, the source of a molecule is given in general terms, such as "plants" or "algae", rather than as a species. I need something that can move from species further up the taxonomic ladder.

 ?Molecule   Name    ?Text //WBMoleculeID
    Public_name ?Text  
    Formula ?Text 
    Monoisotopic_mass ?Float
    IUPAC ?Text 
    SMILES ?Text
    InChi ?Text
    InChiKey ?Text
    Synonym ?Text
    DB_info Database  ?Database ?Database_field ?Text
    Status  Detected  #Evidence
            Predicted #Evidence
    Origin  Plant ?Taxon
            Microbial  ?Taxon
            Cosmetic
            Toxin_Pollutant
            Food_for_human
            Drug
            Exogenous
            Endogenous  ?Taxon #Evidence
            Drug_metabolite
            Pharmaceutical
    Biofunction_role       Metabolite 
                           Regulatory 
                           Structural 
    Biofunction_action     Activates_gene_product ?Gene XREF Activated_by_molecule #Evidence
                           Inhibits_gene_product ?Gene XREF Inhibited_by_molecule #Evidence
                           Substrate_for_gene_product ?Gene XREF Molecule_substrate #Evidence
                           Product_of_gene_product ?Gene XREF Molecule_product #Evidence
                           Cofactor_for_gene_product ?Gene XREF Molecule_cofactor #Evidence
                           Substrate_for_molecule ?Molecule XREF Ligand_for_molecule #Evidence
                           Product_of_molecule ?Molecule XREF Precursor_for_molecule #Evidence
                           Ligand_for_molecule ?Molecule XREF Substrate_for_molecule #Evidence
                           Precursor_for_molecule  ?Molecule XREF Product_of_molecule #Evidence
                           Regulate_expr_cluster   ?Expression_cluster XREF  Regulated_by_molecule
    Requirement Essential ?Species #Evidence
                Nonessential ?Species #Evidence
    WBProcess   ?WBProcess  XREF    Molecule
    Affects_phenotype_of    Variation   ?Variation  ?Phenotype  #Evidence
                            Strain  ?Strain ?Phenotype  #Evidence
                            Transgene   ?Transgene  ?Phenotype  #Evidence
                            RNAi    ?RNAi   ?Phenotype  #Evidence
                            Rearrangement   ?Rearrangement  ?Phenotype  #Evidence
                            Interaction ?Interaction    XREF    Molecule_interaction
    Molecule_use    ?Text   #Evidence  
    Reference   ?Paper  XREF    Molecule
    Remark  ?Text   #Evidence
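The revised Biofunction_action block pairs its ?Molecule-to-?Molecule XREFs: Substrate_for_molecule with Ligand_for_molecule, and Product_of_molecule with Precursor_for_molecule. The following hedged Python sketch checks that every such XREF's target tag points back at the tag that references it. The regular expression only handles single-line "Tag ?Class XREF Target" declarations, and the function name is hypothetical.

```python
import re

# "<Tag> ?<Class> XREF <TargetTag>" on one line of a model (simplified pattern).
XREF_RE = re.compile(r"(\w+)\s+\?(\w+)\s+XREF\s+(\w+)")


def self_xref_problems(model_text, cls="Molecule"):
    """Return (tag, target_tag, back_reference) for ?cls-to-?cls XREFs that do not pair up."""
    links = {}
    for line in model_text.splitlines():
        code = line.split("//", 1)[0]
        for tag, target_cls, target_tag in XREF_RE.findall(code):
            if target_cls == cls:
                links[tag] = target_tag
    problems = []
    for tag, target_tag in links.items():
        back = links.get(target_tag)
        if back != tag:  # target missing, or it points at some other tag
            problems.append((tag, target_tag, back))
    return problems


ACTIONS = """\
Substrate_for_molecule ?Molecule XREF Ligand_for_molecule #Evidence
Product_of_molecule ?Molecule XREF Precursor_for_molecule #Evidence
Ligand_for_molecule ?Molecule XREF Substrate_for_molecule #Evidence
Precursor_for_molecule  ?Molecule XREF Product_of_molecule #Evidence
Activates_gene_product ?Gene XREF Activated_by_molecule #Evidence
"""

print(self_xref_problems(ACTIONS))  # []
```

XREFs into other classes, such as ?Gene, are skipped here. Checking those would need the target class's model as well.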
kyook commented 9 years ago

I just made some more changes to the model above. These changes should be more intuitive and make capturing the data more straightforward. The open question remains what the difference between Taxon and Species is, and whether Taxon can accommodate species-level annotations.

kyook commented 9 years ago

Paul, let's move this to WS252. There are still too many open questions, and I need Juancarlos to make extensive changes on the Postgres side, which can't get started for a while.

Paul-Davis commented 9 years ago

Reverted the models.wrm file to the previous version with:

# Resets the index to the former commit; replace the hash below with your target commit
git reset 008830ea34a3795c91b0d2da6d23c14aa28ad384 
# Moves pointer back to previous HEAD
git reset --soft HEAD@{1}
# Paranoia about touching multiple files
git add models.wrm
# New commit to push the rollback
git commit -m "Revert to 008830ea34a3795c91b0d2da6d23c14aa28ad384"
# Discards all uncommitted working-tree changes so files match the new commit
git reset --hard 
# Pull in any changes made by others
git pull
# Pushes the changes out to the remote repo
git push

Status: Waiting for final decisions on a couple of tags

khowe commented 9 years ago

The above recipe did not quite work. What we ended up doing is this:

git revert --no-commit 17a9f8fc27dcb06d024a60cae3c75c93f01faab3 wspec/models.wrm
git commit -m "Unrolled Molecule changes"

...where the referenced commit is the one that added the molecule changes.

This model change has been deferred to WS252. Changed milestone.

Paul-Davis commented 8 years ago

There is an issue with Metabolite ?Text #Evidence: the majority of Metabolite data will come from large-scale sources and will therefore have no text comment.

Reopening the ticket until the change is populated to models.wrm for WS252.