1. For those C. elegans genes that have orthology to human and no GO data:
Write a sentence with that human ortholog's GO Molecular Function (MF) information:
-Pick the best ortholog by Alliance stringency and best filters.
-First preference: an ortholog with experimental MF annotations (if it also has predicted MF terms, ignore the predicted terms).
-If more than one best ortholog has experimental annotations, pick the ortholog with the most experimental MF terms.
-If none of the orthologs have experimental annotations, pick the ortholog with the most predicted MF terms; if this is tied, pick the first alphabetically.
Templates:
human < gene symbol > exhibits < MF term >
human < gene symbol > is predicted to have < MF term >
human < gene symbol > is a/an < MF term >
human < gene symbol > is predicted to be a/an < MF term >
(Follow all rules for MF terms in general)
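The selection order above can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: the `Ortholog` record, its field names, and both function names are hypothetical, and the candidates are assumed to have already passed the Alliance stringency and best filters. The alphabetical tie-break for the experimental case is an assumption (the rules state it only for the predicted case). Only the "exhibits" and "is predicted to have" templates are rendered; choosing the "is a/an" forms depends on the general MF term rules, which are not covered here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Ortholog:
    # Hypothetical record; field names are illustrative only.
    symbol: str
    experimental_mf: List[str] = field(default_factory=list)
    predicted_mf: List[str] = field(default_factory=list)


def pick_ortholog(candidates: List[Ortholog]) -> Optional[Tuple[Ortholog, bool]]:
    """Return (ortholog, is_experimental), or None if no candidate has MF terms."""
    experimental = [o for o in candidates if o.experimental_mf]
    if experimental:
        # Most experimental MF terms wins; predicted terms are ignored.
        # Alphabetical tie-break is an assumption for this branch.
        best = min(experimental, key=lambda o: (-len(o.experimental_mf), o.symbol))
        return best, True
    predicted = [o for o in candidates if o.predicted_mf]
    if predicted:
        # Most predicted MF terms wins; ties are broken alphabetically.
        best = min(predicted, key=lambda o: (-len(o.predicted_mf), o.symbol))
        return best, False
    return None


def mf_sentence(symbol: str, term: str, is_experimental: bool) -> str:
    """Fill the 'exhibits' or 'is predicted to have' template."""
    if is_experimental:
        return f"human {symbol} exhibits {term}"
    return f"human {symbol} is predicted to have {term}"
```

Sorting on the pair (negative term count, symbol) keeps both the "most terms" rule and the alphabetical tie-break in a single comparison.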
2. For those genes with no GO data (neither for the focus gene nor as GO MF for the human ortholog); these genes may or may not have orthology and/or tissue expression data:
Include a sentence describing protein domains using the 'protein domains' file at
-/pub/databases/wormbase/releases/WS266/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WS266.protein_domians.csv.gz
Place this sentence before tissue expression data
Templates:
If only one domain:
Predicted to encode a protein with the following domain: < protein domain1 >;
If more than one domain:
Predicted to encode a protein with the following domains: < protein domain1 > and < protein domain2 >;
Store the InterPro IDs so that they can be cited in the reference.
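A minimal Python sketch of the two domain templates follows. The function name and the `(interpro_id, domain_name)` input shape are hypothetical; reading the protein domains file is left out because its column layout is not described here. The templates only show one and two domains, so joining three or more with commas before the final "and" is an assumption.

```python
from typing import List, Optional, Tuple


def domain_sentence(domains: List[Tuple[str, str]]) -> Tuple[Optional[str], List[str]]:
    """domains: list of (interpro_id, domain_name) pairs (hypothetical shape).

    Returns the sentence (or None if there are no domains) together with
    the InterPro IDs to keep for the reference.
    """
    if not domains:
        return None, []
    ids = [interpro_id for interpro_id, _ in domains]
    names = [name for _, name in domains]
    if len(names) == 1:
        sentence = f"Predicted to encode a protein with the following domain: {names[0]};"
    else:
        # Assumption: three or more names are comma-separated before the final "and".
        joined = ", ".join(names[:-1]) + " and " + names[-1]
        sentence = f"Predicted to encode a protein with the following domains: {joined};"
    return sentence, ids
```

Returning the IDs alongside the sentence keeps the reference data tied to exactly the domains that were written out.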
3. For those genes with no orthology, GO, or tissue expression data for the focus gene:
--Add expression cluster data, including each of the three types (anatomy, gene regulation, and chemical regulation) that exists for the gene
--Add protein domain data
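One possible reading of how rules 1 to 3 fit together is sketched below in Python. The function name, the boolean inputs, and the section labels are all hypothetical, and the precedence between the rules is an interpretation of the text rather than something it states outright. The function returns only the sections these rules add; under rule 2 the domain sentence is placed before any tissue expression data.

```python
from typing import List


def fallback_sections(
    has_go: bool,
    has_human_ortholog: bool,
    ortholog_has_mf: bool,
    has_tissue_expression: bool,
) -> List[str]:
    """Return the sections added by rules 1-3 for a focus gene (a sketch)."""
    if has_go:
        # Genes with their own GO data are outside these fallback rules.
        return []
    if has_human_ortholog and ortholog_has_mf:
        # Rule 1: describe the human ortholog's MF terms.
        return ["human_ortholog_mf"]
    if not has_human_ortholog and not has_tissue_expression:
        # Rule 3: no orthology, GO, or tissue expression data.
        return ["expression_clusters", "protein_domains"]
    # Rule 2: no GO data for the gene or for the human ortholog's MF.
    return ["protein_domains"]
```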