Closed mercutio22 closed 11 years ago
Hugo; I think that the phase is correct but happy to adjust if the GenomeTools folks think otherwise. The GFF spec specifies the phase as 0,1 or 2:
http://www.sequenceontology.org/gff3.shtml
while codon_start from the GenBank file is 1, 2 or 3:
http://www.ddbj.nig.ac.jp/FT/full_index.html#7.2
so I've made the adjustment from 1 to 0 in the GFF output when converting. Let me know if your interaction with the GenomeTools developers indicates I've missed something in the conversion.
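For readers following along, the adjustment described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not the actual converter code; the function name `codon_start_to_phase` is hypothetical.

```python
def codon_start_to_phase(codon_start):
    """Convert a GenBank /codon_start value (1, 2 or 3) to a GFF3 phase (0, 1 or 2)."""
    codon_start = int(codon_start)  # qualifier values are often stored as strings
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3, got %r" % codon_start)
    # codon_start is the 1-based position of the first complete codon;
    # phase is the number of bases to skip before it.
    return codon_start - 1


for value in ("1", "2", "3"):
    print(value, "->", codon_start_to_phase(value))
```

Note that this only covers the first translated CDS segment; later segments of a spliced CDS get their own phase values.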
.''. Hugo A. M. Torres : :' :
. ' “Talk is cheap,
- show me the code. ” -- L. Torvalds.
Hi Brad, perhaps this might be useful for testing your program: http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
I tried it, and the tool pointed out, for instance, that the produced GFF3 file has a "source" field. IIRC, Peter Cock says in one of his blog posts that GenBank has those but GFF3 does not.
Here is a sample report:
http://song.cvs.sourceforge.net/*checkout*/song/ontology/so.obo
[line 1]> ##gff-version 3
[line 2]> ##sequence-region NG_017013.1 1 26144
[line 3]> NG_017013.1 annotation remark 1 26144 .
[line 3]> . . comment=REVIEWED%20REFSEQ%3A%20This%20record%20has%20been%20curated%20by%20NCBI%20staff%20in%0Acollaboration%20with%20Graham%20Taylor.%20The%20reference%20sequence%20was%0Aderived%20from%20AC087388.9%20and%20AC007421.13.%0AThis%20sequence%20is%20a%20reference%20standard%20in%20the%20RefSeqGene%20project.%0APublication%20Note%3A%20%20This%20RefSeq%20record%20includes%20a%20subset%20of%20the%0Apublications%20that%20are%20available%20for%20this%20gene.%20Please%20see%20the%20Gene%0Arecord%20to%20access%20additional%20publications.%0ASummary%3A%20This%20gene%20encodes%20tumor%20protein%20p53%2C%20which%20responds%20to%0Adiverse%20cellular%20stresses%20to%20regulate%20target%20genes%20that%20induce%20cell%0Acycle%20arrest%2C%20apoptosis%2C%20senescence%2C%20DNA%20repair%2C%20or%20changes%20in%0Ametabolism.%20p53%20protein%20is%20expressed%20at%20low%20level%20in%20normal%20cells%0Aand%20at%20a%20high%20level%20in%20a%20variety%20of%20transformed%20cell%20lines%2C%20where%0Ait%27s%20believed%20to%20contribute%20to%20transformation%20and%20malignancy.%20p53%0Ais%20a%20DNA-binding%20protein%20containing%20transcription%20activation%2C%0ADNA-binding%2C%20and%20oligomerization%20domains.%20It%20is%20postulated%20to%20bind%0Ato%20a%20p53-binding%20site%20and%20activate%20expression%20of%20downstream%20genes%0Athat%20inhibit%20growth%20and/or%20invasion%2C%20and%20thus%20function%20as%20a%20tumor%0Asuppressor.%20Mutants%20of%20p53%20that%20frequently%20occur%20in%20a%20number%20of%0Adifferent%20human%20cancers%20fail%20to%20bind%20the%20consensus%20DNA%20binding%0Asite%2C%20and%20hence%20cause%20the%20loss%20of%20tumor%20suppressor%20activity.%0AAlterations%20of%20this%20gene%20occur%20not%20only%20as%20somatic%20mutations%20in%0Ahuman%20malignancies%2C%20but%20also%20as%20germline%20mutations%20in%20some%0Acancer-prone%20families%20with%20Li-Fraumeni%20syndrome.%20Multiple%20p53%0Avariants%20due%20to%20alternative%20promoters%20and%20multiple%20alternative%0Aspli
cing%20have%20been%20found.%20These%20variants%20encode%20distinct%20isoforms%2C%0Awhich%20can%20regulate%20p53%20transcriptional%20activity.%20%5Bprovided%20by%0ARefSeq%2C%20Jul%202008%5D.;
[line 3]> sequence_version=1;source=Homo%20sapiens%20%28human%29;
[line 3]> taxonomy=Eukaryota,Metazoa,Chordata,
[line 3]> Craniata,Vertebrata,Euteleostomi,
[line 3]> Mammalia,Eutheria,Euarchontoglires,
[line 3]> Primates,Haplorrhini,Catarrhini,
[line 3]> Hominidae,Homo;keywords=RefSeqGene;
[line 3]> references=location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Marcel%2CV.%2C%20Tran%2CP.L.%2C%20Sagne%2CC.%2C%20Martel-Planche%2CG.%2C%20Vaslin%2CL.%2C%20Teulade-Fichou%2CM.P.%2C%20Hall%2CJ.%2C%20Mergny%2CJ.L.%2C%20Hainaut%2CP.%20and%20Van%20Dyck%2CE.%0Atitle%3A%20G-quadruplex%20structures%20in%20TP53%20intron%203%3A%20role%20in%20alternative%20splicing%20and%20in%20production%20of%20p53%20mRNA%20isoforms%0Ajournal%3A%20Carcinogenesis%2032%20%283%29%2C%20271-278%20%282011%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2021112961%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Naidu%2CS.R.%2C%20Love%2CI.M.%2C%20Imbalzano%2CA.N.%2C%20Grossman%2CS.R.%20and%20Androphy%2CE.J.%0Atitle%3A%20The%20SWI/SNF%20chromatin%20remodeling%20subunit%20BRG1%20is%20a%20critical%20regulator%20of%20p53%20necessary%20for%20proliferation%20of%20malignant%20cells%0Ajournal%3A%20Oncogene%2028%20%2827%29%2C%202492-2501%20%282009%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2019448667%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Anczukow%2CO.%2C%20Ware%2CM.D.%2C%20Buisson%2CM.%2C%20Zetoune%2CA.B.%2C%20Stoppa-Lyonnet%2CD.%2C%20Sinilnikova%2CO.M.%20and%20Mazoyer%2CS.%0Atitle%3A%20Does%20the%20nonsense-mediated%20mRNA%20decay%20mechanism%20prevent%20the%20synthesis%20of%20truncated%20BRCA1%2C%20CHK2%2C%20and%20p53%20proteins%3F%0Ajournal%3A%20Hum.%20Mutat.%2029%20%281%29%2C%2065-73%20%282008%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2017694537%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Bourdon%2CJ.C.%0Atitle%3A%20p53%20Family%20isoforms%0Ajournal%3A%20Curr%20Pharm%20Biotechnol%208%20%286%29%2C%20332-336%20%282007%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2018289041%0Acomment%3A%20Review%20article,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Murray-Zmijewski%2CF.%2C%20Lane%2CD.P.%20and%20Bourdon%2CJ.C.%0Atitle%3A%20p53/p63/p73%20isoforms%3A%20an%20orchestra%20of%20isoforms%20to%20harmonise%20cell%20differentiation%20and%20response%20to%20stress%0Ajournal%3A%20Cell%20Death%20Differ.%2013%20%286%29%2C%20962-972%20%282006%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2016601753%0Acomment%3A%20Review%20article,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Flaman%2CJ.M.%2C%20Waridel%2CF.%2C%20Estreicher%2CA.%2C%20Vannier%2CA.%2C%20Limacher%2CJ.M.%2C%20Gilbert%2CD.%2C%20Iggo%2CR.%20and%20Frebourg%2CT.%0Atitle%3A%20The%20human%20tumour%20suppressor%20gene%20p53%20is%20alternatively%20spliced%20in%20normal%20cells%0Ajournal%3A%20Oncogene%2012%20%284%29%2C%20813-818%20%281996%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%208632903%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Lamb%2CP.%20and%20Crawford%2CL.%0Atitle%3A%20Characterization%20of%20the%20human%20p53%20gene%0Ajournal%3A%20Mol.%20Cell.%20Biol.%206%20%285%29%2C%201379-1385%20%281986%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%202946935%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Harlow%2CE.%2C%20Williamson%2CN.M.%2C%20Ralston%2CR.%2C%20Helfman%2CD.M.%20and%20Adams%2CT.E.%0Atitle%3A%20Molecular%20cloning%20and%20in%20vitro%20expression%20of%20a%20cDNA%20clone%20for%20human%20cellular%20tumor%20antigen%20p53%0Ajournal%3A%20Mol.%20Cell.%20Biol.%205%20%287%29%2C%201601-1610%20%281985%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%203894933%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Zakut-Houri%2CR.%2C%20Bienz-Tadmor%2CB.%2C%20Givol%2CD.%20and%20Oren%2CM.%0Atitle%3A%20Human%20p53%20cellular%20tumor%20antigen%3A%20cDNA%20sequence%20and%20expression%20in%20COS%20cells%0Ajournal%3A%20EMBO%20J.%204%20%285%29%2C%201251-1255%20%281985%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%204006916%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Matlashewski%2CG.%2C%20Lamb%2CP.%2C%20Pim%2CD.%2C%20Peacock%2CJ.%2C%20Crawford%2CL.%20and%20Benchimol%2CS.%0Atitle%3A%20Isolation%20and%20characterization%20of%20a%20human%20p53%20cDNA%20clone%3A%20expression%20of%20the%20human%20p53%20gene%0Ajournal%3A%20EMBO%20J.%203%20%2813%29%2C%203257-3262%20%281984%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%206396087%0Acomment%3A;
[line 3]> accessions=NG_017013;data_file_division=PRI;
[line 3]> date=19-FEB-2012;organism=Homo%20sapiens;
[line 3]> gi=293651587
[line 4]> NG_017013.1 feature source 1 26144 . + .
[line 4]> db_xref=taxon%3A9606;mol_type=genomic%20DNA;
[line 4]> organism=Homo%20sapiens;chromosome=17;
[line 4]> map=17p13.1
[line 5]> NG_017013.1 feature gene 1 6475 . - .
[line 5]> note=WD%20repeat%20containing%2C%20antisense%20to%20TP53;
[line 5]> db_xref=GeneID%3A55135,HGNC%3A25522,
[line 5]> MIM%3A612661;gene=WRAP53;gene_synonym=DKCB3%3B%20TCAB1%3B%20WDR79
[line 6]> NG_017013.1 feature mRNA 2845 6475 . - .
[line 6]> db_xref=GI%3A221136857,GeneID%3A55135,
[line 6]> HGNC%3A25522,MIM%3A612661;product=WD%20repeat%20containing%2C%20antisense%20to%20TP53%2C%20transcript%20variant%202;
[line 6]> transcript_id=NM_001143990.1;inference=similar%20to%20RNA%20sequence%2C%20mRNA%20%28same%20species%29%3ARefSeq%3ANM_001143990.1;
[line 6]> exception=annotated%20by%20transcript%20or%20proteomic%20data;
[line 6]> gene=WRAP53;gene_synonym=DKCB3%3B%20TCAB1%3B%20WDR79;
[line 6]> ID=NM_001143990.1
[line 7]> NG_017013.1 feature mRNA 2845 2956 . - .
[line 7]> Parent=NM_001143990.1
[line 8]> NG_017013.1 feature mRNA 3224 3322 . - .
[line 8]> Parent=NM_001143990.1
[line 9]> NG_017013.1 feature mRNA 3467 3898 . - .
[line 9]> Parent=NM_001143990.1
[line 10]> NG_017013.1 feature mRNA 6322 6475 . - .
[line 10]> Parent=NM_001143990.1
Line Number    Error/Warning
4      [ERROR] invalid type (type: source)
7      [ERROR] invalid type pair - check all parents (at line 6; mRNA to mRNA)
12     [ERROR] invalid type pair - check all parents (at line 11; mRNA to mRNA)
17     [ERROR] invalid type pair - check all parents (at line 16; mRNA to mRNA)
22     [ERROR] invalid type pair - check all parents (at line 21; mRNA to mRNA)
26     [ERROR] invalid type pair - check all parents (at line 25; CDS to CDS)
30     [ERROR] invalid type pair - check all parents (at line 29; CDS to CDS)
34     [ERROR] invalid type pair - check all parents (at line 33; CDS to CDS)
38     [ERROR] invalid type pair - check all parents (at line 37; CDS to CDS)
44     [ERROR] invalid type pair - check all parents (at line 43; mRNA to mRNA)
56     [ERROR] invalid type pair - check all parents (at line 55; mRNA to mRNA)
69     [ERROR] invalid type pair - check all parents (at line 68; mRNA to mRNA)
82     [ERROR] invalid type pair - check all parents (at line 81; mRNA to mRNA)
94     [ERROR] invalid type pair - check all parents (at line 93; mRNA to mRNA)
113    [ERROR] invalid type pair - check all parents (at line 112; CDS to CDS)
124    [ERROR] invalid type pair - check all parents (at line 123; CDS to CDS)
135    [ERROR] invalid type pair - check all parents (at line 134; CDS to CDS)
145    [ERROR] invalid type pair - check all parents (at line 144; CDS to CDS)
162    [ERROR] invalid type pair - check all parents (at line 161; CDS to CDS)
171    [ERROR] invalid type pair - check all parents (at line 170; mRNA to mRNA)
180    [ERROR] invalid type pair - check all parents (at line 179; mRNA to mRNA)
189    [ERROR] invalid type pair - check all parents (at line 188; mRNA to mRNA)
206    [ERROR] invalid type pair - check all parents (at line 205; CDS to CDS)
214    [ERROR] invalid type pair - check all parents (at line 213; CDS to CDS)
221    [ERROR] invalid type pair - check all parents (at line 220; CDS to CDS)
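One reading of the "invalid type pair" errors is that they are raised for features whose Parent has the same type as the feature itself (mRNA to mRNA, CDS to CDS). The sketch below looks for that pattern in tab-separated GFF3 lines. It is only a rough check under that assumption, not a reimplementation of the validator, and the function name `same_type_parent_pairs` is hypothetical.

```python
def same_type_parent_pairs(gff_lines):
    """Yield (line_number, type) for features whose Parent has the same type."""
    types_by_id = {}
    records = []
    for number, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attributes = dict(
            item.split("=", 1) for item in columns[8].split(";") if "=" in item
        )
        if "ID" in attributes:
            types_by_id[attributes["ID"]] = columns[2]
        records.append((number, columns[2], attributes.get("Parent", "")))
    # Second pass, so a parent defined after its child is still found.
    for number, feature_type, parents in records:
        for parent in filter(None, parents.split(",")):
            if types_by_id.get(parent) == feature_type:
                yield number, feature_type


example = [
    "NG_017013.1\tfeature\tmRNA\t2845\t6475\t.\t-\t.\tID=NM_001143990.1",
    "NG_017013.1\tfeature\tmRNA\t2845\t2956\t.\t-\t.\tParent=NM_001143990.1",
]
print(list(same_type_parent_pairs(example)))  # [(2, 'mRNA')]
```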
On Mon, Mar 12, 2012 at 3:50 PM, A M Torres, Hugo mnemonico@posthocergopropterhoc.net wrote:
Thanks Brad. I will contact them and will let you know asap.
Hugo; Thanks for this. The validator is complaining about 'source' not being present in the Sequence Ontology. Mapping GenBank to SO is a fairly large problem. I tried to tackle this a few years back but it ended up being too much work. Here's the progress I made:
http://bcbio.wordpress.com/2008/12/14/standard-ontologies-in-biosql/
In practice, most tools do not enforce this requirement, so, being unable to map the entire thing, I took the approach of keeping the output GFF similar to the input GenBank. If you wanted to take on a mapping of GenBank to Sequence Ontology, I'd be happy to incorporate it.
Is GenomeTools requiring the ontology matches, or just that online validator?
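To make the idea of a partial mapping concrete, here is a minimal sketch of a lookup that falls back to the GenBank key when no Sequence Ontology term is known, which mirrors the "keep the output similar to the input" approach described above. The table `GENBANK_TO_SO` and the function `to_so_type` are hypothetical, the table is deliberately incomplete, and using "region" for "source" is an assumption rather than something settled in this thread.

```python
# Hypothetical, deliberately incomplete mapping from GenBank feature keys
# to Sequence Ontology terms. "region" for "source" is an assumption.
GENBANK_TO_SO = {
    "gene": "gene",
    "mRNA": "mRNA",
    "CDS": "CDS",
    "source": "region",
}


def to_so_type(genbank_key, mapping=GENBANK_TO_SO):
    """Return the mapped SO term, or the GenBank key unchanged if unmapped."""
    return mapping.get(genbank_key, genbank_key)


for key in ("source", "CDS", "misc_feature"):
    print(key, "->", to_so_type(key))
```

Unmapped keys such as `misc_feature` pass through unchanged, so a strict validator would still flag them.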
Hi Brad,
Is GenomeTools requiring the ontology matches, or just that online validator?
Hmm, it seems only the validator does. GenomeTools seems to be complaining only about the "phase" field.
I have already posted your considerations on their issue tracker. I will let you know what they say when I get a reply. In any case, thanks for taking the time to look at my problem.
Thanks Hugo -- let me know if there ends up being anything I can change on my end to improve the phase information. Hopefully that'll do it and get things working smoothly with GenomeTools. Thanks for your patience with this.
Hugo; I'm going to close this to clean up the issues. Hopefully everything was solved on the GenomeTools side. Thanks
Hi Brad, AnnotationSketch is complaining about the parsed file again:
GenomeTools error: CDS feature on line 27 in file "../../mirna-django/src/scripts/tp53.gff3" has the wrong phase 0 (should be 1)
I don't know if the problem is with their GFF3 parser though. Can you tell me what you think?
http://paste.debian.net/159462/
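One possible reading of this error: in GFF3 the phase is given per CDS line, and for a spliced CDS the phase of each segment depends on the lengths of the segments translated before it, so a constant 0 derived from codon_start would only be right for the first segment. The sketch below computes the phases that would be expected under that reading. It is an illustration only, not GenomeTools' or the converter's actual code; the function name `cds_phases` and the coordinates in the example are made up.

```python
def cds_phases(segments, strand="+", first_phase=0):
    """Return the GFF3 phase expected for each CDS segment.

    segments: list of (start, end) tuples, 1-based inclusive, in any order.
    strand: "+" or "-"; on the minus strand translation starts at the
            segment with the highest coordinates.
    first_phase: phase of the first translated segment (codon_start - 1).
    """
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    phase = first_phase
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after the last complete codon in this segment
        # determine how many bases the next segment must skip.
        phase = (3 - (length - phase) % 3) % 3
    return [phases[segment] for segment in segments]


segments = [(1, 10), (21, 30), (41, 49)]  # made-up coordinates
print(cds_phases(segments, strand="+"))  # [0, 2, 1]
print(cds_phases(segments, strand="-"))  # [2, 0, 0]
```

If the converted file writes phase 0 on every CDS line, comparing it against values computed this way should show whether that is what the parser is objecting to.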