childers closed this issue 7 years ago
Thanks, will test it very soon!
On Tue, Mar 21, 2017 at 10:58 AM, childers notifications@github.com wrote:
Terence had some additional examples for us to test with:
For more testing, here are two assemblies with lots of sequences, so the mapping table is big: https://www.ncbi.nlm.nih.gov/assembly/GCF_000715135.1 https://www.ncbi.nlm.nih.gov/assembly/GCF_000233375.1
The program should fail gracefully given an assembly report like this one:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/180/655/GCA_000180655.1_ASM18065v1/GCA_000180655.1_ASM18065v1_assembly_report.txt
As I mentioned, we’re planning to switch to always populating the file, so cases like that will go away. It’s also never the case for RefSeq assemblies.
-- Guilhem Faure, Ph.D. Computational Biologist -Evolution from Genomics and Structures- LinkedIn: http://www.linkedin.com/in/guilhemfaure
For tobacco, there are no alternative IDs to convert to (not even GenBank IDs).
It does work if we convert from RefSeq to RefSeq:
$ time seqconv convert --ref Ntab-TN90 --out rs ref_Ntab-TN90_top_level.gff3.gz >test_tobacco_gb.gff3
Converting from None to rs
Starting Conversion
FORMAT detected: rs
real 0m16.931s
user 0m14.429s
sys 0m1.302s
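For the "fail gracefully" case Terence mentions, here is a minimal sketch of one way to build the ID mapping from an assembly report and stop with a clear message when the target column is not populated. It is not seqconv's actual code: `build_mapping` and `COLUMNS` are hypothetical names, and the `gb`/`rs` keys are simply borrowed from the `--out` values used above. It assumes the standard tab-separated report layout, with a `#`-prefixed header row naming the `GenBank-Accn` and `RefSeq-Accn` columns and `na` marking an empty value.

```python
# Hypothetical helper, not part of seqconv.
COLUMNS = {"gb": "GenBank-Accn", "rs": "RefSeq-Accn"}


def build_mapping(report_lines, src, dst):
    """Return {src_id: dst_id} from the lines of an assembly report.

    Raises ValueError if no sequence has both IDs, e.g. when the report's
    sequence table is empty or the target column only contains 'na'.
    """
    header = None
    mapping = {}
    for line in report_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column names live in a comment line; remember it when seen.
            fields = line.lstrip("# ").split("\t")
            if COLUMNS[src] in fields and COLUMNS[dst] in fields:
                header = fields
            continue
        if not line or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        src_id = row.get(COLUMNS[src], "na")
        dst_id = row.get(COLUMNS[dst], "na")
        if src_id != "na" and dst_id != "na":
            mapping[src_id] = dst_id
    if not mapping:
        raise ValueError(
            "assembly report has no %s -> %s mapping; nothing to convert"
            % (COLUMNS[src], COLUMNS[dst])
        )
    return mapping
```

A caller could catch the `ValueError`, print the message, and exit with a non-zero status instead of showing a traceback. Converting `rs` to `rs`, as in the tobacco run, still works under this sketch as long as the RefSeq column is populated.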
For salmon, it appears to work OK:
$ time seqconv convert --ref ICSASG_v2 --out gb ref_ICSASG_v2_top_level.gff3.gz > test_salmon.gff3
Converting from None to gb
Starting Conversion
No corresponding id for nc_001960.1 from rs
FORMAT detected: rs
real 0m50.122s
user 0m37.864s
sys 0m3.102s
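The "No corresponding id" message above suggests that sequences without a mapping are reported and the run continues. A minimal sketch of that behaviour follows, assuming a plain `{old_id: new_id}` dictionary and tab-separated GFF3 lines. `convert_seqids` is a hypothetical name and this is not taken from seqconv's source. Unmapped IDs are left unchanged and reported once each.

```python
import sys


def convert_seqids(lines, mapping, warn=sys.stderr):
    """Yield GFF3 lines with the seqid column rewritten via `mapping`."""
    missing = set()

    def lookup(seqid):
        if seqid in mapping:
            return mapping[seqid]
        if seqid not in missing:
            # Report each unmapped ID once and keep it as-is.
            print("No corresponding id for %s" % seqid, file=warn)
            missing.add(seqid)
        return seqid

    for line in lines:
        if line.startswith("##sequence-region"):
            # Pragma form: ##sequence-region <seqid> <start> <end>
            parts = line.split()
            if len(parts) >= 2:
                parts[1] = lookup(parts[1])
                line = " ".join(parts) + "\n"
            yield line
        elif line.startswith("#") or "\t" not in line:
            # Other directives, comments, and blank lines pass through.
            yield line
        else:
            seqid, rest = line.split("\t", 1)
            yield lookup(seqid) + "\t" + rest
```

Warnings go to `warn` (stderr by default), so they stay out of the converted file when stdout is redirected as in the commands above.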
text output
$ head -n 20 test_salmon.gff3
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/233/375/GCF_000233375.1_ICSASG_v2/GCF_000233375.1_ICSASG_v2_assembly_report.txt
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ICSASG_v2
#!genome-build-accession NCBI_Assembly:GCF_000233375.1
#!annotation-date 22 September 2015
#!annotation-source NCBI Salmo salar Annotation Release 100
##sequence-region CM003279.1 1 159038749
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=8030
CM003279.1 RefSeq region 1 159038749 . + . ID=id0;Dbxref=taxon:8030;Name=ssa01;breed=double haploid;chromosome=ssa01;dev-stage=adult;gbkey=Src;genome=chromosome;isolate=Sally;mol_type=genomic DNA;sex=female;tissue-type=muscle
CM003279.1 Gnomon gene 5501 62139 . - . ID=gene0;Dbxref=GeneID:106560212;Name=LOC106560212;gbkey=Gene;gene=LOC106560212;gene_biotype=protein_coding
CM003279.1 Gnomon mRNA 5501 62139 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;Name=XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon exon 61647 62139 . - . ID=id1;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon exon 43486 43714 . - . ID=id2;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon exon 23978 24241 . - . ID=id3;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon exon 16966 17019 . - . ID=id4;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon exon 5501 5691 . - . ID=id5;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1 Gnomon CDS 43486 43633 . - 0 ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
CM003279.1 Gnomon CDS 23978 24241 . - 2 ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
Terence had some additional examples for us to test with: