biostars / biostar-handbook

Issue tracker for the Biostar Handbook

Chapter VI visualizing example issue AF086833 #90

Closed MirrorReaper closed 5 years ago

MirrorReaper commented 5 years ago

I'm having an issue: when I get the FASTA file for AF086833 directly via efetch (unlike the book, where the seqret tool is used to get the FASTA from the gb file), the GFF file does not show any features at all; the track is empty.

Whereas when I do exactly as the book does (getting the gb format, then using the seqret tool to produce both the FASTA and GFF formats), all seems good.

Why this discrepancy?

ialbert commented 5 years ago

Can you please post the exact commands that do not seem to work?

MirrorReaper commented 5 years ago

Getting the GenBank file:

efetch -db nuccore -id AF086833 -format gb > AF.gb

Getting the GFF file from the GenBank file:

cat AF.gb | seqret -filter -feature -osformat gff3 > AF.gff

Getting the FASTA format:

efetch -db nuccore -id AF086833 -format fasta > AF.fa

Indexing the FASTA file:

samtools faidx AF.fa

When using AF.fa as the genome and loading the GFF file, the track shows nothing.


Whereas it works when getting the FASTA format from the GenBank file:

cat AF.gb | seqret -filter -feature -osformat fasta > AF-2.fa


ialbert commented 5 years ago

The reason is as simple as it is infuriating: the genome is called AF086833 when you convert with seqret but will be called AF086833.2 when you fetch directly from NCBI. Look inside the data (the first line).

Even though we are operating on the exact same data, it goes to show that conversions may subtly alter the chromosome naming.
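One way to look inside the data is to list the sequence names each file uses and compare them. The following is only a sketch: the helper names `fasta_names` and `gff_names` are made up for illustration, and the two tiny files are mock stand-ins for AF.fa and AF.gff (their contents are invented, only the sequence names mirror the ones discussed here).

```shell
# Print the sequence names declared in a FASTA file (first word of each header).
fasta_names() {
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | sort -u
}

# Print the sequence names used in column 1 of a GFF file.
# Comment lines are skipped and reading stops at an embedded ##FASTA section.
gff_names() {
    awk -F '\t' '/^##FASTA/ {exit} !/^#/ && NF >= 9 {print $1}' "$1" | sort -u
}

# Mock stand-ins for AF.fa and AF.gff (contents are invented for the demo).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '>AF086833.2 mock description\nACGTACGT\n' > "$tmp/AF.fa"
printf '##gff-version 3\nAF086833\tmock\tgene\t1\t8\t.\t+\t.\tID=mock1\n' > "$tmp/AF.gff"

fasta_names "$tmp/AF.fa" > "$tmp/fa.names"
gff_names "$tmp/AF.gff" > "$tmp/gff.names"

# Names that occur in the GFF but not in the FASTA; any output means a mismatch.
comm -13 "$tmp/fa.names" "$tmp/gff.names"
```

On the real files, the same two helpers can be pointed at AF.fa and AF.gff directly; an empty result from `comm -13` means every name in the GFF is also present in the FASTA.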

In general, in 99% of the cases where IGV does not show anything, the reason is that the chromosome names do not match (sometimes for ridiculous reasons).
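If the names differ only by the version suffix, as they do here, one possible workaround is to rename the sequence in the FASTA header so that it matches the GFF. This is a sketch under that assumption, not something the book prescribes; `strip_version` is a hypothetical helper and the input below is mock data.

```shell
# Remove a trailing ".<number>" version from the name in each FASTA header line.
# Sequence lines do not start with '>' and pass through unchanged.
strip_version() {
    sed -E 's/^>([^ .]+)\.[0-9]+/>\1/'
}

# Demo on a mock record.
printf '>AF086833.2 mock description\nACGT\n' | strip_version
# prints:
# >AF086833 mock description
# ACGT
```

With the real data this could be run as `strip_version < AF.fa > AF-renamed.fa` (the output filename is arbitrary), after which the new file needs its own `samtools faidx` index before loading it as the genome. Editing the first column of the GFF to carry the versioned name instead would work equally well; what matters is that both files agree.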

MirrorReaper commented 5 years ago

Thanks.

It is really frustrating indeed.

Thanks again.