MirrorReaper closed this issue 5 years ago
Can you please post the exact commands that do not seem to work?
Getting the genbank file
efetch -db nuccore -id AF086833 -format gb >AF.gb
Getting the GFF file from the GenBank file
cat AF.gb | seqret -filter -feature -osformat gff3 > AF.gff
Getting the fasta format
efetch -db nuccore -id AF086833 -format fasta >AF.fa
Indexing the fasta file
samtools faidx AF.fa
When I use AF.fa as the genome and load the GFF file, the track shows nothing.
By contrast, when I get the FASTA file from the GenBank file, the features show up:
cat AF.gb | seqret -filter -feature -osformat fasta > AF-2.fa
The reason is as simple as it is infuriating: the genome is called AF086833
when you convert with seqret, but it will be called AF086833.2
when you fetch directly from NCBI. Look inside the data (the first line of each FASTA file).
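One way to look inside the data is to print the sequence names each file uses. This is only a minimal sketch: fasta_names and gff_names are hypothetical helper names made up for this example, not part of seqret, efetch, or samtools. Per the explanation above, the efetch FASTA should report AF086833.2, while the seqret outputs should report AF086833.

```shell
# Hypothetical helper: print the sequence names in a FASTA file
# (the text after '>' up to the first space).
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u
}

# Hypothetical helper: print the sequence names in column 1 of a GFF file,
# skipping comment lines and anything that is not a 9-column feature line.
gff_names() {
    awk -F '\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Example usage with the files from this thread:
#   fasta_names AF.fa
#   fasta_names AF-2.fa
#   gff_names AF.gff
```

If the name printed for the genome differs from the name printed for the GFF file, IGV has nothing to attach the features to.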
Even though we are operating on the exact same data, this goes to show that conversions may subtly alter the chromosome naming.
In general, in 99% of the cases where IGV does not show anything, the reason is that the chromosome names do not match (sometimes for ridiculous reasons).
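If the mismatch is only a trailing version number, one possible workaround is to rewrite the FASTA header so that it matches the name in the GFF file. This is a sketch under that assumption: strip_version is a hypothetical helper name, and AF-fixed.fa is just an example output file.

```shell
# Hypothetical helper: write a FASTA file to stdout, removing a trailing
# ".N" version suffix from each sequence name (AF086833.2 -> AF086833).
# Only header lines (starting with '>') are changed.
strip_version() {
    sed -E '/^>/ s/^(>[^[:space:]]+)\.[0-9]+([[:space:]]|$)/\1\2/' "$1"
}

# Example usage with the file from this thread:
#   strip_version AF.fa > AF-fixed.fa
#   samtools faidx AF-fixed.fa
```

The alternative is to keep everything consistent by deriving both the FASTA and the GFF file from the same GenBank file with seqret.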
Thanks.
It is really frustrating indeed.
Thanks again.
I'm having an issue: when I get the FASTA file for AF086833 directly via efetch (unlike the book, where the seqret tool is used to get the FASTA from the GenBank file), the GFF file does not show any features at all; the track is empty.
When I do exactly as the book does (getting the GenBank format, then using the seqret tool to get both the FASTA and GFF formats), all seems good.
Why this discrepancy?