Closed dtdoering closed 8 months ago
The PR auto-closed this issue, but I just released a new version that should include a fix: https://jbrowse.org/jb2/blog/2024/03/06/v2.10.3-release/
Thanks again for the detailed bug report. I hope it didn't cause you too much trouble :) If you have any more problems, let me know.
TL;DR
I suspect that @gmod/faidx-js incorrectly calculates the LINEBASES (per the fai index spec) using the last line of a FASTA record (which is usually shorter than the preceding lines) rather than the actual/full line length (e.g. 80 is common).
Describe the bug
First observation
Description
In a JBrowse Desktop session, the Reference Sequence track does not correspond with the correct sequence (DNA or protein) I expect to see in the genome for a gene of interest, and the correct sequence doesn't appear to be anywhere nearby.
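To make the suspected mechanism concrete, here is a minimal sketch of the arithmetic a .fai-based reader uses to turn a coordinate into a byte position in the FASTA file. The OFFSET value of 100 is hypothetical, and pairing a LINEBASES of 51 with a LINEWIDTH of 52 is an assumption for illustration, not a value taken from the actual index. It only shows that a too-small LINEBASES sends the reader to a different part of the file:

```python
def fai_byte_offset(pos0, offset, linebases, linewidth):
    """Byte position of 0-based base `pos0` in the FASTA file, using the
    usual .fai arithmetic: skip whole lines, then step into the line."""
    return offset + (pos0 // linebases) * linewidth + pos0 % linebases


# Hypothetical index values: sequence data starts at byte 100.
OFFSET = 100
start0 = 1_140_687 - 1  # 1-based feature start converted to 0-based

# 80 bases per line plus one newline byte.
correct = fai_byte_offset(start0, OFFSET, linebases=80, linewidth=81)
# Assumption: the wrong LINEBASES of 51 is stored with a LINEWIDTH of 52.
wrong = fai_byte_offset(start0, OFFSET, linebases=51, linewidth=52)

print("correct byte:", correct)
print("wrong byte:  ", wrong)
print("difference:  ", wrong - correct)
```

Under these assumptions the error grows with the number of lines skipped, which would fit the correct sequence not showing up anywhere near the feature.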
To Reproduce
Assembly FASTA (GCF_000016425.1_ASM1642v1_genomic.fna)
Reference sequence (NC_009380.1)
Annotation file (genomic.gff)
Gene (STROP_RS05130) at gene:NC_009380.1:1,140,687..1,140,755
Observed behavior
The DNA sequence of the feature starts with TGT GCT GAT ACC CGG, and the corresponding protein sequence begins with CADTR.
Expected behavior
The DNA sequence of the feature should start with GTG GAG GAT CAC CTG, and its protein sequence should begin with (M or V) EDHL.
See the full NCBI record for the gene's genomic sequence for verification.
Alternatively, here is a (nasty-long) link to NCBI's Sequence Viewer navigated to the same coordinates (1,140,687..1,140,755):
Second observation
Description
The FASTA index file JBrowse Desktop is using lists a LINEBASES (4th field) of 51, whereas I believe it should be 80.
To Reproduce
At the command line:
Navigate to the JBrowse Desktop app's folder (/Users/DTDoering/Library/Application Support/@jbrowse/desktop/fai for me)
View the index file:
The 4th field is 51, whereas running samtools faidx on the FASTA ourselves yields a 4th field of 80:
Third observation
(I think) The @gmod/faidx-js testing snapshots also list, as LINEBASES, the length of the last line of their corresponding FASTA records: 21 for ctgA and 79 for ctgB, rather than 60 for ctgA and 100 for ctgB.
Reproduce
At command line:
Compare to the snapshot:
Final thoughts
I believe the incorrect FASTA indexing is the cause of the Reference Sequence track not appearing correctly, and that it was not caught by the existing faidx-js tests because they are incorrect as well.
Of course, I could be misunderstanding the FAI specification, or perhaps it was updated recently or something, but I wanted to document the unexpected behavior in case I'm not mistaken!
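For reference, the reading of the spec assumed in this report can be sketched as below: LINEBASES and LINEWIDTH come from the first sequence line of each record, so a short final line does not affect them. This is a minimal illustration, not the faidx-js or samtools implementation; the function name `fai_records` and the toy FASTA (two 60-base lines followed by a 21-base line) are made up for the example and are not the actual test fixture.

```python
def fai_records(fasta_text):
    """Return (name, length, offset, linebases, linewidth) per record,
    taking line lengths from the FIRST sequence line of each record."""
    records = []
    pos = 0      # byte position of the current line
    cur = None   # fields of the record being read
    for raw in fasta_text.encode().splitlines(keepends=True):
        line = raw.rstrip(b"\r\n")
        if raw.startswith(b">"):
            if cur is not None:
                records.append(tuple(cur))
            fields = line[1:].split()
            name = fields[0].decode() if fields else ""
            # Sequence data starts right after the header line.
            cur = [name, 0, pos + len(raw), 0, 0]
        elif cur is not None and line:
            if cur[3] == 0:
                # First sequence line: bases per line, bytes per line.
                cur[3], cur[4] = len(line), len(raw)
            cur[1] += len(line)
        pos += len(raw)
    if cur is not None:
        records.append(tuple(cur))
    return records


# Toy record: full lines are 60 bases, the last line is only 21.
toy = ">ctgA\n" + "A" * 60 + "\n" + "C" * 60 + "\n" + "G" * 21 + "\n"
for rec in fai_records(toy):
    print("\t".join(str(field) for field in rec))
```

With this reading, the toy record gets a LINEBASES of 60 even though its last line holds only 21 bases.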
I'll also file an issue in the GMOD/faidx-js repo and link it here for tracking.
Please let me know if you need anything else to reproduce!
Screenshots
My JBrowse Desktop showing the region:
Version
JBrowse Desktop: Version 2.10.2 (2.10.2)
OS: macOS Sonoma 14.3.1
Samtools: I have the latest samtools from Homebrew: