Closed Shians closed 3 years ago
Thanks for your interest. You're correct that in the general case the input would need to be sorted in chromosome order for tabix to work, but in this case we're looking at events (insertions) that can only span a segment of a single chromosome, so this sort is sufficient.
I am working on a package to visualise differential methylation results, and I came across this repo because of its use of tabix. I don't do this type of analysis, so I haven't run the code to determine whether this is a true issue.

Here you are sorting with `-k3,3n`, but this only sorts the nanopolish output by starting position. If the data contained multiple chromosomes, this would not be sufficient for tabix indexing, which requires that all records for each chromosome be grouped together. For example, given input that is sorted by start position but not by chromosome, running `tabix -f -S 1 -s 1 -b 3 -e 4` should raise the error `[E::hts_idx_push] Chromosome blocks not continuous`. You can fix this by using `sort -k1,1V -k3,3n`, which sorts first by chromosome and then by starting position. If the pipeline actually runs in a way that nanopolish output only ever contains one chromosome, then this is irrelevant.
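The difference between the two sort keys can be seen on a tiny example. This is only a sketch: the four records below are made up for illustration, the column layout (chromosome in column 1, start in column 3, end in column 4) is assumed from the `tabix -s 1 -b 3 -e 4` options above, and `-V` (version sort) is assumed to be available, as it is in GNU sort.

```shell
# Hypothetical tab-separated records: chromosome, strand, start, end.
# The values are invented purely to illustrate the ordering.
records=$(printf 'chr2\t+\t100\t150\nchr1\t+\t200\t250\nchr2\t+\t300\t350\nchr1\t+\t50\t90\n')

# Sort on the start column only, then list the chromosome of each record in order.
by_start=$(printf '%s\n' "$records" | sort -k3,3n | cut -f1 | paste -sd ' ' -)

# Sort on chromosome first (version order), then on start.
by_chrom=$(printf '%s\n' "$records" | sort -k1,1V -k3,3n | cut -f1 | paste -sd ' ' -)

# Chromosomes that appear in more than one block after the start-only sort.
split_blocks=$(printf '%s\n' "$records" | sort -k3,3n | cut -f1 | uniq | sort | uniq -d | paste -sd ' ' -)

echo "start only:       $by_start"      # chr1 chr2 chr1 chr2
echo "chrom then start: $by_chrom"      # chr1 chr1 chr2 chr2
echo "split chromosomes: $split_blocks" # chr1 chr2
```

With the start-only key the chromosomes interleave, which is the situation tabix rejects; with the two-key sort each chromosome forms a single contiguous block. The `uniq | sort | uniq -d` pipeline is one possible way to check a real file for split chromosome blocks before indexing.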