Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Plotting specific readID from slow5 or blow5 file can take a long time #54

Closed waltergallegog closed 2 years ago

waltergallegog commented 2 years ago

Hello, I'm learning how to use the slow5 and blow5 formats. I have a fast5 file which I have converted to slow5 and blow5 using slow5tools, with commands like:

    slow5tools f2s Chip137_IVT_NA12878_Data_reads_0.fast5 -o Chip137_IVT_NA12878_Data_reads_0.blow5 -p 8 -a
    slow5tools index Chip137_IVT_NA12878_Data_reads_0.blow5

readID 001bff3d-2e77-4be8-9e85-46cd7e05b7e0 is the first readID in the files.
readID fffc9fc5-69ba-4d51-a0ea-4fbaafb86276 is the last readID in the files.

Then I'm using SquigglePlot. This is what I have noticed:

By "a long time" I mean around 5 minutes. During those 5 minutes, the process uses 100% of one CPU core. The files contain 4000 reads, and the original fast5 can be downloaded from here

Thanks for your support, Walter.
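A comparison like the one described above (first readID versus last readID) can be made repeatable with a small timing helper. The following is only a sketch: `time_call` and `time_read_lookups` are hypothetical names introduced here, and the second function assumes pyslow5 is installed and that `pyslow5.Open(path, "r")` and `get_read(read_id)` behave as in the pyslow5 documentation. It times the library lookup directly, not SquigglePlot itself.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_read_lookups(path, read_ids):
    """Time a direct lookup of each read ID in a slow5/blow5 file.

    Assumption: pyslow5 is installed; it is imported lazily so the
    helper above stays usable without it.
    """
    import pyslow5  # third-party, not part of the standard library

    s5 = pyslow5.Open(path, "r")
    timings = {}
    for read_id in read_ids:
        _, elapsed = time_call(s5.get_read, read_id)
        timings[read_id] = elapsed
    return timings
```

Calling `time_read_lookups` with the blow5 path and the two readIDs listed above would give one elapsed time per read, which makes it easier to tell whether a slowdown depends on where the read sits in the file.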

Psy-Fer commented 2 years ago

Hey,

Yeah, I think I know what's causing that. Could you please confirm which pyslow5 version you are running, just to be sure?

I'll fix this up in squiggle plot.

James

waltergallegog commented 2 years ago

I'm running pyslow5 version 0.3.0

Walter
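For readers who need to answer the same version question, one way to look up the installed version is through the standard library's package metadata. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it assumes the package was installed under the distribution name `pyslow5`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is named "pyslow5".
    print("pyslow5:", installed_version("pyslow5") or "not installed")
```

Running `pip show pyslow5` in a shell reports the same information.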

Psy-Fer commented 2 years ago

Hey Walter,

I've pushed an update to SquigglePlot. You should see a difference in performance now when using slow5.

I was a bit lazy in my initial integration of slow5 and was meant to come back to it. So thanks for making me fix that up.

You should be able to run the same commands now and see a vast improvement.

Cheers, James
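The thread does not say what the update changed. As an illustration only, and not a description of the actual SquigglePlot code, one common pattern that makes a late read far slower to fetch than an early one is scanning records in file order until the ID matches, instead of looking the ID up through an index. The sketch below uses plain Python stand-ins; `make_records`, `find_by_scan`, and `build_index` are hypothetical names, and the record layout is invented for the example.

```python
def make_records(n):
    """Yield n fake records in file order (stand-in for reads in a file)."""
    for i in range(n):
        yield {"read_id": f"read-{i:05d}", "signal": [i, i + 1, i + 2]}


def find_by_scan(records, read_id):
    """Scan sequentially; return (record or None, records examined)."""
    examined = 0
    for rec in records:
        examined += 1
        if rec["read_id"] == read_id:
            return rec, examined
    return None, examined


def build_index(records):
    """Build a read_id -> record mapping so a lookup is a single step."""
    return {rec["read_id"]: rec for rec in records}


if __name__ == "__main__":
    _, first_cost = find_by_scan(make_records(4000), "read-00000")
    _, last_cost = find_by_scan(make_records(4000), "read-03999")
    print("records examined (first, last):", first_cost, last_cost)

    index = build_index(make_records(4000))
    print("indexed lookup:", index["read-03999"]["read_id"])
```

With a scan, the work grows with the position of the read in the file, while an indexed lookup costs roughly the same for any read. In a real file each scanned record may also need to be read and decoded, which would add to the cost of the scan.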

waltergallegog commented 2 years ago

Hey James,

Thanks for the quick fix. I have just tested and, as you said, there is a vast improvement. The behavior with slow5 files is now similar to what I observed with .fast5 files.

Regards, Walter