Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Job is in D state for a long time #50

Open kaltinel opened 2 years ago

kaltinel commented 2 years ago

Hi, I am trying to plot an individual fast5 file, so I installed the requirements and ran the following with Python 3.7:

python SquigglePlot.py -i ~/data/test.fast5

However, after the script printed 'fast5 file is being looked at', the process stopped in the D state (uninterruptible sleep). I do not know what I am expected to do in this state, as the D state is generally associated with I/O.
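For anyone reproducing this on Linux, here is a minimal sketch of one way to confirm the state: the field right after the command name in /proc/<pid>/stat is the one-letter state code, and /proc/<pid>/wchan (when the kernel exposes it) names the kernel function the process is sleeping in. The helper names are hypothetical and are not part of SquiggleKit.

```python
# Meaning of the most common one-letter state codes in /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting on I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def parse_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the LAST ")" rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_state(code):
    """Translate a state code into a short human-readable description."""
    return STATE_NAMES.get(code, "other ({})".format(code))


def read_state(pid):
    """Read the current state code of a process (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as handle:
        return parse_state(handle.read())


def read_wchan(pid):
    """Return the kernel function the process sleeps in, if it is exposed."""
    try:
        with open("/proc/{}/wchan".format(pid)) as handle:
            return handle.read().strip() or "(none)"
    except OSError:
        return "(unavailable)"
```

Calling describe_state(read_state(pid)) and read_wchan(pid) with the PID of the stuck job gives a first hint about which kind of I/O it is waiting on, assuming /proc is readable for that process.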

Can I have some help regarding this please? Thank you.

Psy-Fer commented 2 years ago

Hello, that is strange.

Would you be willing to share 1 fast5 file with me to test it? I have not seen this bug before, so will need to do some troubleshooting.

Thanks James

kaltinel commented 2 years ago

Thank you for your reply. Sure, I was testing on one of the example fast5 files from ont_fast5_api: https://github.com/nanoporetech/ont_fast5_api/blob/master/test/data/single_reads/fe85b517-62ee-4a33-8767-41cab5d5ab39.fast5

It reads the file and gets stuck in the D state.

I look forward to hearing back from your side. Thanks

Psy-Fer commented 2 years ago

Hello,

Could you do a git pull and then try again, but with the --single flag added?

SquiggleKit was originally designed around single fast5 files. When multi fast5 files arrived, they added complexity: the fast5 schema was confusing and offered no fully reliable way to tell the two formats apart, so it took the community a while to build all the checks. My solution was to let the user specify the format: multi is the default, and single files need the --single flag.
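To illustrate why this is only a heuristic, here is a rough sketch of one common check, not the check SquiggleKit uses. It assumes that multi-read files keep each read in a top-level group whose name starts with read_, while single-read files have top-level groups such as Raw and UniqueGlobalKey. Both function names are hypothetical, and the second one needs the h5py package.

```python
def guess_fast5_layout(group_names):
    """Guess 'multi', 'single' or 'unknown' from top-level HDF5 group names."""
    names = list(group_names)
    # Assumption: multi-read files store one group per read, named read_<id>.
    if any(name.startswith("read_") for name in names):
        return "multi"
    # Assumption: single-read files expose these groups at the top level.
    if "Raw" in names or "UniqueGlobalKey" in names:
        return "single"
    # Anything else cannot be classified by this heuristic.
    return "unknown"


def list_top_level_groups(path):
    """List the top-level groups of a fast5 file (requires h5py)."""
    import h5py  # imported lazily so the heuristic itself has no dependencies

    with h5py.File(path, "r") as handle:
        return list(handle.keys())
```

A file that returns "unknown" is exactly the case where an explicit flag such as --single is the safer choice.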

That said, there was a bug in one of the checks that was throwing a different error (not this one). I have fixed it and tested the fix with the fast5 file above.

Let me know if this fixes it for you.

James

kaltinel commented 2 years ago

Thank you very much for your detailed answer, James.

I tried the argument you suggested after a git pull:

python SquigglePlot.py --single -i fe85b517-62ee-4a33-8767-41cab5d5ab39.fast5

The job is still in the D state. I am using Python 3.7.4; could that be the issue?

Also, how long does it take you to plot this fast5 file? Is your job always in the R state?

I appreciate your feedback, thanks

Psy-Fer commented 2 years ago

Hmm,

Try it with python2 if you can (it was originally built for that and should work). I'll see if I can troubleshoot why it's failing with python 3.

James

kaltinel commented 2 years ago

I have now tried it with Python 2.7, and it goes straight into the D state as well.

PS: I no longer get the terminal output 'fast5 file is being looked at'.

I do not know what it could be.

Thank you for the help!

Psy-Fer commented 2 years ago

Okay now that's interesting.

What happens if you just run it with -h as the only argument?

You should be getting errors on stderr if something is going wrong.

kaltinel commented 2 years ago

I am getting a pretty healthy help section actually (in py2.7):

>> python SquigglePlot.py --help
usage: SquigglePlot.py [-h] [-p F5_PATH | -s SIGNAL | -i IND [IND ...]]
                       [-r READID] [--single] [--head] [--raw_signal] [-n NUM]
                       [--lim_hi LIM_HI] [--lim_low LIM_LOW]
                       [--plot_colour PLOT_COLOUR] [--save SAVE]
                       [--save_path SAVE_PATH] [--no_show] [--dpi DPI]

SquigglePlot - plotting the raw signal data after (optional) conversion to pA

optional arguments:
  -h, --help            show this help message and exit
  -p F5_PATH, --f5_path F5_PATH
                        Fast5 top dir
  -s SIGNAL, --signal SIGNAL
                        Extracted signal file from SquigglePull. Currently not
                        compatible with conversion
  -i IND [IND ...], --ind IND [IND ...]
                        Individual fast5 file/s
  -r READID, --readID READID
                        Individual readID to extract from a multifast5 file
  --single              single fast5 files.
  --head                Header present in signal or flat file
  --raw_signal          Plot raw signal instead of converting to pA
  -n NUM, --Num NUM     Section of signal to look at - -n 2000 or -n 100,1500
  --lim_hi LIM_HI       Upper limit for signal outliers
  --lim_low LIM_LOW     Lower limit for signal outliers
  --plot_colour PLOT_COLOUR
                        Colour of signal plot, takes any pyplot entry:
                        k,r,b,g,red,blue,etc...
  --save SAVE           Save file readname_saveArg.pdf --save saveArg.pdf, use
                        png, etc for other file types
  --save_path SAVE_PATH
                        Save filepath
  --no_show             Do not show plot (used for saving many)
  --dpi DPI             Change DPI for publication figs, eg: --dpi 300

Psy-Fer commented 2 years ago

Okay, could you try the -i and --single arguments as before, but add --save test.png --save_path ./ --no_show?

This will check whether the matplotlib backend is working, without the interactive plots.

Let me know if it produces anything.
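A related check that may narrow things down is sketched below: import each dependency in a child process with a deadline, with the MPLBACKEND environment variable set to the non-interactive Agg backend. The function name is hypothetical, and the sketch needs Python 3. It polls rather than waits, because a child stuck in uninterruptible sleep ignores signals until its I/O call returns.

```python
import os
import subprocess
import sys
import time


def check_import(module, deadline_s=30.0, backend="Agg"):
    """Return 'ok', 'failed' or 'timeout' for `import module` in a child."""
    # MPLBACKEND selects the matplotlib backend; Agg needs no display.
    env = dict(os.environ, MPLBACKEND=backend)
    child = subprocess.Popen(
        [sys.executable, "-c", "import " + module],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        code = child.poll()
        if code is not None:
            # The child exited: 0 means the import succeeded.
            return "ok" if code == 0 else "failed"
        time.sleep(0.1)
    # Still running: request termination but do not wait for it, since a
    # process in the D state cannot be killed until its I/O completes.
    child.kill()
    return "timeout"
```

Running check_import on the modules the script imports (for example "h5py" or "matplotlib.pyplot"; which modules matter here is an assumption) should show whether the hang happens during an import, before any file is read.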

kaltinel commented 2 years ago

Unfortunately, it goes straight into the D state again.