Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Error plotting with SquigglePlot.py #51

Open capoony opened 2 years ago

capoony commented 2 years ago

Hi,

Unfortunately, I cannot get SquiggleKit to work. I closely followed your installation tutorial and installed the required packages in a virtual environment.

cd /opt/bioinformatics

git clone https://github.com/Psy-Fer/SquiggleKit.git

python3 -m venv /opt/venv/SquiggleKit

source /opt/venv/SquiggleKit/bin/activate

pip install ont-fast5-api

pip install numpy h5py sklearn matplotlib

pip install pyslow5

When I now run SquigglePlot.py on the test dataset in example/, I get the following error:

(base) [mkapun@nhm-phylo2 ~]$ source /opt/venv/SquiggleKit/bin/activate
(SquiggleKit) (base) [mkapun@nhm-phylo2 ~]$ cd SquiggleKit/
(SquiggleKit) (base) [mkapun@nhm-phylo2 SquiggleKit]$ python ./SquigglePlot.py -i example/test.fast5
Looking at the file example/test.fast5
Traceback (most recent call last):
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 364, in read_multi_fast5
    readID = hdf[read]['Raw'].attrs['read_id'].decode()
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib/python3.8/site-packages/h5py/_hl/group.py", line 305, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'Raw' doesn't exist)"
extract_fast5():failed to read readID: AnalysesTraceback (most recent call last):
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 364, in read_multi_fast5
    readID = hdf[read]['Raw'].attrs['read_id'].decode()
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib/python3.8/site-packages/h5py/_hl/group.py", line 305, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'Raw' doesn't exist)"
extract_fast5():failed to read readID: RawTraceback (most recent call last):
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 364, in read_multi_fast5
    readID = hdf[read]['Raw'].attrs['read_id'].decode()
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib/python3.8/site-packages/h5py/_hl/group.py", line 305, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'Raw' doesn't exist)"
extract_fast5():failed to read readID: UniqueGlobalKeyTraceback (most recent call last):
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 459, in <module>
    main()
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 275, in main
    sigs = get_multi_fast5_signal(args, fast5)
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 344, in get_multi_fast5_signal
    signal = convert_to_pA_numpy(signal, f5_dic[read]['digitisation'], f5_dic[read]['range'], f5_dic[read]['offset'])
  File "/opt/bioinformatics/SquiggleKit/SquigglePlot.py", line 455, in convert_to_pA_numpy
    raw_unit = range / digitisation
ZeroDivisionError: float division by zero

Any help would be highly appreciated

Psy-Fer commented 2 years ago

Hey,

Sorry for the delay. Have you tried on a different fast5 file? SquiggleKit has been going through a bit of a messy update (which I plan on fixing soon). So the basic test.fast5 might not work with everything.

Could you please test on a "real" file and let me know? I'll look at updating the example dataset

James
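A hedged note on the first traceback: the names in the "failed to read readID" messages (Analyses, Raw, UniqueGlobalKey) are the top-level groups of a single-read fast5 file, whereas read_multi_fast5 looks for a Raw group inside each top-level group, which is how multi-read files with read_<id> groups are laid out. The sketch below shows one way to tell the two layouts apart from the top-level group names. The helper fast5_layout is hypothetical and not part of SquiggleKit.

```python
def fast5_layout(top_level_keys):
    """Guess the layout of a fast5 file from its top-level HDF5 group names.

    Hypothetical helper for illustration only; not part of SquiggleKit.
    """
    keys = set(top_level_keys)
    # Multi-read files hold one group per read, named read_<id>.
    if keys and all(k.startswith("read_") for k in keys):
        return "multi"
    # Single-read files keep the signal and metadata in fixed top-level groups.
    if "Raw" in keys and "UniqueGlobalKey" in keys:
        return "single"
    return "unknown"


# With h5py installed, the keys would come from the file itself, e.g.:
#   import h5py
#   with h5py.File("example/test.fast5", "r") as hdf:
#       print(fast5_layout(hdf.keys()))

print(fast5_layout(["Analyses", "Raw", "UniqueGlobalKey"]))          # single
print(fast5_layout(["read_00e39647-9862-4526-af34-b6f5468905af"]))  # multi
```

If the example file reports "single", that would be consistent with the failed lookups above, although it does not by itself confirm the cause of the ZeroDivisionError.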

capoony commented 2 years ago

Hi James,

Many thanks for your swift reply! I tried to run the same command on a real dataset based on a subset of about 800 randomly drawn FAST5 files. Unfortunately, no success either. There are more errors popping up:

(base) [mkapun@nhm-phylo2 ~]$ source /opt/venv/SquiggleKit/bin/activate
(SquiggleKit) (base) [mkapun@nhm-phylo2 ~]$ 
(SquiggleKit) (base) [mkapun@nhm-phylo2 ~]$ cd /opt/bioinformatics/SquiggleKit
(SquiggleKit) (base) [mkapun@nhm-phylo2 SquiggleKit]$ 
(SquiggleKit) (base) [mkapun@nhm-phylo2 SquiggleKit]$ python3.6 ./SquigglePlot.py -i /media/inter/mkapun/projects/MinION_TestRuns/Basecalling/FAST5/1k_pass0.fast5
Looking at the file /media/inter/mkapun/projects/MinION_TestRuns/Basecalling/FAST5/1k_pass0.fast5
Traceback (most recent call last):
  File "./SquigglePlot.py", line 375, in read_multi_fast5
    for col in hdf[read]['Raw/Signal'][()]:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib64/python3.6/site-packages/h5py/_hl/dataset.py", line 787, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw
OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)
extract_fast5():failed to read readID: read_00e39647-9862-4526-af34-b6f5468905afTraceback (most recent call last):
  File "./SquigglePlot.py", line 375, in read_multi_fast5
    for col in hdf[read]['Raw/Signal'][()]:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib64/python3.6/site-packages/h5py/_hl/dataset.py", line 787, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw
OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)
extract_fast5():failed to read readID: read_0135a498-a85d-48bd-9304-e7fe84494b5aTraceback (most recent call last):
  File "./SquigglePlot.py", line 375, in read_multi_fast5
    for col in hdf[read]['Raw/Signal'][()]:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/SquiggleKit/lib64/python3.6/site-packages/h5py/_hl/dataset.py", line 787, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw

The above is only the beginning; the error messages continue for the remaining reads. In addition, at the very end there is a warning:

./SquigglePlot.py:450: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()
./SquigglePlot.py:450: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()
./SquigglePlot.py:450: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()

Thanks a lot for your help,

Martin

Psy-Fer commented 2 years ago

Hmm, that looks like two things. h5py isn't able to open the file being provided, and it looks like Tk isn't available for matplotlib. (The back ends matplotlib uses to display things are out of my control; they depend on the user's environment and operating system.)

Which OS are you using?

Also, to limit issues, please move one of your fast5 files to a directory on its own, then replace the -i argument with the -p argument and provide just the path to that folder.

capoony commented 2 years ago

Hi James,

I am using AlmaLinux, which is a CentOS clone.

Thanks for the tip about moving a single fast5 file into its own directory and using -p instead of -i. I had already tried this as well, but it resulted in the same error.

Thanks, Martin


Psy-Fer commented 2 years ago

Hmm okay,

Give this a try if you have sudo access to install tkinter:

https://stackoverflow.com/a/42244548

capoony commented 2 years ago

Unfortunately, that did not help either, since tkinter was already installed.

Psy-Fer commented 2 years ago

So that error looks like matplotlib can't get around using agg for plt.show().

Let's sidestep that: save the figure instead and skip the live viewer.

python3.6 ./SquigglePlot.py -i /media/inter/mkapun/projects/MinION_TestRuns/Basecalling/FAST5/1k_pass0.fast5 --no_show --save test.png --save_path ./test

That would take care of the plotting issue.
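If you still want to find out why matplotlib ends up on agg even though tkinter is installed, a quick check of the environment may help. This is only a sketch under assumptions: on Linux, matplotlib typically falls back to a non-GUI backend when no display is available (for example in an SSH session without X forwarding), and the interpreter you run may not be the one tkinter was installed for. The function gui_backend_hints is a hypothetical helper, not part of SquiggleKit or matplotlib.

```python
import importlib.util
import os
import sys


def gui_backend_hints():
    """Collect hints about whether an interactive Tk backend could work.

    Hypothetical diagnostic helper; it does not import matplotlib.
    """
    return {
        # Tk needs an X11 display; over plain SSH, DISPLAY is often unset.
        "has_display": bool(os.environ.get("DISPLAY")),
        # True when this interpreter can locate the _tkinter extension module.
        "has_tkinter": importlib.util.find_spec("_tkinter") is not None,
        # The interpreter actually in use, which may differ from the system one.
        "python": sys.executable,
    }


for name, value in gui_backend_hints().items():
    print(f"{name}: {value}")
```

If has_display is False, saving with --no_show and --save as above is probably the simplest route on a headless server.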

As for the hdf5 issue, that is a complicated one. It looks like your python isn't able to access the plugin (or something) to then open the file properly, or it's throwing this error because the fast5 file isn't working with the method I'm using to read it.

The first thing that comes to mind is that the fast5 file might use vbz compression. I have not gotten around to testing that with SquiggleKit, or to working out how to make it compatible with various operating systems. Any chance you could send me one of your fast5 files so I can have a go myself and make any changes needed to the code?

Sorry for the slow reply. Last week was a busy one.

James

Psy-Fer commented 2 years ago

Hello,

I've been able to identify the cause of the hdf5 read error here, thanks to another issue that reported the same error. That user was kind enough to share some data with me so I could confirm and fix it.

You will need to install the VBZ plugin from ONT so that hdf5/h5py can read the vbz-compressed fast5 files.

You can get the installer from here:

https://github.com/nanoporetech/vbz_compression/releases

Once installed, everything should work as usual.
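To check whether HDF5 can find the plugin after installation, something like the following sketch may be useful. It relies on two assumptions: HDF5 looks for filter plugins in the directories listed in the HDF5_PLUGIN_PATH environment variable and otherwise in a default directory (taken here from the error message earlier in this thread), and the plugin's file name contains "vbz". The function find_vbz_plugins is a hypothetical helper, not part of SquiggleKit or h5py.

```python
import os
from pathlib import Path

# Default search directory, copied from the OSError earlier in this thread.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"


def find_vbz_plugins(env=None):
    """Return paths of files that look like the VBZ plugin in the HDF5 plugin dirs.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # HDF5_PLUGIN_PATH may list several directories separated by os.pathsep.
    raw = env.get("HDF5_PLUGIN_PATH") or DEFAULT_PLUGIN_DIR
    hits = []
    for directory in filter(None, raw.split(os.pathsep)):
        path = Path(directory)
        if path.is_dir():
            # Assumption: the plugin's file name contains "vbz".
            hits.extend(sorted(str(p) for p in path.glob("*vbz*")))
    return hits


found = find_vbz_plugins()
print(found if found else "no VBZ plugin found in the HDF5 plugin path")
```

If nothing is found, setting HDF5_PLUGIN_PATH to the directory where the installer placed the plugin, in the same shell that runs SquigglePlot.py, is worth trying.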