Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License
Error with hdf5 plug in #52

Closed miles-jon closed 2 years ago

miles-jon commented 2 years ago

Hello there,

I am trying to use SquigglePull.py to visualise some nanopore fast5 files. I am attempting to do this in ubuntu with an environment running python 3.6. Whilst Squigglekit is installed fine, and the path to my files is correct, every time I run the squigglepull command it returns an error relating to the hdf5/h5py plugin, despite it being installed in the environment I am trying to use. The error is as follows:

OSError: Can't read data (can't open directory: /home/miles-jon/anaconda3/envs/squigglekit_env/lib/hdf5/plugin)
extract_fast5_all():failed to read readID: read_6c3c3be4-e5b0-49ba-b0a0-a1c0167ce136
Traceback (most recent call last):
  File "SquigglePull.py", line 211, in extract_f5_all
    for col in hdf[read]['Raw/Signal'][()]:
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/miles-jon/anaconda3/envs/squigglekit_env/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 787, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw

However:

(squigglekit_env) miles-jon@ubuntu:~/SquiggleKit$ conda list hdf5
# packages in environment at /home/miles-jon/anaconda3/envs/squigglekit_env:
#
# Name          Version    Build                  Channel
hdf5            1.10.6     nompi_h6a2412b_1114    conda-forge
hdf5plugin      3.2.0      pypi_0                 pypi

I had a similar problem previously where the system couldn't find the hdf5-plugin in my local user library but that was resolved when I directly installed the hdf5 plugin.

Please advise on how I need to arrange the hdf5/h5py libraries in order to get the SquigglePull.py script to identify them, as they are definitely installed.

Thanks,

Jon

Psy-Fer commented 2 years ago

Hey Jon,

Do these fast5 files have VBZ compression on them? I have not tested the fast5 reading with VBZ yet. I think the solution there is to just have the ONT VBZ add-on available for hdf5 to use.
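
A quick way to see where HDF5 would look for that add-on is sketched below. It is only a rough diagnostic under stated assumptions: it assumes the search path comes from the `HDF5_PLUGIN_PATH` environment variable, that the fallback is the stock `/usr/local/hdf5/lib/plugin` directory (the conda build in the traceback above evidently uses `<env>/lib/hdf5/plugin` instead), and that the VBZ library has "vbz" somewhere in its file name. The function names are made up for illustration.

```python
import os
from pathlib import Path

# Assumed fallback: the default plugin directory of a stock Linux HDF5 build.
# Packaged builds (e.g. conda) may compile in a different default.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"


def plugin_dirs(environ=os.environ, default=DEFAULT_PLUGIN_DIR):
    """Return the directories listed in HDF5_PLUGIN_PATH, or the default."""
    raw = environ.get("HDF5_PLUGIN_PATH", "")
    dirs = [d for d in raw.split(os.pathsep) if d]
    return dirs or [default]


def describe(dirs):
    """Map each directory to the vbz-looking files in it (None if missing)."""
    report = {}
    for d in dirs:
        p = Path(d)
        if not p.is_dir():
            # Corresponds to the "can't open directory" case in the error.
            report[d] = None
        else:
            report[d] = sorted(
                f.name for f in p.iterdir() if "vbz" in f.name.lower()
            )
    return report


if __name__ == "__main__":
    for d, found in describe(plugin_dirs()).items():
        if found is None:
            print(f"{d}: directory missing")
        else:
            print(f"{d}: {found or 'no vbz library found'}")
```

If the directory is reported missing, or exists but holds no VBZ library, that would be consistent with the `OSError` shown in the first post.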

Otherwise, if you are going for visualisation, you could try squigglePull directly. If you get the same error, perhaps try converting the fast5 to slow5 with slow5tools, then using squigglePull with the --slow5 arg.

Let me know how that goes.

James

miles-jon commented 2 years ago

Hi James,

Just tried with the vbz addon but that didn't work. As a fallback I tried the same SquigglePull.py command with the test data that comes with the SquiggleKit package and, interestingly, it came back fine.

Successful command:

(squigglekit_env) miles-jon@ubuntu:~/SquiggleKit$ python SquigglePull.py -rv -p /home/miles-jon/SquiggleKit/example/ > test.tsv
Verbose mode on. Starting timer.
/home/miles-jon/SquiggleKit/example/test.fast5 detected as a single fast5 file
Time taken: 0.020241975784301758

vs unsuccessful command which throws back the hdf5-related error:

(squigglekit_env) miles-jon@ubuntu:~/SquiggleKit$ python SquigglePull.py -rv -p /home/miles-jon/Documents/Nanopore_Runs/T7_pGEM_Runs/Libraries2_4/20211125_1114_MN38197_FAR32678_d3a59952/fast5/ > lib2_4_data.tsv

Any idea as to what could be wrong with my experimental data? If not there is more data from other runs that I can play with to see if that comes back OK, I will let you know.

Thanks,

Jon

Psy-Fer commented 2 years ago

Thanks for that info.

Could you let me know the kit chemistry and flowcell type?

Seems like I am probably going to have to get one of these fast5 files to figure out what's changed that's causing these issues (there is someone else with a similar problem).

miles-jon commented 2 years ago

Hi James,

Apologies for the late reply. The RNA was sequenced with the RNA002 kit on an R9.4.1 flow cell. If you want to have a look at some of my data, for which I was seeking permission to send, let me know how I can contact you to pass on the file.

Thanks,

Jon

Psy-Fer commented 2 years ago

Okay,

If it's possible to get one of the fast5 files in a cloud share link (Dropbox/Google, etc.), I promise not to basecall it, and to delete it after I've done my investigation to reproduce the issue.

Cheers,

James

Psy-Fer commented 2 years ago

deleted message:

< link > Here is a link to the file on OneDrive, I've put a week's timer on it just in case. Let me know if you have any issues etc.

Thanks,

Jon

Hey Jon,

I have downloaded the file, and removed your message with the link.

I'll work on it when I wake up and let you know how I get on. Talk soon.

James

Psy-Fer commented 2 years ago

Hey Jon,

So I was able to reproduce the same error on a fresh Linux machine that ONLY had h5py on it. Once I installed the VBZ plugin from ONT, it worked as you would expect it to.

So the issue is the files are compressed with VBZ, which requires a plugin from ONT for hdf5/h5py to use.
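
To confirm this on a given file without decompressing anything, a sketch along the following lines may help. It reads the filter list from the dataset's creation property list through h5py's low-level API, which should work even when the plugin itself is absent. The filter ID 32020 is an assumption here (it is the value I understand to be registered for VBZ), the helper names are made up, and only the multi-read layout seen in the traceback (`hdf[read]['Raw/Signal']`) is handled.

```python
# Assumed: 32020 is the HDF5 filter ID registered for ONT's VBZ compression.
VBZ_FILTER_ID = 32020


def filter_ids(dset):
    """Return the HDF5 filter IDs attached to a dataset.

    Works on an h5py Dataset: dset.id.get_create_plist() gives the dataset
    creation property list, and get_filter(i)[0] is the numeric filter code.
    """
    plist = dset.id.get_create_plist()
    return [plist.get_filter(i)[0] for i in range(plist.get_nfilters())]


def vbz_reads(path, limit=5):
    """Report, for the first few reads, whether Raw/Signal uses VBZ."""
    import h5py  # imported lazily so filter_ids() has no hard dependency

    results = {}
    with h5py.File(path, "r") as hdf:
        reads = [k for k in hdf.keys() if k.startswith("read_")]
        for read in reads[:limit]:
            signal = hdf[read]["Raw/Signal"]
            results[read] = VBZ_FILTER_ID in filter_ids(signal)
    return results
```

Calling `vbz_reads("some_file.fast5")` (a placeholder file name) would then return a dict of read IDs mapped to `True` or `False`.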

Please install the VBZ plugin from here: https://github.com/nanoporetech/vbz_compression/releases
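
Once the plugin is unpacked, HDF5 still has to be told where it lives. One common way is the `HDF5_PLUGIN_PATH` environment variable, either exported in the shell or set from Python. Below is a minimal sketch of the Python route, assuming the variable is what your HDF5 build consults; the helper name and the example path are hypothetical. Setting the variable before h5py is imported is the safe order.

```python
import os


def prepend_plugin_dir(plugin_dir, environ=os.environ):
    """Put plugin_dir at the front of HDF5_PLUGIN_PATH, keeping other entries."""
    current = [
        d for d in environ.get("HDF5_PLUGIN_PATH", "").split(os.pathsep) if d
    ]
    if plugin_dir not in current:
        current.insert(0, plugin_dir)
    environ["HDF5_PLUGIN_PATH"] = os.pathsep.join(current)
    return environ["HDF5_PLUGIN_PATH"]


# Usage (hypothetical path; import h5py only after the call):
#   prepend_plugin_dir("/path/to/vbz/plugin/dir")
#   import h5py
```

The shell equivalent would be an `export HDF5_PLUGIN_PATH=...` line in your startup file, with the directory that actually contains the plugin library.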

Alternatively, the slow5 method I mentioned will work without any of this, as we put some static libs into the toolkit so the user doesn't have to deal with any of this.

Let me know how you go.

James.

Psy-Fer commented 2 years ago

Hello Jon,

Did you have any luck with this?

miles-jon commented 2 years ago

Hi James,

Happy New Year! Apologies for the late reply. Yes, your tip about the VBZ plugin worked, and the slow5tools package actually came with a script that installed it for you, which once integrated into the .bashrc worked a treat. Thanks for all your help and speak soon,

Jon