Open JohnUrban opened 8 years ago
Thanks for your interest in poreminion. Poreminion was designed on older sets of fast5 files. Since ONT changes the internal structure of the fast5 files fairly often, poreminion breaks with newer fast5 files. I think the last time I worked on it was about a year ago -- so it will mainly work on older data. I have been developing another set of nanopore data tools called fast5tools (https://github.com/JohnUrban/fast5tools) that has the flexibility to work with fast5s from the earliest days of MAP all the way until now. It does not have all the functionality of poreminion, but has most of the standard useful things and maybe some other things.
Having said that, another issue with the fast5 files is that they seem easily corruptible. After moving files, or tarring and untarring fast5 directories, I have noticed that one or a couple of fast5s in the pass folder will suddenly be corrupted and will not open. To discern whether this is an hdf5 problem, an h5py problem, or a poreminion problem, do the following in a terminal for the file in question:
h5dump file.fast5
or
h5dump -n 1 file.fast5
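Before running h5dump on every file, a cheaper first pass can rule out the grossest kind of damage. The sketch below is not part of poreminion or fast5tools; the function name looks_like_hdf5 and the max_offset cutoff are made up for illustration. It only checks that the standard 8-byte HDF5 signature is present at offset 0 or at 512, 1024, 2048, and so on, which is where the HDF5 format allows the superblock to start. A file that fails is empty, truncated at the start, or not HDF5 at all; a file that passes can still be corrupted further in, so h5dump remains the real test.

```python
import os

# The fixed 8-byte signature that begins every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ...

    Hypothetical helper for illustration only. max_offset is an arbitrary
    cutoff so that large non-HDF5 files are not scanned indefinitely.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size and offset <= max_offset:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate superblock offsets double after the first 512 bytes.
            offset = 512 if offset == 0 else offset * 2
    return False
```

Any file for which this returns False is a good candidate to set aside before handing the rest to h5dump, h5py, or poreminion.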
p.s. for now, if relevant, please cite the following for poreminion and/or fast5tools: https://github.com/JohnUrban/poreminion and/or https://github.com/JohnUrban/fast5tools and
Urban, J. M., Bliss, J., Lawrence, C. E. & Gerbi, S. A.
Sequencing ultra-long DNA molecules with the Oxford Nanopore MinION.
bioRxiv (Cold Spring Harbor Labs Journals, 2015). doi:10.1101/019281
http://biorxiv.org/content/early/2015/05/20/019281
This paper contains the first descriptions and uses of poreminion.
I actually used poreminion uncalled successfully on the fast5 files before (lambda_burn). Only when I started getting trouble with other files did I re-test with these lambda_burn files and poreminion gave the same error.
The h5dump command result is attached. h5dump.txt
It doesn't seem like the fast5 files are corrupted then? Perhaps an h5py update is causing the issue?
Can you share a problematic fast5 file?
I can reproduce your issue on files that have no 2D and no 1D information -- but filled out with information otherwise -- and with completely empty fast5 files. It seems to give the error on files that are not base-called as well -- which is something that I should make sure it catches.
I think I have narrowed down the problem to a couple of FAST5 files that were previously flagged as having time errors. Does this mean the time error check should be run before the uncalled test?
In the biorxiv preprint, it shows that I did the following order:
$ poreminion uncalled -m -o fail-filter fail/
$ poreminion timetest -m -o fail-filter fail/
Note that the pass folder should never need to be filtered. If you find problems in the pass folder, you should bring it up with ONT right away so they can fix their base-caller/etc.
I am very curious as to what version of MinKNOW and Metrichor you are using. The "time error" problem was claimed to have been solved a long time ago, at least a year and a half ago. After ONT told me that, I did not find time errors in a lot of my data, so at some point I didn't bother to look any more (since it is a very time-consuming step). Are you finding time errors in newer data?
Both of the files you sent have "No template data found" in their logs. If poreminion did not label them as such, it is almost certainly because ONT has changed the log locations in the HDF5 files since I last worked on poreminion. They change the internal structure of fast5s fairly frequently -- which was my main motivation for starting over with fast5tools. The goal is for fast5tools to be robust to such changes.
I saw the logs with:
fast5stats BSPC_15090L_lambda_0127_1_ch332_read112_strand.fast5
fast5stats BSPC_15090L_lambda_0127_1_ch352_read73_strand.fast5
It is under active development and the syntax may change to fast5tools stats …. I am undecided as to whether I want it to follow the command subcommand [options] format or just be a set of various different scripts.
Instead of filtering now, I just run something like:
fast5stats -s -e errorfiles.txt pass/ fail/ > stats.txt
The file called errorfiles.txt will show all the files that gave problems. The other scripts will just ignore the problematic files so they can be left where they are.
Right, yes I was following the biorxiv preprint. I'll check the versions but should be the latest ones, the run was done a few weeks ago.
I also just ran the poreminion seqlen command on the pass folder and got a similar error:
poreminion seqlen pass.fast5.tar
Traceback (most recent call last):
  File "/usr/local/bin/poreminion", line 9, in <module>
    load_entry_point('poreminion==0.4.4', 'console_scripts', 'poreminion')()
  File "/usr/local/lib/python2.7/site-packages/poreminion-0.4.4-py2.7.egg/poreminion/poreminion_main.py", line 1066, in main
    args.func(parser, args)
  File "/usr/local/lib/python2.7/site-packages/poreminion-0.4.4-py2.7.egg/poreminion/poreminion_main.py", line 62, in run_subtool
    submodule.run(parser, args)
  File "/usr/local/lib/python2.7/site-packages/poreminion-0.4.4-py2.7.egg/poreminion/seqlen.py", line 15, in run
    print_seq_name_and_length(f5, gettemp=gettemp, getcomp=getcomp, get2d=get2d)
  File "/usr/local/lib/python2.7/site-packages/poreminion-0.4.4-py2.7.egg/poreminion/info.py", line 45, in print_seq_name_and_length
    print name + "template\t" + str(get_seq_len(f5, readtype="template"))
  File "/usr/local/lib/python2.7/site-packages/poreminion-0.4.4-py2.7.egg/poreminion/info.py", line 34, in get_seq_len
    return f5connection[path].attrs["sequence_length"]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2687)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2645)
  File "/usr/local/lib/python2.7/site-packages/h5py/_hl/group.py", line 166, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2687)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/_objects.c:2645)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/Users/travis/build/MacPython/h5py-wheels/h5py/h5py/h5o.c:3573)
KeyError: "Unable to open object (Object 'basecall_1d_template' doesn't exist)"
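The KeyError above is raised when a path is indexed that does not exist in the file, which is what happens for reads with no base-called template. One way such a lookup could be guarded is sketched below. This is not poreminion's actual code: get_attr_or_none is a hypothetical name, and the path is left as a parameter because the group layout differs between fast5 versions. The only assumption about the file object is that it behaves like an h5py File or Group, i.e. it supports `in`, indexing, and an `attrs` mapping with `.get`.

```python
def get_attr_or_none(f5, path, attr="sequence_length"):
    """Return f5[path].attrs[attr], or None if the group or attribute is absent.

    Hypothetical helper for illustration. f5 is expected to behave like an
    open h5py.File or h5py.Group.
    """
    # Membership test first, so a missing group does not raise KeyError.
    if path not in f5:
        return None
    # .get returns None when the attribute itself is missing.
    return f5[path].attrs.get(attr)


# Possible use with h5py (the path here is a placeholder, not a real layout):
#   import h5py
#   with h5py.File("file.fast5", "r") as f5:
#       length = get_attr_or_none(f5, "some/group/path")
#       if length is None:
#           pass  # treat the read as uncalled and skip it
```

Returning None instead of raising lets a caller report the file as uncalled and move on to the next one, rather than stopping on the first read without template data.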
The fast5stats script from fast5tools will give you all seqlen information. However, you can also try another fast5tools script:
fast5tofastx -r $READTYPE -o details pass/ fail/ > seqlens.txt
If you downloaded fast5tools in the past few days or so and had trouble, that is because we made some large packaging changes that broke all the scripts in the same way. I just found that out when I tried to use it. All should be fixed now though -- re-download it if necessary. If you have trouble with fast5tools, please report the issue over there. I will take care of it ASAP.
Opened by Jean-Michel Carter.
Dear John,
I have been using your poreminion program but it has recently started giving me errors that seem to be related to h5py. I am using homebrew to manage Python and poretools.
I have tried re-installing poreminion (which seems to do so without any problem) and h5py but I am out of ideas.
Do you have any advice as to what is causing the errors? I have attached the install log and the error output.