GenomiqueENS / toulligQC

A post sequencing QC tool for Oxford Nanopore sequencers

toulligqc not processing directory of fast5 files... #6

Closed nickschurch closed 5 years ago

nickschurch commented 6 years ago

I'm trying to run toulligqc to generate statistics for a run, but I can't persuade it to read a directory of fast5 files, let alone the standard sub-directory hierarchy produced by albacore. When I specify a directory containing all the fast5 files from a run, toulligqc fails with:

ToulligQC version 0.5
* Initialize extractors
fast5_directory
* Start FAST5 extractor
Traceback (most recent call last):
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/bin/toulligqc", line 11, in <module>
    sys.exit(main())
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/toulligqc.py", line 252, in main
    extractor.extract(result_dict)
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 87, in extract
    result_dict['flow_cell_id'] = self._get_fast5_items(h5py_file,'flow_cell_id')
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 192, in _get_fast5_items
    tracking_id_items = list(h5py_file["/UniqueGlobalKey/tracking_id"].attrs.items())
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/h5py/_hl/group.py", line 167, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'

Modifying the _read_fast5(self) method in the extractor Python script to see what is going on shows that the directory extension is being set correctly. Printing the glob pattern and self.fast5_file from the lines below reveals:

elif self.fast5_file_extension == 'fast5_directory':
    if glob.glob(self.fast5_source+self.run_name+'/*.fast5'):
        self.fast5_file = self.fast5_source+self.run_name+'.fast5'

glob.glob: datadir/allfast5/*.fast5
self.fast5_file: datadir/allfast5.fast5

Where allfast5 is the run name, datadir is the input path specified with --fast5-source, and datadir/allfast5 contains all the *.fast5 files.

Should the extractor be looping over all the fast5 files in the glob?
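For illustration, here is a minimal sketch of what looping over the matched files might look like. It is not ToulligQC's code: list_fast5_files is a hypothetical helper, the paths are built with os.path.join rather than string concatenation, and the recursive `**` pattern assumes Python 3.5 or later and that nested sub-directories should be searched as well as the run directory itself.

```python
import glob
import os


def list_fast5_files(fast5_source, run_name):
    """Return a sorted list of .fast5 files under <fast5_source>/<run_name>.

    Hypothetical helper for illustration only; not part of ToulligQC.
    """
    run_dir = os.path.join(fast5_source, run_name)
    # With recursive=True, '**' matches zero or more directory levels, so this
    # finds files directly in run_dir as well as in any nested sub-directory.
    pattern = os.path.join(run_dir, '**', '*.fast5')
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == '__main__':
    # 'datadir' and 'allfast5' are the example names used in this report.
    for path in list_fast5_files('datadir', 'allfast5'):
        # Each matched file could be handed to the extractor here,
        # instead of the single constructed path 'datadir/allfast5.fast5'.
        print(path)
```

The point of the sketch is only that each element returned by the glob is a real file path, whereas the single path assigned to self.fast5_file above does not correspond to any file in the directory.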

laffayb commented 6 years ago

Hi, I've just fixed the issue (it now works with a directory of fast5 files). You can update your docker image with the following command and then relaunch your QC:

$ docker pull genomicpariscentre/toulligqc:latest

Best regards, Bérengère