bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

extract_features fails on all files #76

Closed: jamesabbott closed this issue 2 years ago

jamesabbott commented 2 years ago

Hello,

I'm having trouble with extract_features: it reports that it has failed on all files, but gives no indication as to why. The (DNA) reads have been resquiggled using the default basecall group/subgroups, and I can confirm that the RawGenomeCorrected_000 group is present in the files. I'm doing 6mA detection, so I have set the motif to GATC and mod_loc to 1. The job reports the following:

# ===============================================
## parameters:
fast5_dir:
        resquiggled
recursively:
        yes
corrected_group:
        RawGenomeCorrected_000
basecall_subgroup:
        BaseCalled_template
reference_path:
        reference/x.fasta
is_dna:
        true
normalize_method:
        mad
methy_label:
        1
kmer_len:
        17
cent_signals_len:
        360
motifs:
        GATC
mod_loc:
        1
positions:
        None
write_path:
        results/x_features.tsv
w_is_dir:
        no
w_batch_num:
        200
nproc:
        48
f5_batch_num:
        50
# ===============================================
40705 fast5 files in total..
parse the motifs string..
read genome reference file..
read position file if it is not None..
write_process started..
finishing the write_process..
40705 of 40705 fast5 files failed..
extract_features costs 37.7 seconds..

Any suggestions as to what might be going wrong here?
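For reference, this is how a motif and mod_loc pair is meant to select one base per site. It assumes mod_loc is a 0-based offset into the motif, as with deepsignal's default CpG setting of motif CG and mod_loc 0, so GATC with mod_loc 1 points at the adenine. The helper motif_sites is hypothetical and not part of deepsignal. It scans only the forward strand of a plain uppercase sequence, using exact matching with no IUPAC degenerate bases.

```shell
# motif_sites SEQ MOTIF MOD_LOC
# Hypothetical helper: prints the 0-based position of the targeted base
# for every exact, forward-strand match of MOTIF in SEQ.
motif_sites() {
    local seq="$1" motif="$2" mod_loc="$3"
    awk -v s="$seq" -v m="$motif" -v k="$mod_loc" 'BEGIN {
        n = length(m)
        # Slide a window of the motif length along the sequence.
        for (i = 1; i <= length(s) - n + 1; i++) {
            if (substr(s, i, n) == m) {
                # awk strings are 1-based: convert to 0-based, then add the offset.
                print i - 1 + k
            }
        }
    }'
}

motif_sites "AAGATCTTGATC" GATC 1   # prints 3 and 9, one per line
```

GATC is its own reverse complement, so each site also has a reverse-strand adenine, opposite the forward-strand T at motif start + 2. This sketch does not report that base.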

PengNi commented 2 years ago

Hi @jamesabbott, which version of deepsignal are you using? The GATC model is only compatible with deepsignal<=0.1.6.
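One way to confirm the installed version before using the GATC model is sketched below. It assumes deepsignal was installed with pip, so that pip show prints a "Version:" line, and that sort supports -V version ordering, as GNU coreutils sort does. version_le is a hypothetical helper.

```shell
# version_le A B succeeds when version A <= version B.
# Hypothetical helper; relies on sort -V for version-aware ordering,
# so 0.1.10 correctly sorts after 0.1.6.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# pip show prints a "Version: x.y.z" line for an installed package.
installed="$(pip show deepsignal 2>/dev/null | awk '/^Version:/ {print $2}')"
if [ -z "$installed" ]; then
    echo "deepsignal not found via pip show" >&2
elif version_le "$installed" 0.1.6; then
    echo "deepsignal $installed is <= 0.1.6, as the GATC model requires"
else
    echo "deepsignal $installed is newer than 0.1.6; the GATC model needs <= 0.1.6" >&2
fi
```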

jamesabbott commented 2 years ago

Hi @PengNi, thanks for the quick response. I was using the latest version. I've downgraded to 0.1.6, but unfortunately I still see the same result.

PengNi commented 2 years ago

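Hi @jamesabbott, then it may be caused by the vbz compression issue: fast5 files written by recent ONT software store the raw signal with VBZ compression, which HDF5 can only decode when the VBZ plugin is on its plugin path. Please try exporting HDF5_PLUGIN_PATH as follows before running deepsignal: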

# download ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz (or newer version) and set HDF5_PLUGIN_PATH
# https://github.com/nanoporetech/vbz_compression/releases
wget https://github.com/nanoporetech/vbz_compression/releases/download/v1.0.1/ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
tar zxvf ont-vbz-hdf-plugin-1.0.1-Linux-x86_64.tar.gz
export HDF5_PLUGIN_PATH=/absolute/path/to/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin
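After exporting the variable, a quick check that it actually points at the plugin can help avoid a silent failure like the one above. This sketch assumes the Linux plugin library shipped in the tarball is named libvbz_hdf_plugin.so. check_vbz_plugin is a hypothetical helper, not part of deepsignal or HDF5, and it only checks that the file exists, not that HDF5 can load it.

```shell
# Hypothetical helper: succeed if any directory listed in HDF5_PLUGIN_PATH
# contains libvbz_hdf_plugin.so (assumed name of the Linux VBZ plugin).
check_vbz_plugin() {
    if [ -z "${HDF5_PLUGIN_PATH:-}" ]; then
        echo "HDF5_PLUGIN_PATH is not set" >&2
        return 1
    fi
    # HDF5_PLUGIN_PATH may list several directories separated by ':'.
    # Split inside a subshell so IFS and globbing changes stay local.
    (
        IFS=:
        set -f
        for entry in $HDF5_PLUGIN_PATH; do
            if [ -f "$entry/libvbz_hdf_plugin.so" ]; then
                echo "found VBZ plugin in $entry"
                exit 0
            fi
        done
        echo "libvbz_hdf_plugin.so not found in HDF5_PLUGIN_PATH" >&2
        exit 1
    )
}

check_vbz_plugin && echo "HDF5_PLUGIN_PATH looks usable"
```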

If this still doesn't work, could you send me some demo files so I can test them myself?

Best regards, Peng

jamesabbott commented 2 years ago

Hi Peng,

Many thanks, that does indeed seem to have done the trick: it is now happily producing output.

Thanks for your help,
James