arq5x / poretools

a toolkit for working with Oxford nanopore data
MIT License

Poretools 0.6.0 does not output sequences on Metrichor and Albacore basecalled 'skipped' reads #108

Open · michieitel opened this issue 7 years ago

michieitel commented 7 years ago

In my last 1D 450bp R9.4 run, I tried local basecalling, which ended up with >370K 'skipped' reads. I ran both Metrichor and Albacore on these reads to get their sequence information. Basecalling worked, but when I run Poretools 0.6.0 (with any option) on the output ('passed' and 'workspace' folders, respectively), I only get "WARNING:poretools:No valid sequences observed".

I checked whether the size of some files changed after basecalling, and both methods increased it. I assume the sequence information was written to the output fast5 files, so why can't I extract it?

Thanks for any advice,
Michael

awitney commented 7 years ago

I think each basecalling run is added sequentially to the fast5 file. So the first run goes under

/Analyses/Basecall_1D_000/BaseCalled_template/Fastq

then second under

/Analyses/Basecall_1D_001/BaseCalled_template/Fastq

etc., so try

poretools fastq --group 1 test_data/

you can find out exactly where using

h5ls -r read.fast5

filicado commented 7 years ago

I found the same issue; most other poretools commands also don't work on these files. When the two types of fast5 files are in the same directory, stats, hist, etc. only detect the original reads from the pass folder, not the reads from the skip folder that were basecalled separately with Albacore.
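One way to check whether the pass-folder reads and the Albacore-basecalled skip reads store their Fastq in different places is to tally, per directory, which basecall groups carry a template Fastq. The following is a rough diagnostic sketch, not a poretools feature. It assumes h5py is installed and that the Fastq sits at /Analyses/Basecall_1D_NNN/BaseCalled_template/Fastq; the helper names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed Fastq location; h5py's visititems reports names without a leading "/".
FASTQ_RE = re.compile(r"^/?Analyses/Basecall_1D_(\d{3})/BaseCalled_template/Fastq$")


def groups_with_fastq(dataset_paths):
    """Return sorted basecall group numbers that have a template Fastq dataset."""
    groups = set()
    for path in dataset_paths:
        match = FASTQ_RE.match(path)
        if match:
            groups.add(int(match.group(1)))
    return sorted(groups)


def list_dataset_paths(fast5_path):
    """Return the path of every dataset in one fast5 (HDF5) file."""
    import h5py  # imported here so the pure helpers above work without h5py

    paths = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            paths.append(name)

    with h5py.File(str(fast5_path), "r") as handle:
        handle.visititems(visit)
    return paths


def tally(groups_by_file):
    """Count how many files share each combination of Fastq-bearing groups."""
    return Counter(tuple(groups) for groups in groups_by_file.values())


def summarize(directory):
    """Tally Fastq-bearing groups over every *.fast5 file in a directory."""
    groups_by_file = {
        str(path): groups_with_fastq(list_dataset_paths(path))
        for path in sorted(Path(directory).glob("*.fast5"))
    }
    return tally(groups_by_file)


if __name__ == "__main__":
    import sys

    # Usage: python tally_groups.py path/to/fast5_dir
    for groups, count in summarize(sys.argv[1]).most_common():
        print(f"{count} files with Fastq in groups {list(groups) or 'none'}")
```

If the two folders report different group combinations, that points at where the files diverge. It does not by itself explain how poretools decides which reads to count.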