arq5x / poretools

a toolkit for working with Oxford nanopore data
MIT License

Increase in read number from fast5 to fastq #133

Open emilyjunkins opened 7 years ago

emilyjunkins commented 7 years ago

Hello,

I have just used the fastq converter on my Albacore-basecalled fast5 files and noticed that the number of reads increases. For instance, if I count the number of fast5 files in a directory containing only fast5 files: $ ls ./ | wc -l

9231

But when I convert to fastq, the number of 'reads' increases: $ poretools fastq ./ | grep '^@' | wc -l

20037

Is this what I should expect? Or is there something wrong with what I am doing, or something I am not understanding about the conversion from fast5 to fastq?

nickloman commented 7 years ago

These may be 2D reads, in which case you are getting template strands as well as complement and 2D reads. If you just want 2D, add --type 2D, and if you just want template, add --type strand.

Additionally, I do not think your grep command will necessarily be accurate, as I believe the quality score line can also start with an @ character. Since each FASTQ record spans four lines, I prefer to run wc -l without the grep and divide by 4 to get the read count.
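
The line-count approach can be sketched as a small shell function. This is only an illustration: the name count_fastq_reads is hypothetical (it is not part of poretools), and it assumes strict four-line FASTQ records with no wrapped sequence or quality lines.

```shell
# Hypothetical helper, not part of poretools.
# Reads FASTQ on stdin and prints the number of records,
# assuming every record is exactly four lines long.
count_fastq_reads() {
    # NR is the total number of lines seen once input is exhausted.
    awk 'END { print int(NR / 4) }'
}

# Tiny demonstration: two records, where the quality line of the
# first record happens to begin with '@'.
demo_fastq() {
    printf '@read1\nACGT\n+\n@III\n@read2\nGGCC\n+\nIIII\n'
}

demo_fastq | count_fastq_reads   # counts records by line number
demo_fastq | grep -c '^@'        # also matches the '@III' quality line
```

With real data this would be used as `poretools fastq ./ | count_fastq_reads`. In the demonstration the grep count comes out higher than the record count because it also matches the quality line.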