Open hasindu2008 opened 2 years ago
Hello @hasindu2008,
Unfortunately, this is not readily implementable, but it should be doable "by hand". I made this design choice a while ago because I found it to be the least error-prone: it ensures that the fast5, fastq, and bam files match. It is indeed less efficient, though. Please let me know if you want a high-level alternative solution.
Best,
Alan
Do you only need the basecalled read in the FAST5 file generated with --fast5-out, or do you rely on the move table as well?
Basically, we need to be able to execute nanopolish eventalign to align events to the reference. The fastq are extracted and contain the path to the fast5 in each read's header, which I found, at the time, to be efficient for indexing. I don't know if this is still the case.
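To illustrate the header-based lookup described above, here is a minimal sketch in Python. It is not nanodisco's actual code: the header layout `@<read_id> <fast5_path>` and the function name `index_fastq_headers` are assumptions made for illustration only, and the sketch assumes strict four-line FASTQ records.

```python
import io


def index_fastq_headers(handle):
    """Map read ID -> fast5 path using FASTQ header lines.

    Assumption (hypothetical layout): each header looks like
    '@<read_id> <fast5_path>' and every record spans exactly four lines.
    """
    index = {}
    for line_number, line in enumerate(handle):
        # Only the first line of each four-line record is a header.
        if line_number % 4 != 0:
            continue
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate a trailing blank line
        if not line.startswith("@"):
            raise ValueError(f"line {line_number + 1} is not a FASTQ header")
        fields = line[1:].split(maxsplit=1)
        if len(fields) != 2:
            raise ValueError(f"no fast5 path in header on line {line_number + 1}")
        read_id, fast5_path = fields
        index[read_id] = fast5_path
    return index


if __name__ == "__main__":
    example = "@read1 /data/a.fast5\nACGT\n+\n!!!!\n"
    print(index_fastq_headers(io.StringIO(example)))
```

With a lookup like this, each read can be traced back to its signal file without a separate index, which is presumably why the header approach was convenient at the time.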
Ohh, I suggest replacing nanopolish with f5c: both indexing (no need to have the path in the header) and event alignment will be much faster (~3-5X) with near-identical results.
f5c index -d fast5_dir in_fasta -t num_threads --iop num_threads
f5c eventalign -t num_threads --iop num_threads --scale-events -n -r in_fasta -b in_bam -g tmp_genome
You can make it 10X faster if you switch to the BLOW5 format, with added advantages such as fewer backward-compatibility headaches and a lot of unnecessary dev time saved. slow5tools can be used to streamline many signal merge/split/get operations, and both nanopolish and f5c are compatible with the BLOW5 format.
f5c index -t num_threads in_fasta --slow5 signals.blow5
f5c eventalign -t num_threads --iop num_threads --scale-events -n -r in_fasta -b in_bam -g tmp_genome --slow5 signals.blow5
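If these two BLOW5 commands need to be driven from a script, a small Python sketch could assemble them as argument lists. The helper name `build_f5c_blow5_commands` and its parameter names are hypothetical; the flags are copied verbatim from the two commands above and nothing else is assumed about f5c.

```python
import subprocess


def build_f5c_blow5_commands(reads, bam, genome, blow5, threads):
    """Return the f5c index and eventalign commands as argument lists.

    Hypothetical helper: flags mirror the BLOW5 commands quoted in this
    thread; the file names are whatever the caller passes in.
    """
    t = str(threads)
    index_cmd = ["f5c", "index", "-t", t, reads, "--slow5", blow5]
    eventalign_cmd = [
        "f5c", "eventalign", "-t", t, "--iop", t,
        "--scale-events", "-n",
        "-r", reads, "-b", bam, "-g", genome,
        "--slow5", blow5,
    ]
    return index_cmd, eventalign_cmd


def run_f5c_blow5(reads, bam, genome, blow5, threads, output_path):
    """Run both steps in order, writing eventalign output to output_path."""
    index_cmd, eventalign_cmd = build_f5c_blow5_commands(
        reads, bam, genome, blow5, threads
    )
    subprocess.run(index_cmd, check=True)
    with open(output_path, "w") as out:
        subprocess.run(eventalign_cmd, check=True, stdout=out)
```

Passing argument lists rather than a shell string avoids quoting problems with unusual file names, and `check=True` stops the script if indexing fails before event alignment starts.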
In the previous response, can you please explain what you meant by the fast5, fastq, and bam matching? Is each multi-fast5 file run separately with nanopolish in your script, or do you concatenate all the FASTQ and then run one nanopolish instance?
Hi @touala,
I am having trouble generating files with --fast5-out and so have a similar question. Can you clarify what you mean by "but should be doable by hand"? I may need to go this route. I have basecalled fastq files. I agree with hasindu that this solution may become important since it seems like nanopore is planning to remove the fast5-out option.
Thanks! Emily
@ecpierce Hi, Emily, we ran into the --fast5-out option deprecation problem ourselves, and we opted to just download an older version of Guppy rather than figuring out a way to be able to use *.fastq files. As of this writing, it looks like version 6.4.2 is the most recent, but version 6.2.1 is the most recent version prior to the deprecation of the --fast5-out option in version 6.3+.
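For anyone scripting around this, the version cutoff mentioned above can be written as a small check. This is only a sketch based on the statement in this comment that --fast5-out is deprecated from 6.3 onward; the function name `supports_fast5_out` is hypothetical, and the cutoff should be confirmed against the Guppy release notes.

```python
def supports_fast5_out(guppy_version):
    """Return True if this Guppy version predates 6.3.

    Assumption taken from this thread, not from Guppy documentation:
    --fast5-out is deprecated in version 6.3 and later.
    """
    major, minor = (int(part) for part in guppy_version.split(".")[:2])
    # Tuple comparison handles both the major and the minor component.
    return (major, minor) < (6, 3)


if __name__ == "__main__":
    for version in ("6.2.1", "6.4.2"):
        print(version, supports_fast5_out(version))
```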
@jflopezfernandez thank you for your response! That is the solution I ended up using. It would be useful, though, if the nanodisco developers considered working on a long-term solution so that it will be compatible with even newer Guppy versions in the future. It seems like Dorado uses the pod5 format; I am not sure how that would impact things, but I guess it is something else to consider if nanodisco is going to be actively maintained. Really appreciate this awesome program!
Thank you very much for sharing your experience and solutions with other users, Jose!
For the question from Emily: we are very much encouraged by the broad interest in Nanodisco, and yes, we are committed to maintaining it in the long term. That being said, because Nanopore software and kits are constantly evolving, our strategy (given the finite resources we have) is to 1) use Singularity to ensure the current package versions are compatible and the entire workflow works reliably; 2) release major upgrades: these will not be frequent (given the nature of nanopore software/kit evolution explained above), but we will do it for major milestones!
Best, Gang
@fanggang that makes sense. I appreciate your work and am glad to hear you are committed to maintaining!
Is it possible to make nanodisco accept a FASTQ file that contains basecalled reads, rather than extracting them from FAST5 files? This way, rebasecalling with --fast5-out would no longer be necessary, I believe?