hamid89 closed this issue 10 months ago
Same issue here. I used the fast5 files that MinKNOW produces. I also don't see any fast5 output anymore in recent guppy versions, so I have no clue how to get the fastq "base info" into the fast5 data. Is it possible to run the preprocess step with fastq and fast5 as input?
@replikation I believe I sorted out the problem (still waiting for the nanodisco preprocess results). You need to run guppy on the raw-signal fast5 data with the option '--fast5_out'. The resulting fast5 files are written to /your_basecalled_directory/workspace. These are the fast5 files you pass to the 'nanodisco preprocess' command.
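The two steps described above can be sketched as a small Python helper that only builds and prints the command lines. This is a sketch under assumptions, not a tested pipeline: the directory names, the sample name, and the config file `dna_r9.4.1_450bps_hac.cfg` are placeholders, `--fast5_out` is only accepted by older guppy releases, and the `nanodisco preprocess` flags are copied from the command quoted elsewhere in this thread, with `-o` assumed to be the output directory.

```python
import os
import shlex


def guppy_command(raw_fast5_dir, basecalled_dir,
                  config="dna_r9.4.1_450bps_hac.cfg"):
    """Build a guppy_basecaller call that also writes basecalled fast5 files.

    The config name is an assumed R9.4.1 example; use the one matching your kit.
    """
    return [
        "guppy_basecaller",
        "--input_path", raw_fast5_dir,
        "--save_path", basecalled_dir,
        "--config", config,
        "--fast5_out",  # only available in older guppy releases
    ]


def nanodisco_preprocess_command(basecalled_dir, sample_name, reference,
                                 out_dir, threads=10):
    """Build the nanodisco preprocess call on guppy's workspace directory."""
    workspace = os.path.join(basecalled_dir, "workspace")
    return [
        "nanodisco", "preprocess",
        "-p", str(threads),
        "-f", workspace,
        "-s", sample_name,
        "-r", reference,
        "-o", out_dir,  # assumed to be the output directory
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    commands = [
        guppy_command("raw_fast5", "basecalled"),
        nanodisco_preprocess_command("basecalled", "native_sample",
                                     "assembly/meta_assembly.fasta",
                                     "preprocessed"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before running anything; each list can then be passed to `subprocess.run(cmd, check=True)`.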
@hamid89
I think this flag is gone now?
guppy_basecaller --fast5_out
Unexpected token '--fast5_out' on command-line.
This is from guppy version:
guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af
I am not familiar with the newer versions of guppy, since it is going to be replaced by dorado anyway. May I ask which flow cell your data comes from? Nanodisco supports R9.4 flow cells, and you need both whole genome amplification (WGA) and native DNA sequencing of the same samples.
Ah, we are using R10.4.1, and yes, both read sets are available.
Dorado basecaller help:
Positional arguments:
  model                        the basecaller model to run.
  data                         the data directory.

Optional arguments:
  -h, --help                   shows help message and exits
  -v, --verbose
  -x, --device                 device string in format "cuda:0,...,N", "cuda:all", "metal", "cpu" etc.. [default: "cuda:all"]
  -l, --read-ids               A file with a newline-delimited list of reads to basecall. If not provided, all reads will be basecalled [default: ""]
  --resume-from                Resume basecalling from the given HTS file. Fully written read records are not processed again. [default: ""]
  -n, --max-reads              [default: 0]
  --min-qscore                 [default: 0]
  -b, --batchsize              if 0 an optimal batchsize will be selected. batchsizes are rounded to the closest multiple of 64. [default: 0]
  -c, --chunksize              [default: 10000]
  -o, --overlap                [default: 500]
  -r, --recursive              Recursively scan through directories to load FAST5 and POD5 files
  --modified-bases             [nargs: 1 or more]
  --modified-bases-models      a comma separated list of modified base models [default: ""]
  --modified-bases-threshold   the minimum predicted methylation probability for a modified base to be emitted in an all-context model, [0, 1] [default: 0.05]
  --emit-fastq                 Output in fastq format.
  --emit-sam                   Output in SAM format.
  --emit-moves
  --reference                  Path to reference for alignment. [default: ""]
  -k                           k-mer size for alignment with minimap2 (maximum 28). [default: 15]
  -w                           minimizer window size for alignment with minimap2. [default: 10]
  -I                           minimap2 index batch size. [default: "16G"]
But nanodisco doesn't support data generated on R10 flow cells.
Okay, that's kind of an issue for this workflow then. But thanks for responding.
Hello,
I ran nanodisco preprocess with the following command:
nanodisco preprocess -p 10 -f original_DNA_fast5/ -s native_samples -r assembly/meta_assembly.fasta -o .bam
I got an error: task 1 failed with the following message:
Object '/read_001939f8-f602-4a6a-b610-06b1f2166001/Analyses/Basecall_1D_000/BaseCalled_template' does not exist in this HDF5 file
Can you please help me understand what I am doing wrong?
Thank you.
Best,
Hamid
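The error above says that a read has no `Analyses/Basecall_1D_000/BaseCalled_template` group, which suggests the fast5 files hold raw signal only. A minimal Python sketch for checking this before running preprocess is shown here. It assumes multi-read fast5 files whose top-level groups are named `read_<id>` (as in the error message) and that the third-party `h5py` package is installed; the function names are hypothetical and not part of nanodisco.

```python
BASECALL_GROUP = "Analyses/Basecall_1D_000/BaseCalled_template"


def reads_missing_basecalls(handle, group=BASECALL_GROUP):
    """Return the names of read groups that lack the basecall group.

    `handle` only needs keys() and `in` support, which an open
    h5py.File provides for a multi-read fast5 file.
    """
    missing = []
    for name in handle.keys():
        if not name.startswith("read_"):
            continue  # skip anything that is not a read group
        if f"{name}/{group}" not in handle:
            missing.append(name)
    return missing


def check_fast5(path):
    """Open one fast5 file read-only and list reads without basecalls."""
    import h5py  # third-party dependency, imported only when needed

    with h5py.File(path, "r") as handle:
        return reads_missing_basecalls(handle)


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        missing = check_fast5(path)
        print(f"{path}: {len(missing)} read(s) without {BASECALL_GROUP}")
```

If every read is reported as missing, the files were most likely never basecalled with fast5 output enabled, which matches the error that preprocess reports.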