Hello @LynnLy,
Unfortunately, the current implementation does not provide a way to select which basecalling version to use. nanodisco interacts with fast5 files at two steps: nanodisco preprocess (extracts read sequences) and nanodisco difference (aligns signal with nanopolish). I've confirmed with Jared that nanopolish only needs the reads from the desired version. I'll try to add the feature to nanodisco preprocess today, but if you are in a hurry you can basecall the fast5 files again so that the same version is found in Basecall_1D_000 for both datasets.
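To see which basecall groups a fast5 file already holds, something like the following could work. It is only a sketch: it assumes the HDF5 command-line tool h5ls is installed and that the files are single-read fast5s with the usual Analyses/Basecall_1D_NNN layout (multi-read files nest this under a per-read group). The function names are hypothetical and not part of nanodisco.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of nanodisco).

# Keep only the Basecall_1D_NNN group names from `h5ls` output read on stdin.
filter_basecall_groups() {
  awk '$1 ~ /^Basecall_1D_[0-9]+$/ { print $1 }'
}

# List the basecall groups stored in a single-read fast5 file.
# Assumes h5ls is on PATH and the file contains an /Analyses group.
list_basecall_groups() {
  h5ls "$1/Analyses" | filter_basecall_groups
}
```

Calling list_basecall_groups on a fast5 file prints one group name per line; more than one line suggests the file has been basecalled more than once.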
Alan
I have implemented the new feature, which I think addresses the issue you raised. Two files need to be replaced: extract.R and preprocess.sh. You can find them here and integrate them into the container by doing:
wget https://github.com/fanglab/nanodisco/files/5011419/nanodisco_feature.zip # Download .zip mentioned above
unzip nanodisco_feature.zip
# Create a writable temporary container (directory) named nd_tmp, ~5 min
singularity build --sandbox nd_tmp nanodisco.sif
mv extract.R preprocess.sh nd_tmp/home/nanodisco/code # Replace function with new feature
chmod 755 nd_tmp/home/nanodisco/code/* # Set proper permission
# Create a new container with the additional feature
singularity build nd_env nd_tmp
You can now provide the --basecall_version option (<basecaller:version>) to specify which basecalling version you want to use (e.g. Guppy:4.0.14 or Albacore:2.3.4). In your case nanodisco preprocess can be executed as follows:
nanodisco preprocess -p <nb_threads> -f <path_fast5> -s <name_sample> -o <path_output> -r <path_reference_genome> --basecall_version <basecaller:version>
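Since the option expects a <basecaller:version> value, a small check before launching a long run may save time. The helper below is a hypothetical sketch, not part of nanodisco: it only verifies that the value has a non-empty name and a non-empty version separated by a single colon, and it does not check that the version actually exists in the fast5 files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nanodisco): validate a
# <basecaller:version> string such as Guppy:4.0.14 or Albacore:2.3.4.
parse_basecall_version() {
  local spec="$1"
  # Exactly one colon, with non-empty, whitespace-free text on both sides.
  local re='^([^:[:space:]]+):([^:[:space:]]+)$'
  if [[ "$spec" =~ $re ]]; then
    # Print "<basecaller> <version>" for the caller to reuse.
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    printf 'invalid basecall version: %s\n' "$spec" >&2
    return 1
  fi
}
```

For example, running parse_basecall_version "Guppy:4.0.14" && nanodisco preprocess ... --basecall_version "Guppy:4.0.14" would only start preprocessing when the value is well formed.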
Please let me know if this solves your issue or if you have any additional questions.
Alan
Great! I just tested this out with a rebasecalled Guppy dataset with the --fast5_out option, and it seems to be working. Thanks!
Hi! I am trying to do methylation binning - I see in your FAQ that my native DNA and WGA datasets should use the same basecaller and version. My datasets were generated at different times and basecalled with different versions - I can rebasecall them with the same version, but I only see a place to specify the fast5 files and not basecalled fastq files.
Is there a way to specify which basecalled fastqs I want to use, and to only rely on the fast5 for signal information? Or, a way to make sure the fastqs used from the fast5s are the correct ones, because the same fast5 files may be basecalled twice and hold two sets of fastq data?
Thank you!