Closed mtisza1 closed 1 year ago
OK, so I am looking at your code a little bit and in the extract.R
script, lines 16-40, I can see that it is looking for the strings "Guppy" or "Albacore". For whatever reason, my version of the gridION software or minknow is only specifying the guppy version with the attribute "guppy_version". I wonder if this could be solved by changing this part of the code.
Hi Mike,
Thank you for providing a detailed description of the issue.
You're definitely on the right track. I've made changes related to live base calling before (in #5) but the fast5 output format might have evolved since.
f5_data <- h5readAttributes(path_first_fast5, path_basecalling) # Extract read data (fastq, move, and trace)
# Is not length(f5_data)==0 now
I'll investigate and fix it, but meanwhile you should be able to bypass this sanity check by modifying the current differences output from nanodisco merge
. This won't affect the results. Start R
in interactive mode and execute the following commands with proper paths.
nanodisco_res <- readRDS("<path_to_current.RDS>")
attr(nanodisco_res, "basecaller") # Could you also report the output from this command?
attr(nanodisco_res, "version") # Could you also report the output from this command?
attr(nanodisco_res, "basecaller") <- "Guppy"
attr(nanodisco_res, "version") <- "4.0.11"
saveRDS(nanodisco_res, "<path_to_current_fixed.RDS>")
Please let me know if this is working or if you cannot use R
altogether.
Alan
Hi Alan,
Thanks so much for the speedy reply! It is working now, and I got expected outputs from characterize
Here are the outputs you asked for (this is from analyses not using --basecall_version
flag)...
Before changing anything:
> attr(nanodisco_res, "basecaller")
NULL
> attr(nanodisco_res, "version")
NULL
After changing the .RDS file:
> attr(nanodisco_res, "basecaller")
[1] "Guppy"
> attr(nanodisco_res, "version")
[1] "4.0.11"
I suspect that ONT has slightly different fast5 formats between minION, gridION, and promethION, as well as slightly different formats for different versions of the software. Personally, I don't envy that you work extensively with this file format.
Cheers,
Mike
Hi Mike,
Thank you for reporting the results. I made new fixes that should better handle those live-basecalled reads. You can give it a try with:
singularity pull nanodisco.v1.0.3_dev.sif docker://fanglab/nanodisco:dev
Do you mind running nanodisco difference
on one chunk and report again the attr
values? I tried on the fast5 you send earlier and it reported Guppy_live
and 4.0.11+f1071ce
as expected but I'd like to be sure it's ok for all reads.
Thanks.
Alan
P.S.: I'll get back to you about PCR Barcoding later, but thank you very much for the report.
OK, Alan, thanks for the update. I did what you requested, and it appears to be working:
> nanodisco_res <- readRDS("chunk.102.difference.rds")
> attr(nanodisco_res, "basecaller")
[1] "Guppy_live"
> attr(nanodisco_res, "version")
[1] "4.0.11+f1071ce"
I'll let you know if anything seems to go wrong downstream of this.
P.S. Do you foresee any issue in the compatibility of Nanodisco
and Guppy version 5
?
Thank you very much.
I completed a couple of analysis of 5.0.7 datasets (differed base calling) with success. Looking at live-basecalling file structure, I don't see differences so this nanodisco
dev version (v1.0.3_dev
) should work. However, please note that it's not fully compatible with singularity due to the conversion from docker. I'll try to push the complete version (v1.0.3
) today.
However, the new fast5 compression solution (i.e. vbz) broke the compatibility with nanopolish
's version I'm currently using. I'm doing some testing with the latest nanopolish version which handle it, as I need to check that the motif typing and fine mapping is not affected (it shouldn't). In the meantime, you can use ont-fast5-api to convert the fast5 encoding to gzip
with:
compress_fast5 -t 20 --recursive -i <input_vbz.fast5> -s <input_gzip.fast5> -c gzip
This is not ideal but hopefully I can get v1.0.4
out soon enough.
Alan
Hi Alan,
I'm having issues with
Nanodisco
reading the basecaller version on my reads. I'm usingnanodisco
version 1.0.2 on the singularity container. If I run the each step of the pipeline without specifying--basecall_version
, it will run throughpreprocess
,difference
,merge
andmotif
without error (and the motif results seem correct based on other data), but then, when I runcharacterize
I get this errorOn the other hand, if I run from the beginning using the flag
--basecall_version Guppy:4.0.11
onpreprocess
I get:Specifying
--basecall_version Unknown:4.0.5
does not resolve the issue.Essentially, it seems the
nanodisco
scripts are reading the "MinKNOW-Live-Basecalling" information in the fast5 file (which is near the top of the file) and ignoring the guppy information (which is near the bottom of the file). I don't codeR
at all, so I can't parse through your scripts to figure out whatnanodisco
is actually doing, however.I've attached the first read from a set of fast5 files for a run that this error occurred. These reads were live basecalled on the gridion, and the fastq files all look good from a length and quality standpoint. I hope it will help get to the bottom of this, but I am happy to provide more information.
I'd really love to get this resolved so I can use your tool! Thanks in advance.
0009d4a2-1f1f-4b8d-a7eb-a75d6fa1c401.fast5.zip