fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

error when running nanodisco motif command #31

Closed: H1889 closed this issue 1 year ago

H1889 commented 2 years ago

Hi, I am running my own analysis and everything goes well until the motif search. It starts running, but after a while the following error is shown:

[2021-11-24 01:33:50] Prepare output folder.
[2021-11-24 01:33:50] Load supplied current differences.
[2021-11-24 01:33:50] Detect motifs.
[2021-11-24 01:33:50] Processing statistical signal.
Error in eval(e, x, parent.frame()) : object 'p' not found
Calls: wrapper.motif.detection ... subset -> subset -> subset.data.frame -> eval -> eval
Execution halted

One thing I have noticed is that the chunks.rds files are only about 300 bytes each, and the final merged .rds file is just 3.4 KB.

However, when I ran the example data, everything went well.

Any suggestions for a solution?

Thanks a lot

touala commented 2 years ago

Hi @H1889,

Thank you for this information. The chunks.rds files seem almost empty, so something went wrong in the nanodisco difference step. A similar issue was reported in #26, and the root cause was the newer fast5 compression method (VBZ instead of gzip). I described a temporary fix in that issue (#26). If this is not it, please reopen this issue.

Alan

H1889 commented 2 years ago

Thanks a lot. I checked the compression method with the script you posted in #26, and my fast5 files are VBZ-compressed.

H1889 commented 2 years ago

Thanks again, the pipeline completed successfully and I now have the motifs. Thank you very much. Gabriel


From: touala @.>
Sent: Wednesday, December 1, 2021 23:11
To: fanglab/nanodisco @.>
Cc: Gabriel Gutiérrez Pozo @.>; Mention @.>
Subject: Re: [fanglab/nanodisco] error when running nanodisco motif command (Issue #31)

-- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub (https://github.com/fanglab/nanodisco/issues/31#issuecomment-984099655), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AVMTF3CSHFPFL4LZWSATKRTUO2MP3ANCNFSM5IWDZZBA). Triage notifications on the go with GitHub Mobile for iOS (https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675) or Android (https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub).

H1889 commented 2 years ago

Thank you very much; in the end everything went well and I was able to complete all the analyses. One last question: the nanodisco score command reports the motif position and a score. Are those scores filtered? Are they significant? In other words, does the score command output only the motif occurrences that are METHYLATED? I assume the non-methylated motifs are not present in the TSV file. Best regards, Gabriel


linuxxue commented 2 years ago

I have the same issue. I checked the compression method with the script from #26, but my fast5 files are not VBZ-compressed. Any suggestions for a solution?

Thanks a lot

touala commented 2 years ago

Hi @H1889,

The scores given by the nanodisco score command are not filtered, and they do not indicate whether a motif occurrence is modified or not. This command was mostly implemented as a means to visualize misassembled contigs in metagenomes, where scores are aggregated and compared across full contigs (please see Figure 5a of the nanodisco publication). Those scores cannot be used as single-site methylation level estimates, because the methylation signal intensity depends on the local genomic context. For that, you would need a tool like nanopolish or medaka with a model trained for your specific set of motifs.

Alan
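
Since the scores are meant to be compared in aggregate across a contig rather than read site by site, one quick way to look at them is to summarize them per contig. This is only a sketch under assumptions not stated in the thread: it assumes a tab-separated file with a header line, the contig name in column 1, and the score in a column you pass in (check the actual layout of your nanodisco score output first). The function name `mean_score_per_contig` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean score per contig from a TSV.
# Assumptions (not taken from nanodisco docs): header on line 1,
# contig name in column 1, numeric score in column $2 (1-based).
mean_score_per_contig() {
    local tsv="$1" score_col="$2"
    awk -F'\t' -v col="$score_col" '
        NR == 1 { next }                  # skip the header line
        $col != "NA" && $col != "" {      # ignore missing scores
            sum[$1] += $col
            n[$1]++
        }
        END {
            # contig, mean score, number of scored occurrences
            for (c in sum) printf "%s\t%.4f\t%d\n", c, sum[c] / n[c], n[c]
        }' "$tsv" | sort -k1,1
}
```

For example, `mean_score_per_contig <path_score_tsv> 5` would average column 5 per contig. A contig whose mean departs strongly from its neighbours is the kind of pattern the per-contig comparison is meant to surface; a single row on its own says nothing about methylation at that site.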

touala commented 2 years ago

Hi @linuxxue,

I don't see an obvious explanation. What is the size of the current differences file (*.RDS)? It should be in the MB range for a typical prokaryotic genome. Could you send me this file and the reference genome you used by email (alan.tourancheau [at] bio.ens.psl.eu)? That would be the most efficient way for me to help you.

Regards,

Alan
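
To answer the size question quickly, you can print the size of each current differences file and flag the ones that look too small. This is a sketch: the 1 MB default threshold only mirrors the "MB range for a typical prokaryotic genome" rule of thumb above and is not a hard limit, and the function name `check_rds_sizes` and the `MIN_BYTES` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of each file given as argument and
# mark it SMALL when below MIN_BYTES (default 1 MB), OK otherwise.
# Returns 1 if any file was flagged, 0 otherwise.
check_rds_sizes() {
    local min_bytes="${MIN_BYTES:-1000000}" status=0 f size
    for f in "$@"; do
        size=$(wc -c < "$f")          # byte count; works with GNU and BSD wc
        size=$((size))                # drop the padding BSD wc adds
        if [ "$size" -lt "$min_bytes" ]; then
            printf 'SMALL\t%s\t%s\n' "$size" "$f"
            status=1
        else
            printf 'OK\t%s\t%s\n' "$size" "$f"
        fi
    done
    return "$status"
}
```

For example, `check_rds_sizes <difference_dir>/*.RDS`. Chunk files of roughly 300 bytes, as reported at the top of this thread, would all be flagged SMALL.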

ilhdkk commented 2 years ago

Hi @linuxxue,

I have the same issue as you: my fast5 files are not VBZ-compressed. Did you solve it?

Thanks a lot

kk

linuxxue commented 2 years ago

@ilhdkk, did you use the following method?

if h5dump -pH | grep -q vbz; then
    echo "It is VBZ-compressed"
else
    echo "It is not VBZ-compressed"
fi

According to that test my fast5 files were not VBZ-compressed: when I ran it, it displayed "It is not VBZ-compressed". Actually, they were still VBZ-compressed. You can use the ONT tool ont_fast5_api with the compress_fast5 command to convert them back to gzip:

compress_fast5 -t 10 --recursive -i -s "_gzip" -c gzip

touala commented 2 years ago

Thank you very much, @linuxxue, for reporting this problem with identifying VBZ-compressed files. I'd like to update the testing command, but I can't replicate the issue with my fast5 files. Would you mind running the following commands to help me find a better test?

h5dump -pH <path_fast5_file> | grep VBZ
# or
h5dump -pH <path_fast5_file> | grep COMMENT

For datasets from multiplexed runs, a better alternative to compress_fast5 is the following command (also in ont_fast5_api), which demultiplexes the reads and converts the fast5 files:

demux_fast5 -t 2 -r -c gzip -i <input_fast5_dir> -s <output_fast5_dir> -l <path_sequencing_summary.txt>

Best,

Alan
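
One possible reason the `grep -q vbz` test misses some VBZ-compressed files is that it depends on the exact text h5dump prints for the compression filter, and the match is case-sensitive. The sketch below is an assumption-laden alternative, not the fix adopted in this thread: it reads `h5dump -pH` output on stdin and matches "vbz" in any case, or the filter ID 32020, which to our understanding is the HDF5 filter ID registered for VBZ. Whether h5dump shows the filter name, the ID, or both in your setup is not established here. The function names `is_vbz_header` and `report_fast5_compression` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from `h5dump -pH` text (read on stdin) whether
# a fast5 file uses VBZ compression. Assumes h5dump reports the filter either
# by name ("vbz", any case) or by its HDF5 filter ID, 32020.
is_vbz_header() {
    # No -q: grep reads all of its input, so an upstream h5dump is not cut
    # off early; matched lines are discarded and only the exit status is used.
    grep -Ei 'vbz|FILTER_ID[[:space:]]+32020' > /dev/null
}

# Example wrapper for a single file (requires h5dump on PATH).
report_fast5_compression() {
    local f="$1"
    if h5dump -pH "$f" | is_vbz_header; then
        echo "VBZ-compressed: $f"
    else
        echo "not VBZ-compressed (or filter not reported): $f"
    fi
}
```

For example, `report_fast5_compression <path_fast5_file>`. If the second message appears but downstream chunk files are still nearly empty, converting to gzip as described above remains the safer route.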

ilhdkk commented 2 years ago

@linuxxue I tried the method, and although it displayed "It is not VBZ-compressed", I could still convert my fast5 files back to gzip. Thank you for your suggestion! Best,

kk