WGLab / NanoMod

NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data
Other
35 stars 8 forks source link

Using NanoMod to detect EdU calls #16

Open OferRog opened 3 years ago

OferRog commented 3 years ago

Hi - We are trying to use NanoMod to call out EdU bases in Nanopore sequences. Since Albacore is no longer supported, we use Guppy to do basecalling. We used the "fast5_out" command during basecalling to get the fast5 files containing event data. Next, we used multi-to-single command to converts fast5 files to the single_read_fast5 files. We also used bwa to index the genome fasta file, and then tried to run NanoMod annotate command. When we did that, we got the following errors (see attached file). Many thanks for your help! Ofer and Kewei image

liuqianhn commented 3 years ago

@OferRog could you please show me what you can get from h5ls -r workspace/*_5.fast5 | head -n 50 ?

OferRog commented 3 years ago

@liuqianhn I think the command should be 'h5ls -r workspace/*.fast5 | head -n 50', right? And the output is:

Capture

Thanks!

liuqianhn commented 3 years ago

It seems that the fast5 files are multi-fast5 format which this version NanoMod does not support yet. You can use the single-fast5 format output from your multi-to-single command. Meanwhile, please note that we have been improving NanoMod to support both multi-fast5 and move table with substantially improved results and running time (5 times faster). It might take time to make improved NanoMod publicly available.

OferRog commented 3 years ago

@liuqianhn Hi Qian, The fast5 files I'm showing you here are already single-format fast5s, after using multi-to-single command. I'm also using another methylation detecting software which requires single-format fast5s as well, and it works. Thank you!

liuqianhn commented 3 years ago

@OferRog It seems that I may misunderstand your folder organization. It would be helpful if you can show me the path of a single-fast5 file. Thanks.

OferRog commented 3 years ago

@liuqianhn Yes. Here is the raw multi-format fast5s generated during nanopore sequencing. There are 6 files.

fast5_multi

And after using multi-to-single command, each multi-format fast5 was separated into many single-format fast5s (one multi-format fast5, one folder) image image

liuqianhn commented 3 years ago

@OferRog If I understand it correctly, ~/work/nanomod_work/basecalled/Edu/workspace is for multi-fast5 files, while ~/work/nanomod_work/basecalled/Edu/m_to_s is for single-fast5 format. Please use ~/work/nanomod_work/basecalled/Edu/m_to_s rather than ~/work/nanomod_work/basecalled/Edu/workspace for NanoMod (the log of your previous commands used workspace/ for wrkbase1. Please correct me if I am wrong.).

OferRog commented 3 years ago

@liuqianhn Hi qian, Yes, you are right! I should use the single_fast5 format. However, when I ran these files, it showed that no event data in these fast5s. image I'm sure I ran guppy --fast5_out command to basecall these fast5s before nanomod processing, which would output the fast5s containing the event. image Have you ever use guppy basecaller before? Or probably I should ask the guppy author. Thank you very much for your help!

OferRog commented 3 years ago

@liuqianhn image

liuqianhn commented 3 years ago

@OferRog NanoMod will automatically search basecaller and if the basecaller is Guppy, NanoMod will detect move table; otherwise, NanoMod will try to detect event table. Do you mind sharing the first fast5 with an error message so that I can have a check to see what is wrong? Thanks.

OferRog commented 3 years ago

@liuqianhn Yes! Here is the link: https://github.com/OferRog/fast5s Thanks!

liuqianhn commented 3 years ago

@OferRog Thanks for sharing. Here is what I found from the fast5

[liuq1@l-0-01 test_nanomod]$ h5ls -r 1439aa6c-77fe-4311-a7b3-94ca7031e426.fast5
/                        Group
/Analyses                Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {2418}
/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {2418, 8}
/Analyses/Basecall_1D_001/Summary Group
/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/Analyses/Basecall_1D_002 Group
/Analyses/Basecall_1D_002/BaseCalled_template Group
/Analyses/Basecall_1D_002/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_002/BaseCalled_template/Move Dataset {2418}
/Analyses/Basecall_1D_002/BaseCalled_template/Trace Dataset {2418, 8}
/Analyses/Basecall_1D_002/Summary Group
/Analyses/Basecall_1D_002/Summary/basecall_1d_template Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Analyses/Segmentation_001 Group
/Analyses/Segmentation_001/Summary Group
/Analyses/Segmentation_001/Summary/segmentation Group
/Analyses/Segmentation_002 Group
/Analyses/Segmentation_002/Summary Group
/Analyses/Segmentation_002/Summary/segmentation Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_5386     Group
/Raw/Reads/Read_5386/Signal Dataset {5139/Inf}
/UniqueGlobalKey         Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

It seems that there are several basecalling for this files, and their ids are Basecall_1D_000, Basecall_1D_001 and Basecall_1D_002. Basecall_1D_000 has no move/event information, but Basecall_1D_000 is the default path to find move/event and other information. Thus, you need to use --basecall_1d Basecall_1D_001 or --basecall_1d Basecall_1D_002 to find correct path when run NanoMod.

OferRog commented 3 years ago

@liuqianhn Hi! You are right. basecall_1d_001 has the move/event data. But something new came out when I used '--basecall_1d Basecall_1D_001' command. Thanks for your help! image

liuqianhn commented 3 years ago

@OferRog I do not see the error in your screenshot. The warning should not affect the result. As I mentioned in a previous message, we are improving NanoMod now. The warning will not happen in the improved revision.

OferRog commented 3 years ago

@qian I think it works! Thanks very much for your help!

OferRog commented 3 years ago

@liuqianhn Hi qian! One last question: After using detect command, it showed that cowplot is missing. Does it matter? Because I got the txt file in the end.

image

And this is how txt file looks like. Is this what we expect to see? What is the title for each row? I want to use NanoMod to call out EdU bases. How is EdU look like in this table? Thanks again for your help!

image

liuqianhn commented 3 years ago

@OferRog

  1. 'cowplot' error: this is a R package and is not installed in your R. Without this, you still generate the file in your screenshot, but you will not get any plots for those positions with smallest p-value.
  2. The format of each row is chr_name strand position base coverage_of_positive_controle coverage of negative_control statistic_for_U-test p-value_for_U-test statistic_for_T-test p-value_for_T-test statistic_for_KS-test p-value_for_KS-test statistic_for_Stuffer-combin-test p-value_for_Stuffer-combin-test U-test and T-test are not controlled against sample size. If there is an Edu for a position, you expect a clear peak of p-value (much smaller p-value. The p-value in the screenshot is too large.) at/around that position. Please note that your data is weird simply based on your screenshot, since all positions for both negative/positive control have 5 coverage.
OferRog commented 3 years ago

@liuqianhn Thank you very much! I will install the cowplot and try it again.

liuqianhn commented 3 years ago

If you have used conda to build you virtual environment, you can simply install it by conda install -c biobuilds r-cowplot. Please also make sure you have installed all other dependent packages.

OferRog commented 3 years ago

@liuqianhn Thank you very much! It works!