fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

Unable to run nanodisco on personal metagenomic data #12

Closed BurinkiWU closed 2 years ago

BurinkiWU commented 3 years ago

To whom it concerns,

I have run into the following situation:

wzq@nanodisco:~$ nanodisco preprocess -p 40 -f dataset/seameta_data/20180609_1025_sea_meta_9_wzq/ -s seameta_9 -o analysis/preprocessed_subet -r reference/seameta_9.fasta
[2020-11-23 09:23:04] Extract sequences from fast5.
Warning message:
In extract.sequence(path_input, base_name, path_output, nb_threads, :
  448214 reads weren't basecalled.
No reads were extracted. Please check that -f/--path_fast5 is correct.

Is this caused by a problem with the basecalling software?

Best

touala commented 3 years ago

Hello @BurinkiWU,

Thank you for your interest in our work, and sorry for the delay. From this error message, it seems that the issue is indeed related to the basecalling of the supplied fast5 files. I see at least two possible explanations: either the basecalling failed, or the basecalling software produced results that nanodisco does not recognize.

Could you tell us whether the basecalling completed successfully, and which basecalling software and version were used?
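As a quick triage aid, the warning can also be detected programmatically. The following is a minimal Python sketch and not part of nanodisco: the function name `count_unbasecalled` is hypothetical, and the regular expression assumes the exact warning wording shown in the log quoted at the top of this thread.

```python
import re

# Pattern assumed from the warning text quoted in this issue:
# "448214 reads weren't basecalled."
_UNBASECALLED = re.compile(r"(\d+) reads weren't basecalled")


def count_unbasecalled(log_text):
    """Return the number of reads reported as not basecalled, or None."""
    match = _UNBASECALLED.search(log_text)
    return int(match.group(1)) if match else None


# Log text copied from the original report.
log = (
    "[2020-11-23 09:23:04] Extract sequences from fast5. "
    "Warning message: In extract.sequence(path_input, base_name, "
    "path_output, nb_threads, : 448214 reads weren't basecalled. "
    "No reads were extracted."
)
n_failed = count_unbasecalled(log)
if n_failed:
    print(f"{n_failed} reads lack basecalls; check the basecalling step.")
```

A non-empty count suggests that the fast5 files passed with `-f` need to be checked for basecalls before looking at other causes.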

Regards,

Alan

BurinkiWU commented 3 years ago

Hi @touala,

Much appreciated! I have fixed the problem. The ".fast5" files need to be basecalled with Guppy or Albacore first; using the raw, non-basecalled fast5 dataset is what caused this error. Sorry to take up your time again.

If I want to analyze my metagenomic nanopore data, what is the best reference to use for the following step?

nanodisco preprocess -p 4 -f dataset/fast5 -s EC_WGA -o analysis/preprocessed_subset -r reference/metagenomic_data

Should I use my own nanopore metagenomic sequencing data as the reference, or some other reference instead?

Best,

Bruce

touala commented 3 years ago

Hi @BurinkiWU,

That's great that you found the reason for the original issue.

For the analysis of both individual bacteria and microbiomes, you can use assemblies made from nanopore data only. Using data from nanopore sequencing of a WGA sample is not necessary. However, some precautions need to be taken to correct potential methylation-related assembly errors. Most importantly, you need to polish your de novo assembly using non-native reads, whether they come from a WGA nanopore sequencing run or from an Illumina/PacBio dataset. We had great results polishing with the following pipeline, but other tools or combinations could also work:

Run Flye or Canu assembly with native nanopore reads.
Run Racon with native nanopore reads (4 rounds).
Run Nanopolish with native nanopore reads (5 rounds).
Run Nanopolish with WGA nanopore reads (5 rounds).
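
The order and number of rounds in the list above can be written down as a small schedule. This is only a minimal Python sketch for planning purposes: it does not invoke Flye, Canu, Racon, or Nanopolish, and the names `POLISHING_PLAN` and `expand_plan` are hypothetical.

```python
# Polishing plan from the list above: (tool, read set, number of rounds).
# This only models the order of the steps; it does not run any tool.
POLISHING_PLAN = [
    ("Flye or Canu", "native", 1),
    ("Racon", "native", 4),
    ("Nanopolish", "native", 5),
    ("Nanopolish", "WGA", 5),
]


def expand_plan(plan):
    """Expand (tool, reads, rounds) entries into one entry per round."""
    steps = []
    for tool, reads, rounds in plan:
        for round_number in range(1, rounds + 1):
            steps.append((tool, reads, round_number))
    return steps


steps = expand_plan(POLISHING_PLAN)
for index, (tool, reads, round_number) in enumerate(steps, start=1):
    print(f"step {index:2d}: {tool} with {reads} reads (round {round_number})")
```

In a real run, each round would presumably take the output of the previous round as its draft assembly, so the last WGA Nanopolish round yields the reference to pass to nanodisco.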

If you already have a reference (meta)genome generated from another experiment on the same sample (e.g. PacBio sequencing), it could be used directly too.

Regards,

Alan

hdore commented 3 years ago

Hello @touala,

Just for clarification: based on your message above, I understand that it is not possible to use nanodisco (in my case on a metagenome) without having sequenced the same sample either with a WGA approach or with another technology (PacBio or Illumina). Is that correct? That would mean that a single Nanopore run is not sufficient.

If I understand correctly, the problem is that you need sequences from non-native DNA to make comparisons. So if we want to stick to Nanopore sequencing, we could use any non-native sequencing strategy, either WGA or PCR-amplified DNA (such as the Nanopore PCR barcoding kit). Is that correct?

Thank you for your help, Regards,

Hugo

touala commented 3 years ago

Hello Hugo,

Yes, your understanding is correct: both native and WGA datasets are needed to perform an analysis with nanodisco. We found this data requirement provides reliable detection (please see rationale in Q4 on our FAQ page: https://nanodisco.readthedocs.io/en/latest/faq.html#q-wga).
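
To illustrate the two-dataset requirement, here is a minimal Python sketch that assembles one `nanodisco preprocess` command per sample, using only the flags that appear in the commands quoted earlier in this thread (`-p`, `-f`, `-s`, `-o`, `-r`). The sample names, directory layout, and reference path are hypothetical placeholders. Running the step once per sample against a shared reference is an assumption to check against the nanodisco documentation.

```python
import shlex


def preprocess_command(fast5_dir, sample, out_dir, reference, threads=4):
    """Build the argument list for one `nanodisco preprocess` call.

    Flag meanings are inferred from the commands quoted in this thread.
    """
    return [
        "nanodisco", "preprocess",
        "-p", str(threads),   # presumed: number of threads
        "-f", fast5_dir,      # basecalled fast5 directory (-f/--path_fast5)
        "-s", sample,         # presumed: sample name
        "-o", out_dir,        # presumed: output directory
        "-r", reference,      # presumed: reference (meta)genome FASTA
    ]


# Hypothetical sample names and paths, for illustration only.
samples = {
    "MGM_NAT": "dataset/native_fast5",
    "MGM_WGA": "dataset/wga_fast5",
}
commands = [
    preprocess_command(path, name, "analysis/preprocessed",
                       "reference/metagenome.fasta")
    for name, path in samples.items()
]
for argv in commands:
    # Print only; nothing is executed here.
    print(shlex.join(argv))
```

Building the commands as argument lists keeps paths with spaces safe if they are later passed to `subprocess.run`.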

While both WGA and PCR can create non-modified DNA, our own experience so far has been with WGA. Using the protocol described in our preprint, more than 10 ug of amplified DNA is usually generated from 12.5 ng of native DNA, so the vast majority of the DNA is non-modified.

In case you choose to use a Nanopore PCR barcoding kit, it would be great if you could let us know what your results are. Feel free to contact us again if you have any additional questions.

Regards,

Alan

BurinkiWU commented 3 years ago

Hi @hdore,

Thanks for your consideration. You got my point. I am in the same situation: I only have native Nanopore sequencing data, with no non-native, non-methylated Nanopore sequencing data. So it is impossible for me to follow the steps to run nanodisco binning.

I am trying to use whole genome amplification (WGA) to amplify the metagenomic sample and then sequence it again with Nanopore, in order to use it as the non-methylated reference.

Let's see what happens.

Best,

Bruce