fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

Nanodisco difference: no signal difference for the majority of a genome #68

Open christinehe opened 1 year ago

christinehe commented 1 year ago

Hi,

I'm running nanodisco against a curated genome sequence known to be present in the sample (from Illumina data and assembly of the ONT data). The alignments from nanodisco preprocess show reasonably convincing read support across the genome.

However, nanodisco is unable to find any motifs. Over 70% of the positions in the merged signal differences file have no current values. It seems odd that a current difference would be found at one position, with no current values for the neighboring positions:

"contig" "position" "dir" "strand" "N_wga" "N_nat" "mean_diff" "t_test_pval" "u_test_pval"
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2244 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2244 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2245 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2245 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2246 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2246 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2247 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2247 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2248 "fwd" "t" 11 16 13.7768837637467 0.000430954878020719 6.13595983093897e-07
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2248 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2249 "fwd" "t" 6 22 1.68108663813591 0.0982472566548756 0.0995487604183256
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2249 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2250 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2250 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2251 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2251 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2252 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2252 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2253 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2253 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2254 "fwd" "t" 6 15 0.320795190325512 0.791074140784466 0.62218045112782
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2254 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2255 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2255 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2256 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2256 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2257 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2257 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2258 "fwd" "t" 5 23 3.37158443622192 0.00454686242416758 0.00400895400895401
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2258 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2259 "fwd" "t" 13 10 -1.6600018896479 0.0614309265440312 0.101011654922006
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2259 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2260 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2260 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2261 "fwd" "t" 5 11 10.6930921222628 0.000372580357190029 0.0086996336996337
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2261 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2262 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2262 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2263 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2263 "rev" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2264 "fwd" "t" 0 0 NA NA NA
"SRVP_Atabeyarchaeota_1_curated.FINAL" 2264 "rev" "t" 0 0 NA NA NA

Any advice is appreciated! Happy to provide more info if helpful.
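One way to see how the missing values are distributed is to summarize the merged difference file by read direction. The sketch below is a hypothetical helper, not part of nanodisco. It assumes the whitespace-separated, quoted layout shown in the excerpt above, with a header row naming the columns, and it counts a `mean_diff` of `NA` as a position without a value.

```python
import shlex
from collections import Counter


def summarize_difference_rows(lines):
    """Return {direction: (rows_without_value, total_rows)}.

    Hypothetical helper. Assumes whitespace-separated columns with quoted
    strings, as in the excerpt above, and a header row with the column names.
    """
    # shlex.split handles both spaces and tabs and strips the double quotes.
    rows = (shlex.split(line) for line in lines if line.strip())
    header = next(rows)
    idx = {name: i for i, name in enumerate(header)}

    total, missing = Counter(), Counter()
    for fields in rows:
        direction = fields[idx["dir"]]
        total[direction] += 1
        # "NA" means no statistic was reported for this position and strand.
        if fields[idx["mean_diff"]] == "NA":
            missing[direction] += 1
    return {d: (missing[d], total[d]) for d in total}
```

For example, `with open(path) as fh: print(summarize_difference_rows(fh))`, where `path` is the merged difference file, prints the number of value-less rows out of the total for `fwd` and `rev` separately.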

touala commented 1 year ago

Hi @christinehe,

Thank you for trying nanodisco. It is indeed surprising to get only partial data. What is the overall coverage, or coverage distribution, of the native and WGA files? I've implemented a minimum coverage threshold of 5x, which may have kicked in and prevented nanodisco from reporting lower-confidence statistics.
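The effect of such a threshold can be illustrated with a small sketch. This is an assumption, not nanodisco's implementation: a position is reported only when both the WGA and the native sample contribute at least 5 signal values. Otherwise it is written as `0 0 NA`, which matches the filtered rows in the excerpt above, where both counts are zero. For passing positions the sketch computes the mean difference and a Welch t statistic. The native-minus-WGA sign convention and the name `position_summary` are also assumptions.

```python
import math
from statistics import mean, variance

MIN_COVERAGE = 5  # threshold quoted in the thread; how it is applied here is assumed


def position_summary(wga, nat, min_cov=MIN_COVERAGE):
    """Return (N_wga, N_nat, mean_diff, welch_t) for one position and strand.

    Hypothetical re-creation: if either sample has fewer than min_cov signal
    values, report 0, 0, None, None (written as NA in the output file).
    """
    if len(wga) < min_cov or len(nat) < min_cov:
        return 0, 0, None, None
    diff = mean(nat) - mean(wga)  # assumed sign convention: native minus WGA
    # Welch standard error: each sample keeps its own variance.
    se = math.sqrt(variance(nat) / len(nat) + variance(wga) / len(wga))
    welch_t = diff / se if se > 0 else None
    return len(wga), len(nat), diff, welch_t
```

Under this rule, a WGA sample averaging 1.7x would fail the filter at most positions even when native coverage is adequate.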

Best,

Alan

christinehe commented 1 year ago

Thanks @touala, that makes sense: the mean coverage depth of the WGA sample is unfortunately only 1.7x. Knowing the caveats, I'd still like to try running nanodisco with a lower threshold. I edited the threshold in difference.sh and am running nanodisco in a Singularity sandbox. Any other recommendations on how best to change this threshold?
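A back-of-the-envelope estimate shows why 1.7x is so limiting under a 5x threshold. The sketch models per-position depth as Poisson-distributed with the reported mean. This is an idealization, because real coverage is uneven. It also assumes that 1.7x applies to each reported position and strand. `fraction_at_or_above` is a hypothetical helper name.

```python
import math


def fraction_at_or_above(min_cov, mean_cov):
    """P(X >= min_cov) for X ~ Poisson(mean_cov).

    This is the share of positions expected to reach min_cov reads if
    coverage were randomly and uniformly distributed (an idealization).
    """
    below = sum(
        math.exp(-mean_cov) * mean_cov ** k / math.factorial(k)
        for k in range(min_cov)
    )
    return 1.0 - below
```

Under these assumptions, `fraction_at_or_above(5, 1.7)` is about 0.03, so only around 3% of positions would be expected to reach 5 WGA reads. `fraction_at_or_above(2, 1.7)` is about 0.51.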

touala commented 1 year ago

You're welcome. To be clear, we do not recommend lowering the minimum coverage threshold in general, as it will result in a loss of accuracy. Adding WGA data will always be the best solution, but feel free to experiment with this threshold. BTW, you can use the -e option of nanodisco difference to modify the coverage requirement (see here).

Please let me know how it goes.

Alan

christinehe commented 1 year ago

Unfortunately lowering the coverage threshold did not help. Thanks for your responsiveness - I'll see if generating more WGA data is an option.

touala commented 1 year ago

It was indeed a long shot, but thanks for the feedback. If you're going to generate more data, the native sample would also benefit from increased coverage, assuming the 10-25x native coverage shown in the table above is genome-wide. In Extended Data Fig. 7 from our paper, you can see the positive effect of increased coverage on the analysis (up to 200x).
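Part of the benefit of extra coverage follows from how a two-sample t statistic scales with read count. The sketch below shows only that textbook scaling and does not reproduce Extended Data Fig. 7. It assumes both samples share one per-read standard deviation and have equal coverage. The effect size (`delta=1.0`) and spread (`sd=2.0`) are illustrative values, not measurements.

```python
import math


def difference_standard_error(sd, n_wga, n_nat):
    """Standard error of the difference between two sample means, assuming
    WGA and native signal values share the same per-read standard deviation."""
    return sd * math.sqrt(1.0 / n_wga + 1.0 / n_nat)


def expected_t(delta, sd, coverage):
    """Rough t statistic for a true mean shift `delta` when WGA and native
    coverage are equal; it grows with the square root of coverage."""
    return delta / difference_standard_error(sd, coverage, coverage)


for depth in (5, 10, 25, 50, 200):
    print(depth, round(expected_t(delta=1.0, sd=2.0, coverage=depth), 2))
```

Under these assumptions the statistic rises from about 0.8 at 5x to 5.0 at 200x, so quadrupling coverage doubles it.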

Alan