fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

Column `motif` is unknown #9

Closed LynnLy closed 2 years ago

LynnLy commented 3 years ago

Hi there, I ran into an error while running nanodisco profile with -a all. I am using the Docker installation, and I redownloaded list_motifs.RDS as mentioned in #8.

Appreciate the help! Lynn

root@nanodisco:~$ nanodisco profile -p 15 -r reference/6plex.fasta -d analysis/Mix2_6plex_difference.RDS -w analysis/preprocessed/Mix2_wga_subsample.cov -n analysis/preprocessed/Mix2_native_subsample.cov -b Mix2_6plex_all -o analysis/binning -a all

Methylation profile are computed for all predefined common motifs (n=210,176) on long contigs only (>=100000 bp). This can take a while (>24h).
[2020-09-04 19:45:16] Read list of common motifs.
[2020-09-04 19:45:16] Prepare default metagenome annotation.
[2020-09-04 19:45:17] Load supplied current differences.
[2020-09-04 19:45:32] Load contigs coverage information.
[2020-09-04 19:45:32] Prepare subset of contigs (>=100000 bp).
[2020-09-04 19:45:33] Compute methylation features on subset of contigs.
[2020-09-04 19:45:33]     Initialize methylation feature computation.
[2020-09-04 19:45:50]     Processing motifs.
 Motifs processed (210165/210176): [======>] 100% eta:  4s (elapsed: 20:13:17)
Error in { : task 136782 failed - "Column `motif` is unknown"
Calls: score.metagenome.motifs -> %dopar% -> <Anonymous>

touala commented 3 years ago

Hi Lynn,

Thank you for reporting this issue. It looks similar to a problem I encountered during early nanodisco development, which was caused by running out of memory on a computing node with limited resources, specifically while computing methylation features for a motif from list_motifs.RDS (in your case, the 136,782nd motif).

Without access to the original datasets, this is probably difficult to reproduce, but I would recommend rerunning the command while monitoring memory usage, if possible. Checking whether the issue occurs at the same iteration would be very informative. If you observe the same error, I can send you some additional code to try to identify the exact issue. Alternatively, if you are willing to share your datasets privately, I am happy to help dissect the issue locally and look for a fix if necessary.
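For the memory check, one possible approach is a small wrapper that launches the command and reports the peak resident memory afterwards. This is only a sketch and not part of nanodisco: the function name `run_and_report_peak_rss` is made up for illustration, it assumes Python 3 is available on a Linux system (where `ru_maxrss` is reported in KiB), and the value is the peak of the largest single child process, not the sum over all parallel workers started with `-p`.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for RUSAGE_CHILDREN: the largest resident set
    size reached by any single waited-for descendant of this process.
    It is NOT the total across parallel workers. On Linux the unit is
    KiB (macOS reports bytes instead).
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Hypothetical usage with the command from this thread (arguments
# shortened with "..." here; pass the full list in practice):
#   code, peak = run_and_report_peak_rss(
#       ["nanodisco", "profile", "-p", "15", "...", "-a", "all"])
#   print(f"exit code {code}, peak child RSS {peak} KiB", file=sys.stderr)
```

Because the number reflects only the largest single process, a run with many workers can use considerably more memory in total, so watching overall system memory during the run remains useful.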

Alan

LynnLy commented 3 years ago

Thanks. I subset list_motifs.RDS to just the problematic motif, and it is reproducibly breaking at that same motif. However, the memory usage was only ~20GB out of 480GB.

I'm happy to email you a link to the dataset. What email address is best?

touala commented 3 years ago

Thank you for further diagnosing the issue. I am happy to help you by looking into your dataset.

I would need at least the reference (6plex.fasta) and the difference file (Mix2_6plex_difference.RDS), but the coverage files could be helpful, too. You can send me the link at: alan.tourancheau@bio.ens.psl.eu.

Alan

touala commented 3 years ago

Hi Lynn,

As discussed by email, we have implemented a fix to avoid this issue. It was caused by the motif of interest having no occurrences in any contig of the metagenome. The fix is now included in nanodisco v1.0.1.
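To illustrate the kind of situation involved, the sketch below counts occurrences of a degenerate (IUPAC) motif across a set of contigs and separates motifs that never occur, so they can be skipped before any per-motif computation. It is not the code used in nanodisco (which is written in R) and does not reproduce the actual fix; the function names, the choice to search both strands, and the toy contigs are all assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead counts overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def reverse_complement(motif):
    """Reverse complement of a motif, including degenerate bases."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def count_occurrences(motif, contigs):
    """Count motif hits on both strands over all contigs (dict name -> sequence)."""
    # A set avoids double counting palindromic motifs such as GATC.
    patterns = {motif.upper(), reverse_complement(motif)}
    total = 0
    for sequence in contigs.values():
        for pattern in patterns:
            total += sum(1 for _ in motif_regex(pattern).finditer(sequence.upper()))
    return total


def split_by_presence(motifs, contigs):
    """Return (present, absent): motifs with at least one hit, and those with none."""
    present, absent = [], []
    for motif in motifs:
        if count_occurrences(motif, contigs) > 0:
            present.append(motif)
        else:
            absent.append(motif)
    return present, absent


# Toy data, purely illustrative.
contigs = {"contig_1": "TTGATCAAGATC", "contig_2": "ACGTACGT"}
present, absent = split_by_presence(["GATC", "CCWGG"], contigs)
```

With a filter of this kind, a motif that is absent from every contig is set aside explicitly rather than reaching a step that expects at least one occurrence.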

Once again, please let us know if you encounter any issues. We are happy to help.

Alan