fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

nanodisco motif p value #26

Closed gluspaula closed 2 years ago

gluspaula commented 2 years ago

Hi,

I'm seeking help with this portion of the analysis: "motif: De novo discovery of methylation motifs from current differences file." I'm getting an error in the statistical portion of the analysis. I think either I have no p-values or there are no differences in methylation between my two sequences. Could this error mean that I have no methylation sites, or could I have made a mistake earlier so that the p-values were lost in the process? Sorry if this is confusing, but I'd appreciate any insight on this issue. Thank you.

PS: I cannot confirm there are or aren't p-values in the RDS

Portion of the script from the nanodisco documents where I'm stuck (error is below):

nanodisco motif -p -b -d -o -r [+ advanced parameters]
  -p : Number of threads to use.
  -b : Base name for outputting results (e.g. Ecoli_K12). Default is 'results'.
  -d : Path to current differences file (*.RDS produced from nanodisco difference).
  -o : Path to output directory. Default is current directory.
  -r : Path to a reference genome (i.e. fasta).
  -h : Print help.

Error:

[2021-08-30 13:16:35] Prepare output folder.
[2021-08-30 13:16:35] Load supplied current differences.
[2021-08-30 13:16:35] Detect motifs.
[2021-08-30 13:16:35] Processing statistical signal.
Error in eval(e, x, parent.frame()) : object 'p' not found
Calls: wrapper.motif.detection ... subset -> subset -> subset.data.frame -> eval -> eval
Execution halted

touala commented 2 years ago

Hello @gluspaula,

Thank you for reaching out. You shouldn't see this type of error, even if your sample is unmethylated. I think something went wrong when generating the __ file, but we would need access to this .RDS file to sort this out.

If possible, could you share the .RDS file and the .fasta file (at alan.tourancheau@bio.ens.psl.eu) so I can try reproducing the error on my side? Otherwise, I can provide you with commands to try debugging the issue "remotely".

Regards,

Alan

touala commented 2 years ago

After a private discussion, we concluded that the issue is related to the latest change in the fast5 format, which can now use vbz compression instead of gzip. You can check the format of your fast5 file with the following command:

if h5dump -pH <path_fast5_file> | grep -q vbz; then
  echo "It is VBZ-compressed"
else
  echo "It is not VBZ-compressed"
fi
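If you have many fast5 files, the same check can be applied to a whole directory. The following is only a rough sketch: it assumes `h5dump` (from the HDF5 tools) is on your PATH, and the function names `is_vbz` and `count_vbz` are made up for illustration, they are not part of nanodisco.

```shell
# Sketch: count VBZ- vs non-VBZ-compressed fast5 files under a directory.
# Assumption: h5dump is installed; is_vbz/count_vbz are illustrative names.

# Exit status 0 when the HDF5 header of file $1 mentions the vbz filter.
is_vbz() {
  h5dump -pH "$1" 2>/dev/null </dev/null | grep -q vbz
}

# Print "<n_vbz> <n_other>" for all *.fast5 files below directory $1.
# Note: file names containing newlines are not handled.
count_vbz() {
  find "$1" -type f -name '*.fast5' | {
    n_vbz=0
    n_other=0
    while IFS= read -r f; do
      if is_vbz "$f"; then
        n_vbz=$((n_vbz + 1))
      else
        n_other=$((n_other + 1))
      fi
    done
    echo "$n_vbz $n_other"
  }
}

# Run only when a directory is given on the command line.
if [ "$#" -ge 1 ]; then
  count_vbz "$1"
fi
```

Calling `count_vbz <path_fast5_dir>` prints two numbers: how many files mention vbz in their header and how many do not. A mix of both would suggest the dataset was only partly converted.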

I'm working on the next nanodisco update, which will handle this format out of the box. Meanwhile, if the fast5 files are vbz-compressed, you can use the ONT compress_fast5 tool to convert them back to gzip:

compress_fast5 -t 10 --recursive -i <vbz_fast5_dir> -s <vbz_fast5_dir>"_gzip" -c gzip
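To avoid converting a dataset that is already gzip-compressed, the check and the conversion can be chained. This is a minimal sketch, assuming both `h5dump` and `compress_fast5` are installed. The helper names `has_vbz` and `convert_if_vbz` are hypothetical, and the `compress_fast5` options are the same ones used in the command above.

```shell
# Sketch: run compress_fast5 only if a directory contains VBZ-compressed fast5 files.
# Assumption: h5dump and compress_fast5 are on PATH; helper names are illustrative.

# Exit status 0 if at least one *.fast5 file under $1 mentions vbz in its header.
has_vbz() {
  find "$1" -type f -name '*.fast5' | while IFS= read -r f; do
    if h5dump -pH "$f" 2>/dev/null </dev/null | grep -q vbz; then
      echo found
      break
    fi
  done | grep -q found
}

# Convert directory $1 to gzip in "$1_gzip" using $2 threads (default 10).
convert_if_vbz() {
  in_dir=$1
  threads=${2:-10}
  if has_vbz "$in_dir"; then
    compress_fast5 -t "$threads" --recursive -i "$in_dir" -s "${in_dir}_gzip" -c gzip
  else
    echo "No VBZ-compressed fast5 found in $in_dir; nothing to convert."
  fi
}
```

For example, `convert_if_vbz <vbz_fast5_dir> 10` would write the gzip-compressed copies to `<vbz_fast5_dir>_gzip`, which can then be used as input for the nanodisco steps.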