fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

nanodisco characterize error task 1 failed and RDS file issue #53

Open BioRB opened 1 year ago

BioRB commented 1 year ago

Hello, we are running nanodisco and we got this error at the characterize step:

nanodisco characterize -p 4 -b baumani -d analysis/merged_difference/baumani_difference.RDS -o analysis/baumani_motifs -m GATC,CCWGG,GCACNNNNNNGTT,AACNNNNNNGTGC -t nn -r reference_genome/Acinetobacter_baumannii_ATCC_BAA_747.fasta
[2022-09-13 14:11:52] Load supplied current differences.
[2022-09-13 14:11:52] Check current differences file version.
Models for Guppy version 6.3.4+cfaa134 is not yet available but we are working on it.
Motif characterization will still proceed with the default model but obtained results might not be optimal.
Additional information can be found in our GitHub repository.
[2022-09-13 14:11:52] Determine motif signature center.
[2022-09-13 14:11:52]   Process GATC.
[2022-09-13 14:11:52]     Tag GATC occurrences.
[2022-09-13 14:11:55]     Score GATC modified position.
[2022-09-13 14:11:56]   Process CCWGG.
[2022-09-13 14:11:56]     Tag CCWGG occurrences.
[2022-09-13 14:11:57]     Score CCWGG modified position.
[2022-09-13 14:11:57]   Process GCACNNNNNNGTT.
[2022-09-13 14:11:57]     Tag GCACNNNNNNGTT occurrences.
[2022-09-13 14:11:59]     Score GCACNNNNNNGTT modified position.
[2022-09-13 14:11:59]   Process AACNNNNNNGTGC.
[2022-09-13 14:11:59]     Tag AACNNNNNNGTGC occurrences.
[2022-09-13 14:12:01]     Score AACNNNNNNGTGC modified position.
Error in { : 
  task 1 failed - "arguments imply differing number of rows: 1, 0"
Calls: find.signature.center -> %do% -> <Anonymous>
Execution halted

So we had a look at the RDS file generated during the nanodisco difference step. It looks like this:

contig  position    dir strand  N_wga   N_nat   mean_diff   t_test_pval u_test_pval
9a03e25654c44fe8_1  1   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  5001    rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  10001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  15001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  20001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  25001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  30001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  35001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  40001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  45001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  50001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  55001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  60001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  65001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  70001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  75001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  80001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  85001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  90001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  95001   rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  100001  rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  105001  rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  110001  rev t   0   0   NA  NA  NA
9a03e25654c44fe8_1  115001  rev t   0   0   NA  NA  NA

We don't see any current differences detected. This could be a biological issue, but it is also strange that the reported positions are only every 5000 bp and only for the reverse direction (dir = rev). Do you have an explanation for this? Do we have to set a parameter in the difference step, or upstream, to have all bases covered (1, 2, 3, ...)? Thanks for your kind help. RB
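One way to quantify the observation above is sketched below. It assumes the difference table has first been exported from the RDS file to whitespace-separated text (for example with R's `readRDS` and `write.table`), using the same column names as the excerpt. The function name `summarize_difference_table` is hypothetical and not part of nanodisco.

```python
def summarize_difference_table(text):
    """Summarize a whitespace-separated dump of a current-differences table.

    Assumes a header row containing at least the columns
    'position', 'dir', 'N_wga' and 'N_nat', as in the excerpt above.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = [dict(zip(header, ln.split())) for ln in lines[1:]]

    # Rows where both the WGA and the native sample contributed reads.
    covered = [r for r in rows if int(r["N_wga"]) > 0 and int(r["N_nat"]) > 0]

    # Distinct gaps between consecutive reported positions.
    positions = sorted({int(r["position"]) for r in rows})
    steps = {b - a for a, b in zip(positions, positions[1:])}

    return {
        "n_rows": len(rows),
        "n_covered": len(covered),
        "directions": sorted({r["dir"] for r in rows}),
        "position_steps": sorted(steps),
    }


# First three rows of the excerpt above, used as illustrative input.
example = """contig position dir strand N_wga N_nat mean_diff t_test_pval u_test_pval
9a03e25654c44fe8_1 1 rev t 0 0 NA NA NA
9a03e25654c44fe8_1 5001 rev t 0 0 NA NA NA
9a03e25654c44fe8_1 10001 rev t 0 0 NA NA NA
"""
summary = summarize_difference_table(example)
print(summary)
```

If `n_covered` is 0 for the whole file, as in these rows, no position received reads from both samples, so there is nothing to compare at any position.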

touala commented 1 year ago

Hello @BioRB,

Thank you for reaching out, and sorry for the major delay. I hope you have figured out the issue already, but if not, I'll try to help you sort it out.

The information contained in the difference file suggests that there was no data to process during the nanodisco difference step. I would suggest loading the native and WGA .bam files from nanodisco preprocess in IGV to look at the data directly. Without more information, I would guess that the genome used does not match the strain/species sequenced, or that your dataset is too shallow. Otherwise, an issue might have occurred during nanodisco preprocess.
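As a quick complement to inspecting the .bam files in IGV, the sketch below compares the contig names found in a difference table with the sequence names in the reference FASTA. It only checks names, not sequence similarity, so it cannot confirm that the genome matches the sequenced strain; it can only flag an obvious naming mismatch. It also assumes that the contig column is expected to carry the FASTA sequence names, which may not hold. The helper names and the `contig_A`/`contig_B`/`contig_X` inputs are hypothetical.

```python
def fasta_contig_names(fasta_text):
    """Return sequence names (first word after '>') from FASTA text."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def unmatched_contigs(table_contigs, fasta_text):
    """Contig names from the table that are absent from the FASTA headers."""
    reference = set(fasta_contig_names(fasta_text))
    return sorted(set(table_contigs) - reference)


# Hypothetical inputs for illustration only.
fasta_example = ">contig_A some description\nACGT\n>contig_B\nGGCC\n"
missing = unmatched_contigs(["contig_A", "contig_X"], fasta_example)
print(missing)
```

An empty result does not rule out a strain mismatch or shallow coverage; looking at the alignments themselves remains the more direct check.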

Feel free to reach out again with more questions. I'll be more available in the future.

Best,

Alan