PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

All reads are skipped/failed #38

Open dnalinkbi opened 1 year ago

dnalinkbi commented 1 year ago

Hi PengNi! I recently installed ccsmeth and ran into a problem when using 'call_mods'.

The command I used is:

CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods --input test.hifi_reads.aligned.bam --ref reference.fasta --model_file model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt --output test.hifi_reads.aligned.call_mods --threads 60 --threads_call 4 --model_type attbigru2s --mode align

The program exits without error, but the result (modbam.bam) is empty. Looking at the log, I found the following:

2023-07-20 14:48:25 - INFO - extract_features process-47563 ending, proceed 2040 hole_batches(50): 102000 holes/reads in total, 102000 skipped/failed.
2023-07-20 14:48:25 - INFO - extract_features process-47286 ending, proceed 2079 hole_batches(50): 103950 holes/reads in total, 103950 skipped/failed.
...
2023-07-20 14:48:26 - INFO - extract_features process-45955 ending, proceed 2199 hole_batches(50): 109950 holes/reads in total, 109950 skipped/failed.
2023-07-20 14:48:26 - INFO - call_mods process-42482 ending, proceed 0 batches(512)
2023-07-20 14:48:26 - INFO - call_mods process-41961 ending, proceed 0 batches(512)
2023-07-20 14:48:26 - INFO - wrote 0 reads, in which 0 were added mm tags

0 reads were written because all reads were skipped or failed. I cannot figure out why. Could you give me some hints?
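To see at a glance how many reads were dropped across all worker processes, the extract_features summary lines can be totalled. This is only a minimal sketch: it assumes the log wording shown above ("N holes/reads in total, M skipped/failed"), and the function name `skipped_fraction` is made up for illustration, not part of ccsmeth.

```python
import re

# Matches the per-process summary emitted by extract_features in the log above.
# Assumption: the wording of these lines is exactly as shown in this thread.
SUMMARY = re.compile(r"(\d+) holes/reads in total, (\d+) skipped/failed")


def skipped_fraction(log_text):
    """Return (total reads, skipped reads, skipped fraction) summed over all matches."""
    total = 0
    skipped = 0
    for match in SUMMARY.finditer(log_text):
        total += int(match.group(1))
        skipped += int(match.group(2))
    fraction = skipped / total if total else 0.0
    return total, skipped, fraction
```

A fraction of 1.0 means that no read passed feature extraction, which points at a problem with the input itself rather than with individual reads.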

PengNi commented 1 year ago

Hi @dnalinkbi, how did you get the hifi_reads.bam file? Are there kinetics signals for the reads in the hifi.bam?

dnalinkbi commented 1 year ago

The hifi_reads.bam file is PacBio (Revio) raw data. The original name was 'm84065_230512_023251_s1.hifi_reads.default.bam'. An aligned bam file was obtained through ccsmeth align_hifi, and ccsmeth call_mods was run on that aligned bam file. Do we need a separate experimental method for modification calling?

PengNi commented 1 year ago

The default Revio hifi.bam may have no kinetics signals (which are fi, fp, ri, rp, fn, rn tags in each segment in the bam file). Without kinetics signals, the reads cannot be processed by ccsmeth for methylation calling.

However, there may be MM, ML tags in the hifi.bam file, which are the results of the PacBio official methylation caller primrose.
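One way to check which of these tags a BAM actually carries is to dump some reads as SAM text (for example with `samtools view`) and count the optional fields. The following is a minimal sketch under that assumption; the script name `check_tags.py`, the file name `sample.sam`, and the helper names are hypothetical, and it uses only the Python standard library rather than any ccsmeth function.

```python
import sys

# Kinetics tags named in this thread; ccsmeth needs them for methylation calling.
KINETICS_TAGS = ("fi", "fp", "ri", "rp", "fn", "rn")
# Base-modification tags, e.g. as written by primrose.
MOD_TAGS = ("MM", "ML")


def tags_in_sam_line(line):
    """Return the set of optional-field tag names in one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    return {field.split(":", 1)[0] for field in fields[11:] if ":" in field}


def summarize(lines):
    """Count reads, and how many of them carry each tag of interest."""
    counts = {tag: 0 for tag in KINETICS_TAGS + MOD_TAGS}
    total = 0
    for line in lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        present = tags_in_sam_line(line)
        for tag in counts:
            if tag in present:
                counts[tag] += 1
    return total, counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects the path to a SAM text file as the only argument.
    with open(sys.argv[1]) as handle:
        total, counts = summarize(handle)
    print(f"reads inspected: {total}")
    for tag, n in counts.items():
        print(f"{tag}: present in {n} reads")
```

For example, `samtools view hifi_reads.bam | head -n 1000 > sample.sam` followed by `python check_tags.py sample.sam` would inspect the first 1000 reads. If the kinetics tags are reported as present in 0 reads, the file has no kinetics signals for ccsmeth to use.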

dnalinkbi commented 1 year ago

Thank you for the quick response. What we're trying to do now is find out how much better 'ccsmeth' performs than 'primrose'.

So we need to use subreads.bam to get the kinetics signals (fi, fp, ri, rp, fn, rn tags in each segment of the bam file). Am I right?