PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

ccsmeth call_mods: indexError and TypeError #40

Closed dnalinkbi closed 10 months ago

dnalinkbi commented 1 year ago

Hi! We ran ccsmeth on the data included in the paper and confirmed that its accuracy was better than that of primrose.

Now we are testing ccsmeth with data from REVIO. We aligned the hifi_reads.bam file, which contains the kinetics signal, and then ran call_mods. The modbam.bam file was created, but the following error appeared in the log. What could be the problem?

[Screenshot: call_mods_error]

PengNi commented 1 year ago

Hi @dnalinkbi, thanks for using ccsmeth. According to the logs, the data may contain some IPD values greater than 255, which is invalid. I'll check and try to fix this in a few days. Can you send me some of your data for debugging, around 1-10k reads?

Best, Peng
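
To illustrate the suspected cause: encoded kinetics values are 8-bit codes, so any value above 255 cannot be a valid code. A minimal sketch of such a check, assuming the IPD values have already been read into a plain Python list (the function name `find_out_of_range` and the sample values are hypothetical, not part of ccsmeth):

```python
MAX_CODE = 255  # largest value an unsigned 8-bit code can hold


def find_out_of_range(values, max_code=MAX_CODE):
    """Return (index, value) pairs for values that cannot be 8-bit codes."""
    return [(i, v) for i, v in enumerate(values) if v < 0 or v > max_code]


ipd_values = [12, 255, 300, 7, 1024]  # made-up values for illustration
print(find_out_of_range(ipd_values))  # [(2, 300), (4, 1024)]
```

An empty result would mean every value fits in the encoded range; any hit suggests the values are not 8-bit codes.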

dnalinkbi commented 1 year ago

OK! Which type of BAM file do you need: the hifi_reads BAM or the aligned BAM?

PengNi commented 1 year ago

Thank you @dnalinkbi, hifi_reads.bam is OK.

dnalinkbi commented 1 year ago

We will share test.bam (hifi_reads.bam) on Google Drive and add your email account (543943952@qq.com) as a share recipient.

test.bam

PengNi commented 1 year ago

Thanks. My Google account is npzzh0901@gmail.com; I have requested access to the data.

dnalinkbi commented 1 year ago

I have shared it with that account.

PengNi commented 1 year ago

Hi @dnalinkbi, it seems that the BAM files generated by REVIO store raw kinetics values (Ipd and PulseWidth) instead of encoded ones. To use ccsmeth with REVIO data, please add --no_decode to the call_mods command. For more information, see the 'Encoding of kinetics pulse features' part of https://pacbiofileformats.readthedocs.io/en/11.0/BAM.html.
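
The difference between the two representations can be sketched as follows. This is a minimal illustration based on the CodecV1 layout described in the PacBio documentation linked above (256 codes in four blocks of 64, with step sizes 1, 2, 4 and 8), not ccsmeth's actual implementation. The names `build_codec_v1_table` and `kinetics_to_frames` and the `decode` parameter are hypothetical.

```python
def build_codec_v1_table():
    """Map each 8-bit CodecV1 code (0-255) to the frame count it represents."""
    table = []
    for block in range(4):
        step = 2 ** block          # 1, 2, 4, 8
        base = 64 * (step - 1)     # 0, 64, 192, 448
        table.extend(base + i * step for i in range(64))
    return table


CODEC_V1_TABLE = build_codec_v1_table()


def kinetics_to_frames(values, decode=True):
    """Decode CodecV1 codes to frames, or pass raw frame counts through."""
    if not decode:
        # Values are already frame counts (as in the REVIO case described above).
        return list(values)
    # Each value must be a code in 0..255; larger values raise IndexError.
    return [CODEC_V1_TABLE[v] for v in values]


print(kinetics_to_frames([0, 63, 64, 255]))          # [0, 63, 64, 952]
print(kinetics_to_frames([300, 12], decode=False))   # [300, 12]
```

In this sketch, looking up a raw frame count such as 300 with `decode=True` raises an IndexError. That is consistent with the error in the issue title, although the exact failure path inside ccsmeth is not confirmed here.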

Here are the headers of the BAM files from Sequel II and REVIO:

# Sequel II
@RG     ID:c92269b2     SM:UnnamedSample        PU:m64012_190920_173625 DS:READTYPE=CCS;Ipd:CodecV1=ip;PulseWidth:CodecV1=pw;BINDINGKIT=101-789-500;SEQUENCINGKIT=101-826-100;BASECALLERVERSION=5.0.0;FRAMERATEHZ=100.000000    PL:PACBIO       PM:SEQUELII

# REVIO
@RG     ID:test PL:PACBIO       DS:READTYPE=CCS;Ipd:Frames=ip;PulseWidth:Frames=pw;BINDINGKIT=102-739-100;SEQUENCINGKIT=102-118-800;BASECALLERVERSION=5.0;FRAMERATEHZ=100.000000;BarcodeFile=metadata/m84065_230803_093004_s4.barcodes.fasta;BarcodeHash=e7c4279103df8c8de7036efdbdca9008;BarcodeCount=113;BarcodeMode=Symmetric;BarcodeQuality=Score   LB:test PU:m84065_230803_093004_s4      SM:test PM:REVIO        BC:CATGTATGTCGAGTAT     CM:R/P1-C1/5.0-25M
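
The relevant difference is in the DS field: `Ipd:CodecV1=ip` versus `Ipd:Frames=ip`. A rough sketch of how one might read this from an @RG line, assuming the line is available as a tab-separated string (the function name `kinetics_encoding` is hypothetical, and the two example lines are abbreviated versions of the headers above):

```python
def kinetics_encoding(rg_line):
    """Return the encoding named in the DS field for Ipd and PulseWidth."""
    encoding = {}
    for field in rg_line.rstrip("\n").split("\t"):
        if not field.startswith("DS:"):
            continue
        for item in field[3:].split(";"):
            # Items look like "Ipd:CodecV1=ip" or "READTYPE=CCS".
            key, _, _tag = item.partition("=")
            feature, sep, codec = key.partition(":")
            if sep and feature in ("Ipd", "PulseWidth"):
                encoding[feature] = codec
    return encoding


sequel = "@RG\tID:c92269b2\tDS:READTYPE=CCS;Ipd:CodecV1=ip;PulseWidth:CodecV1=pw\tPL:PACBIO"
revio = "@RG\tID:test\tPL:PACBIO\tDS:READTYPE=CCS;Ipd:Frames=ip;PulseWidth:Frames=pw\tPM:REVIO"

print(kinetics_encoding(sequel))  # {'Ipd': 'CodecV1', 'PulseWidth': 'CodecV1'}
print(kinetics_encoding(revio))   # {'Ipd': 'Frames', 'PulseWidth': 'Frames'}
```

When both features report `Frames`, the values are raw frame counts, which is the case where --no_decode applies.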

dnalinkbi commented 1 year ago

Thanks for your reply. I'll add that option and check the result.

PengNi commented 10 months ago

Closing this issue as it is inactive. If further problems arise, feel free to reopen it.