PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

running error #42

Closed yueqitaoo closed 1 year ago

yueqitaoo commented 1 year ago

Hi,

I submitted the script for several samples. Some runs finished successfully, while others failed with the error below. Could you help me solve this? Thank you!

my script: software/pb-CpG-tools/aligned_bam_to_cpg_scores.py -b temp1.aligned.bam -f cluster2.fa -o cluster2 -d /software/pb-CpG-tools/pileup_calling_model -c 1 -t 30

error report: Exception thrown in worker process 65618: Exception thrown while processing read m64079_220122_045853/1508001/ccs: Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 526, in pileup_from_reads
    process_read(ref, pos_start, pos_stop, hap_tag, is_denovo_modsites, pileup_data, read)
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 458, in process_read
    mod_dict = get_mod_dict(read.query_sequence, mmtag, 'C+m', 'C', mltag, is_reverse)
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 351, in get_mod_dict
    mod_base_indices = parse_mmtag(query_seq, mmtag, modcode, base, reverse)
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 316, in parse_mmtag
    raise Exception(
Exception: Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 895, in run_process_region_wrapper
    return run_process_region(arguments)
  File "/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 869, in run_process_region
    basemod_data, cg_sites_read_set = pileup_from_reads(bamIn, ref, pos_start, pos_stop, min_mapq, hap_tag, modsites)
  File "/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 528, in pileup_from_reads
    raise Exception("Exception thrown while processing read {}: {}\n".format(read.query_name, e))
Exception: Exception thrown while processing read m64079_220122_045853/1508001/ccs: Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1164, in <module>
    main()
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1152, in main
    bed_results = run_all_pileup_processing(regions_to_process, args.threads)
  File "/software/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 927, in run_all_pileup_processing
    bed_result = future.result()
  File "/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.get_result()
  File "/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 391, in get_result
    raise self._exception
Exception: Exception thrown while processing read m64079_220122_045853/1508001/ccs: Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length

ctsa commented 1 year ago

Hi, the error message "Base modification offsets in MM tag for modification type 'C+m' are inconsistent with read length" indicates that the MM/ML tags are not consistent with the read sequences in this BAM file.

One way this can happen is if a tool has shortened the reads in the BAM file without making the corresponding updates to the MM and ML tags, for instance during adaptor trimming. Does this sound like it might apply to your sample?
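
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python. It is not the check that pb-CpG-tools performs internally; the function name `mm_tag_is_consistent` is hypothetical, and it assumes the sequence is given in the original sequencing orientation (for reverse-strand alignments the BAM SEQ field would need to be reverse-complemented first). Each number in an MM entry says how many bases of the given type to skip before the next modified one, so the entry needs at least `sum(skips) + len(skips)` such bases in the read.

```python
def mm_tag_is_consistent(seq, mm_tag, mod_code="C+m"):
    """Return True if the MM entry for mod_code fits within seq.

    Assumption: seq is in the original sequencing orientation.
    This is an illustrative check, not the pb-CpG-tools implementation.
    """
    base = mod_code[0]
    available = seq.upper().count(base)
    for entry in mm_tag.rstrip(";").split(";"):
        fields = entry.split(",")
        # The code may carry a trailing '?' or '.' flag, e.g. 'C+m?'
        if fields[0].rstrip("?.") != mod_code:
            continue
        skips = [int(x) for x in fields[1:] if x]
        # Each offset consumes the skipped bases plus the modified base itself
        needed = sum(skips) + len(skips)
        return needed <= available
    # No entry for this modification type, so there is nothing to check
    return True


# An untrimmed read with three C bases satisfies the tag...
print(mm_tag_is_consistent("ACGTCCGA", "C+m,0,1;"))
# ...but the same tag on a read trimmed to "ACGT" no longer fits.
print(mm_tag_is_consistent("ACGT", "C+m,0,1;"))
```

A read that was trimmed after modification calling, with its MM tag left untouched, would typically fail a check like this, which matches the situation the error message reports.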

yueqitaoo commented 1 year ago

Hi, thank you for the answer! Yes, I think that is my case. Since my data is multiplexed, I used lima to split it by barcode first. It looks like the latest version of lima fixes this. I will try it.

ctsa commented 1 year ago

Sounds good @yueqitaoo. It is correct that Lima had to be upgraded to address this issue.

While you're updating your analysis, please consider switching to the latest 2.1.1 release of pb-CpG-tools. The Python script has been replaced with a compiled Linux binary that is substantially faster. Please see the updated readme on our front page for details:

https://github.com/PacificBiosciences/pb-CpG-tools

yueqitaoo commented 1 year ago

Many thanks! @ctsa

ctsa commented 1 year ago

Closing as resolved.