PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License
72 stars 11 forks source link

Index error during align_reads in denovo mode #46

Open Surbhigrewal opened 11 months ago

Surbhigrewal commented 11 months ago

Hi there,

I'm getting the below error after the call_mods has generated the bam file and I use it to align to ref. It generated the pbmm2.bam file but could not create the BAI index.

2023-12-06 17:30:58 - INFO - stdout:

2023-12-06 17:30:58 - INFO - stderr:
>|> 20231206 16:33:04.637 -|- WARN -|- AlignSettings -|- 0x2b9fefede340|| -|- Requested more threads for alignment 
(160) than system-wide available (40)
>|> 20231206 16:33:04.689 -|- WARN -|- operator() -|- 0x2b9fefede340|| -|- Input is aligned reads. Only primary 
alignments will be respected to allow idempotence!
[E::hts_idx_check_range] Region 536851006..536880338 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'm64164_220611_190014/99878424/ccs' with ref_name='Chr1At', ref_length=614431332, 
flags=0, pos=536851007 cannot be indexed
>|> 20231206 17:30:58.187 -|- FATAL -|- Close -|- 0x2b9fefede340|| -|- pbmm2 align ERROR: BAI Index Generation: 
Failed to create index for output.hifi.call_mods.modbam.pbmm2.bam

I tried to run pbmm2 directly using this code which allows for a CSI index to be created by pbmm2 and samtools

pbmm2 align --preset CCS -j 40 --sort  --bam-index CSI ref.fasta output.hifi.call_mods.modbam.bam 
output.hifi.call_mods.modbam.pbmm2.bam && samtools index -@ 40 -c output.hifi.call_mods.modbam.pbmm2.bam

I proceeded to the call modification freq step using the aggregate mode with the above bam and index files and got the below error which means the CSI index file is not being recognised and samtools tries to create a BAI index again and fails.

2023-12-06 23:40:49 - INFO - [main]call_freq_bam starts
2023-12-06 23:40:49 - INFO - indexing bam file-output.call_mods.modbam.pbmm2.bam

[E::hts_idx_check_range] Region 536851006..536880338 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'm64164_220611_190014/99878424/ccs' with ref_name='Chr1At', ref_length=614431332, flags=0, pos=536851007 cannot be indexed
Traceback (most recent call last):
  File "//miniconda3/envs/ccsmeth/bin/ccsmeth", line 10, in <module>
    sys.exit(main())
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/ccsmeth.py", line 795, in main
    args.func(args)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/ccsmeth.py", line 44, in main_call_freqb
    call_mods_frequency_from_bamfile(args)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/call_mods_freq_bam.py", line 688, in call_mods_frequency_from_bamfile
    index_bam_if_needed2(inputpath, args.threads)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/ccsmeth/utils/process_utils.py", line 280, in index_bam_if_needed2
    pysam.index("-@", str(threads), inputfile)
  File "/miniconda3/envs/ccsmeth/lib/python3.8/site-packages/pysam/utils.py", line 83, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "output.hifi.call_mods.modbam.pbmm2.bam": Numerical result out of range\n'

What is the workaround for this please?

PengNi commented 11 months ago

Hi @Surbhigrewal , thank you very much for reporting this. I didn't consider this kind of case that the size of a chromosome is >512MB before. I have updated the code, please try the latest version of ccsmeth.

Best, Peng

Surbhigrewal commented 11 months ago

I got around the problem by renaming my .csi index as .bai. This started generating a bed file but the process is quite slow. I'm wondering if it is due to the incorrect type of index file. But thank you for sorting out the indexing and updating the software.