PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

IndexError while running ccsmeth call_mods #48

Open suvi93 opened 8 months ago

suvi93 commented 8 months ago

I'm getting the following error, after which the job seems to idle: it does not exit with an error but keeps running without producing any output. The error is below:

===============================================
2024-01-10 13:37:00 - INFO - [main]call_mods starts
2024-01-10 13:37:00 - INFO - cuda availability: True
2024-01-10 13:37:02 - INFO - format_features process-10612 starts
2024-01-10 13:37:02 - INFO - call_mods process-10615 starts
2024-01-10 13:37:02 - INFO - write_process-10617 starts
2024-01-10 13:37:02 - INFO - read_features process-10602 starts
2024-01-10 13:37:02 - INFO - format_features process-10608 starts
2024-01-10 13:37:02 - INFO - format_features process-10607 starts
2024-01-10 13:37:02 - INFO - call_mods process-10616 starts
Process Process-1:
Traceback (most recent call last):
  File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/site-packages/ccsmeth/_call_modifications_txt.py", line 74, in _read_features_file_to_str
    h_num_total = _count_holenum(features_file)
  File "/crex/proj/snic2021-6-151/nobackup/Suvi/miniconda3/envs/ccsmethenv/lib/python3.8/site-packages/ccsmeth/_call_modifications_txt.py", line 61, in _count_holenum
    holeid = words[3]
IndexError: list index out of range
2024-01-10 13:37:02 - INFO - format_features process-10614 starts
2024-01-10 13:37:02 - INFO - format_features process-10603 starts
2024-01-10 13:37:02 - INFO - format_features process-10609 starts
2024-01-10 13:37:02 - INFO - format_features process-10606 starts
2024-01-10 13:37:02 - INFO - format_features process-10613 starts
2024-01-10 13:37:02 - INFO - format_features process-10604 starts
2024-01-10 13:37:02 - INFO - format_features process-10605 starts
2024-01-10 13:37:02 - INFO - format_features process-10611 starts

How do I go about fixing this IndexError?

Thanks, Suvi

PengNi commented 8 months ago

Hi Suvi, could you tell me what your input file is and which command you used? It seems that the input file doesn't match the command.

Best, Peng

suvi93 commented 8 months ago

Hi Peng,

Here is the command I used:

# Vv19_2.fofn is a file containing hifi kinetics bam files
CUDA_VISIBLE_DEVICES=0,1 ccsmeth call_mods \
  --input hifi-kinetics_out/Vv19_2.fofn \
  --ref ref/Vv19/vv19.fa \
  --model_file ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt \
  --output Vv19.hifi.modbam.bam \
  --threads 15 --threads_call 2 --model_type attbigru2s \
  --mode align

PengNi commented 8 months ago

Hi @suvi93, the --input param should be a bam file, not the fofn file. Please try again; if there are any issues, please feel free to contact me.

Best, Peng
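
For anyone hitting the same IndexError: the traceback fails at `words[3]`, which is consistent with an input line that has fewer than four fields, as a one-path-per-line fofn would. A minimal sketch of a pre-flight check is below, assuming only that a BAM file is gzip/BGZF-compressed and begins with the `BAM\1` magic bytes once decompressed. The function name `looks_like_bam` is hypothetical and not part of ccsmeth.

```python
import gzip

BAM_MAGIC = b"BAM\x01"  # first four bytes of a decompressed BAM stream


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF-compressed and starts with the BAM magic.

    Hypothetical helper for illustration; ccsmeth does not provide it.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Missing file, not gzip-compressed (e.g. a plain-text fofn), or truncated.
        return False
```

Running `looks_like_bam("hifi-kinetics_out/Vv19_2.fofn")` before launching call_mods should return False for a plain-text file list, which would flag the mismatch before any worker processes start.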

suvi93 commented 8 months ago

Hi Peng,

I have 5 different kinetics bam files for the same sample. If I process them individually, can I concatenate all the pipeline results at the end?

Thanks, Suvi

PengNi commented 8 months ago

Yes Suvi, you can use samtools to merge all the output bam files.
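
A rough sketch of that merge step, driven from Python, might look like the following. It assumes samtools is on PATH and uses the positional form `samtools merge out.bam in1.bam in2.bam ...`. The function names and any file names passed to them are placeholders, not something ccsmeth provides.

```python
import shutil
import subprocess


def build_merge_command(inputs, output, threads=4):
    """Build a `samtools merge` command for per-file outputs (hypothetical helper)."""
    if len(inputs) < 2:
        raise ValueError("need at least two BAM files to merge")
    # -@ sets the number of additional threads used by samtools.
    return ["samtools", "merge", "-@", str(threads), output, *inputs]


def merge_bams(inputs, output, threads=4):
    """Run the merge; raises if samtools is missing or exits non-zero."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(build_merge_command(inputs, output, threads), check=True)
```

Keeping the command construction separate from the call makes it easy to print and inspect the exact command before running it on large files.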

suvi93 commented 8 months ago

Hi Peng,

Is there a way to change the indexing of the bam files from samtools index to bamtools index, or to split the bam file and then run the indexing? My bam file is too large to be indexed with samtools index, and this is the error I get:

2024-02-02 11:28:34 - INFO - [main]align_hifi_reads starts
2024-02-02 11:28:34 - INFO - cmds: pbmm2 align --preset CCS -j 15 --sort vv19.fa m64077_220928_112919.hifi.modbam.bam m64077_220928_112919.hifi.call_mods.modbam.pbmm2.bam && samtools index -@ 15 m64077_220928_112919.hifi.call_mods.modbam.pbmm2.bam
2024-02-02 14:32:58 - WARNING - failed..
2024-02-02 14:32:59 - INFO - stdout:

2024-02-02 14:32:59 - INFO - stderr:
|> 20240202 10:28:34.527 -|- WARN -|- operator() -|- 0x2b19e1f6c340|| -|- Input is aligned reads. Only primary alignments will be respected to allow idempotence!
[E::hts_idx_check_range] Region 536859611..536871025 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'm64077_220928_112919/142936459/ccs' with ref_name='scaffold_1', ref_length=591107699, flags=16, pos=536859612 cannot be indexed
|> 20240202 13:32:54.623 -|- FATAL -|- Close -|- 0x2b19e1f6c340|| -|- pbmm2 align ERROR: BAI Index Generation: Failed to create index for m64077_220928_112919.hifi.call_mods.modbam.pbmm2.bam

PengNi commented 8 months ago

Hi @suvi93, I am not sure, but maybe you can try samtools index with the --csi option.

Best, Peng
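
For context on why the index step fails: a BAI index can only address coordinates up to 2^29 (536,870,912) on a single reference sequence, and the failing region 536859611..536871025 on scaffold_1 (ref_length=591107699) crosses that limit. The sketch below picks CSI when any reference is too long, assuming `samtools index -c` is available. The helper names are hypothetical, and this only covers a standalone samtools call, not the indexing that ccsmeth or pbmm2 run internally.

```python
BAI_MAX_COORD = 2 ** 29  # 536,870,912: largest coordinate a BAI index can address


def needs_csi(ref_lengths):
    """Return True if any reference sequence is too long for a BAI index."""
    return any(length > BAI_MAX_COORD for length in ref_lengths)


def build_index_command(bam_path, ref_lengths, threads=1):
    """Build a `samtools index` command, adding -c (CSI) when BAI cannot be used."""
    cmd = ["samtools", "index", "-@", str(threads)]
    if needs_csi(ref_lengths):
        cmd.append("-c")
    cmd.append(bam_path)
    return cmd
```

With the scaffold_1 length from the log, `needs_csi([591107699])` is True, so the built command includes `-c` and samtools writes a `.csi` index next to the BAM.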

suvi93 commented 8 months ago

In which script do I add the --csi option?
Sorry, that part was a bit unclear, as I don't mention samtools index anywhere in my align_hifi command:

ccsmeth align_hifi --hifireads m64077_220928_112919.hifi.modbam.bam --ref vv19.fa --output m64077_220928_112919.hifi.call_mods.modbam.pbmm2.bam --threads 15

zihaozhu92 commented 6 months ago

Hi PengNi, I encountered a similar indexing issue. My bam file cannot be bai indexed, so I used samtools index -c to create a csi index. However, ccsmeth call_mods still starts with a bai indexing step despite the existing csi index. How can I skip this step? Thanks

PengNi commented 6 months ago

Hi @zihaozhu92, did you use the latest version of ccsmeth?