HKU-BAL / ClairS-TO

ClairS-TO - a deep-learning method for tumor-only somatic variant calling
BSD 3-Clause "New" or "Revised" License

IndexError during STEP1 #5

Closed ChangqingW closed 5 months ago

ChangqingW commented 5 months ago

I got heaps of IndexErrors while running clairs_to with the latest Docker image under Singularity:

Traceback (most recent call last):
  File "/opt/bin/clairs_to.py", line 107, in <module>
    main()
  File "/opt/bin/clairs_to.py", line 101, in main
    submodule.main()
  File "/opt/bin/src/extract_candidates_calling.py", line 597, in main
    extract_pair_candidates(args)
  File "/opt/bin/src/extract_candidates_calling.py", line 341, in extract_pair_candidates
    select_indel_candidates=select_indel_candidates
  File "/opt/bin/src/extract_candidates_calling.py", line 91, in decode_pileup_bases
    base_list[-1][1] = base + pileup_bases[base_idx: base_idx + advance]  # add indel seq
IndexError: list index out of range

I wonder whether this is expected, or whether I am actually losing these candidate indels because of the error.
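For reference, the failing line in the traceback indexes `base_list[-1]`, and that expression raises exactly this error when `base_list` is empty, for instance if an indel token is reached before any base has been decoded. The snippet below is a toy sketch of that mechanism only; `attach_indel` is a hypothetical helper, not ClairS-TO code, and the actual cause in `decode_pileup_bases` may differ.

```python
# Reproduce the error message from the traceback with an empty list.
base_list = []  # assumption: no base has been decoded yet
try:
    base_list[-1][1] = "+AG"  # same indexing pattern as the failing line
except IndexError as err:
    message = str(err)


def attach_indel(base_list, indel_seq):
    """Hypothetical guarded variant: attach an indel to the last decoded base.

    Returns False instead of raising when there is no previous base.
    """
    if not base_list:
        return False
    base_list[-1][1] += indel_seq
    return True
```

With a guard like this, an orphaned indel token would be skipped rather than aborting the whole candidate extraction, at the cost of silently dropping that indel.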

JasonCLEI commented 5 months ago

Hi, @ChangqingW,

Thanks a lot for your interest. The error appears to be caused by input data in a format that ClairS-TO does not expect. Could you share some sample data (if possible, by email to lchen@cs.hku.hk) and tell us which platform option you used? Note that ClairS-TO currently accepts only DNA data as input; other data, such as RNA, cannot be processed. We will also add input format checks in our next release.

Lei

ChangqingW commented 5 months ago

The inputs were BAM files produced by aligning RNA-seq reads to the reference genome using minimap2 with splice options, so I guess the splicing must be the cause. (I wonder if I can get around it by aligning the reads to the transcriptome instead of the genome.) Do you have a recommended tool for calling somatic indels and SNVs in Nanopore RNA-seq, by any chance?
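One way to confirm that a BAM contains spliced alignments is to look for `N` (reference skip) operations in the CIGAR column. The sketch below works on SAM-formatted text lines, such as the output of `samtools view`; the function names `has_splice_gap` and `count_spliced` are made up for this example.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_splice_gap(cigar):
    """Return True if the CIGAR string contains an N (reference skip) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_spliced(sam_lines):
    """Count records whose CIGAR (SAM column 6) contains N.

    Returns (spliced, total), ignoring header lines and records without a CIGAR.
    """
    spliced = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # malformed record or no alignment
        total += 1
        spliced += has_splice_gap(fields[5])
    return spliced, total
```

A large share of spliced records would support the guess that intron-spanning reads are what the DNA-oriented pileup decoding does not expect.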

Thanks for the swift reply.

JasonCLEI commented 5 months ago

Hi, @ChangqingW,

ClairS-TO was trained on DNA data and can process DNA data from different platforms. Even if we adapted ClairS-TO to accept RNA input, we could not guarantee its effectiveness. As far as I know, there are currently no tools for calling somatic INDELs and SNVs in Nanopore RNA-seq, mainly because of the lack of high-quality training data. That said, our lab has been working on a Nanopore RNA-seq variant caller, and you are welcome to follow our upcoming research.

Lei

ChangqingW commented 5 months ago

Thanks for the explanation. I guess I will have to stick with simply counting mutations in pileups. Looking forward to the exciting new work you mentioned (hopefully it will include somatic mutations?).
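As a rough illustration of what counting mutations in pileups can look like, here is a minimal sketch that tallies alleles from the read-bases column of one `samtools mpileup` line. `count_pileup` is a hypothetical helper, it ignores base qualities and strand, and it counts reference skips (`<` and `>`, which appear for spliced reads) separately so they are not mistaken for bases.

```python
from collections import Counter


def count_pileup(ref_base, pileup_bases):
    """Tally alleles in an mpileup read-bases string for one position."""
    counts = Counter()
    i, n = 0, len(pileup_bases)
    while i < n:
        c = pileup_bases[i]
        if c == "^":
            i += 2  # read start marker plus its mapping-quality character
        elif c == "$":
            i += 1  # read end marker
        elif c in "+-":
            # Indel: sign, decimal length, then that many sequence characters.
            j = i + 1
            while j < n and pileup_bases[j].isdigit():
                j += 1
            length = int(pileup_bases[i + 1:j])
            counts[c + pileup_bases[j:j + length].upper()] += 1
            i = j + length
        elif c in "<>":
            counts["skip"] += 1  # reference skip, e.g. an intron
            i += 1
        elif c in "*#":
            counts["*"] += 1  # placeholder for a base deleted in this read
            i += 1
        elif c in ".,":
            counts[ref_base.upper()] += 1  # match to the reference base
            i += 1
        else:
            counts[c.upper()] += 1  # mismatch on either strand
            i += 1
    return counts
```

For example, `count_pileup("A", "..,T")` yields three counts for `A` and one for `T`. Counts like these say nothing about whether a variant is somatic, so any threshold applied to them is a separate judgment call.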