Only 1 somatic variant returned

Jiayi-Wang-Joey commented 1 month ago

Hi, thanks for the great work! I tried to call somatic variants using ClairS, but the output VCF contained only 1 variant, on chromosome Y. My data are PacBio long-read scRNA-seq with fairly low coverage (~10x). Even so, I expect many more than 1 somatic variant, because I have already observed some in IGV, and clair3_output contains more variants in the tumor than in the normal sample.

This is the command I used:

singularity exec -B ${INPUT_DIR},${OUTPUT_DIR} clairs-latest.simg \
  /opt/bin/run_clairs \
  --tumor_bam_fn ${INPUT_DIR}/tumor1.aligned.sorted.bam \
  --normal_bam_fn ${INPUT_DIR}/normal1.aligned.sorted.bam \
  --ref_fn ~/pacbio/data/genome/GRCh38.primary_assembly.genome.fa \
  --threads 10 \
  --platform hifi_revio \
  --output_dir ${OUTPUT_DIR} \
  --conda_prefix /opt/conda/envs/clairs \
  --use_heterozygous_snp_in_tumor_sample_for_intermediate_phasing TRUE \
  -q 8 \
  --snv_min_af 0.01

In the log file, I found several errors like:

[INFO] chr1 chunk 26/50: Total candidates found: 0
Traceback (most recent call last):
  File "/opt/bin/clairs.py", line 123, in <module>
    main()
  File "/opt/bin/clairs.py", line 117, in main
    submodule.main()
  File "/opt/bin/src/extract_pair_candidates.py", line 705, in main
    extract_pair_candidates(args)
  File "/opt/bin/src/extract_pair_candidates.py", line 346, in extract_pair_candidates
    select_indel_candidates=select_indel_candidates
  File "/opt/bin/src/extract_pair_candidates.py", line 91, in decode_pileup_bases
    base_list[-1][1] = base + pileup_bases[base_idx: base_idx + advance]  # add indel seq
IndexError: list index out of range
Traceback (most recent call last):
  File "/opt/bin/clairs.py", line 123, in <module>
    main()
  File "/opt/bin/clairs.py", line 117, in main
    submodule.main()
  File "/opt/bin/src/extract_pair_candidates.py", line 705, in main
    extract_pair_candidates(args)
  File "/opt/bin/src/extract_pair_candidates.py", line 346, in extract_pair_candidates
    select_indel_candidates=select_indel_candidates
  File "/opt/bin/src/extract_pair_candidates.py", line 91, in decode_pileup_bases
    base_list[-1][1] = base + pileup_bases[base_idx: base_idx + advance]  # add indel seq
IndexError: list index out of range

Thanks in advance for any help!

zhengzhenxian commented 1 month ago

@Jiayi-Wang-Joey

ClairS is designed for DNA-seq and is not optimized for scRNA-seq data. The pipeline breaks on your data because handling the spliced CIGAR strings in RNA-seq alignments requires extra steps.
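For readers unfamiliar with the term: spliced RNA-seq alignments carry `N` (reference skip) operations in their CIGAR strings, which DNA-seq alignments normally do not. Below is a minimal sketch in plain Python for spotting them, for example in column 6 of `samtools view` output. The function names are hypothetical and are not part of ClairS.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples; '*' means unavailable."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def is_spliced(cigar):
    """True if the alignment contains at least one reference skip (N)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def skipped_bases(cigar):
    """Total number of reference bases skipped by N operations."""
    return sum(length for length, op in parse_cigar(cigar) if op == "N")


if __name__ == "__main__":
    for cigar in ("100M", "50M1000N50M", "10M200N5M300N5M"):
        print(cigar, is_spliced(cigar), skipped_bases(cigar))
```

If a large share of the reads in a BAM report `is_spliced(...)` as true, the data are spliced alignments and fall outside what a DNA-seq caller expects.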

We will add some boundary checks for RNA data in the next release.
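To illustrate what such a boundary check could look like: the traceback above shows `base_list[-1]` being indexed when `base_list` is empty. The sketch below is a simplified decoder for the bases column of `samtools mpileup` output, written for illustration only. It is not the ClairS implementation, and which inputs actually trigger the error in ClairS is not established here; the test string is synthetic.

```python
def decode_pileup_bases(pileup_bases):
    """Decode an mpileup bases string into [base, indel] pairs, one per read.

    Simplified sketch: read-start markers (^ plus a mapping quality character)
    and read-end markers ($) are dropped, and an indel is attached to the
    preceding base only if such a base exists.
    """
    base_list = []
    i = 0
    n = len(pileup_bases)
    while i < n:
        c = pileup_bases[i]
        if c == "^":
            i += 2  # skip the marker and the mapping quality character
        elif c == "$":
            i += 1
        elif c in "+-":
            j = i + 1
            while j < n and pileup_bases[j].isdigit():
                j += 1
            if j == i + 1:
                i += 1  # no length follows the sign; ignore the stray character
                continue
            length = int(pileup_bases[i + 1:j])
            indel_seq = pileup_bases[j:j + length]
            # Boundary check: without it, an indel that is not preceded by
            # any recorded base raises IndexError on base_list[-1].
            if base_list:
                base_list[-1][1] = c + indel_seq
            i = j + length
        else:
            base_list.append([c, ""])
            i += 1
    return base_list


if __name__ == "__main__":
    print(decode_pileup_bases(".+2AG,"))  # indel attached to the first read
    print(decode_pileup_bases("+2AG."))   # synthetic: indel with no preceding base
```

Silently dropping the orphan indel is only one option; logging the position or skipping the whole pileup column would be equally reasonable choices.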

Jiayi-Wang-Joey commented 1 month ago

Thanks for the reply. Sorry for not noticing that it's designed for DNA-seq. Is the Clair3 germline output still reliable in this case? I also tried Clair3-RNA, but Clair3 seems to identify more variants than Clair3-RNA.

HKU-BAL / ClairS

Only 1 somatic variant returned #34