eldariont / svim-asm

Structural Variant Identification Method using Genome Assemblies
GNU General Public License v3.0

quality and sequence mismatch #10

Closed masen1991 closed 3 years ago

masen1991 commented 3 years ago

Hello @eldariont, do you have any opinion on this bug? SVIM_210622_110108.log

Thanks

jakob-he commented 3 years ago

Hi @masen0407,

Thanks a lot for approaching us about this! The error occurred due to an incorrect setting of quality scores but should be fixed with the latest commit.

Best, Jakob

eldariont commented 3 years ago

Hi @masen0407,

one other thing I noticed in your log file is the name of your input BAM file: PacBio_CCS_15kb/HG002.pb.15kb.minimap2.hs37d5.sorted.bam

Just to make sure: Is this a genome-genome alignment or an alignment of reads? In the second case, you should use SVIM instead of svim-asm.

Cheers, David

yekaizhou commented 3 years ago

Hi, is this bug fixed? I encountered the same error.

jakob-he commented 3 years ago

Hi @yekaizhou,

the bug should be fixed with the latest commit. However, it is not part of the current bioconda release. So for the fix to take effect, you have to update SVIM-asm via git clone and pip (second install option in the Readme).

Best, Jakob

yekaizhou commented 3 years ago

Hi Jakob,

Thanks for your help.

However, it seems bioconda has the latest version (1.0.2), the same as GitHub here, and the GitHub version that I also tried still produces the same error.

I want to get a phased SV call set from read alignments. It seems that SVIM-asm offers this functionality while SVIM does not. I am wondering whether the asm version can process read alignments the same way SVIM does while also phasing the SVs. Is it unable to do this (as the error suggests), or can it generate a phased SV call set that may just not be very accurate?

Best, Yekai

jakob-he commented 3 years ago

Hi Yekai,

the release version is the same on bioconda and GitHub, but the release doesn't include the latest commit. It sounds like you have tried it with the current GitHub repository, so the error is likely caused by a different issue. Would you mind uploading the full error message, or is it exactly the same as in the original issue?

Generally, calling phased SVs from read alignments using SVIM-asm is unfortunately not possible. SVIM-asm is only able to call phased SVs if you provide haplotype assemblies. This is due to the identification of SV candidates which assumes the input is an assembly represented by a "single read". Unfortunately, I am also not aware of any SV-caller that has this functionality for an unphased set of reads. In most cases, the reads are first phased using SNV calls and then each haplotype is assembled separately and compared to the reference genome.

Best, Jakob

yekaizhou commented 3 years ago

Hi Jakob,

Sorry for my unclear description. I actually fed SVIM-asm phased reads generated by SNV calling and WhatsHap read haplotagging. The read depth of my data is low, so the haplotype assemblies are not very satisfactory. Therefore, I am trying to find out whether SVs can be called and phased directly from the phased reads.

Thanks a lot for your help! Yekai
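
The first step of the workflow discussed above, separating haplotagged reads by haplotype before each set is assembled, can be sketched in plain Python. This is only an illustration under stated assumptions: it expects SAM-formatted text records that carry the `HP:i:` tag written by WhatsHap's haplotagging, and the function name `split_by_haplotype` and the sample records are made up for the example. It does not make SVIM-asm accept read alignments.

```python
from collections import defaultdict


def split_by_haplotype(sam_lines):
    """Group read names by the value of their HP:i: tag.

    Header lines (starting with '@') are skipped. Records without
    an HP tag are collected under the key None.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        haplotype = None
        # Optional tags start at the 12th SAM column.
        for tag in fields[11:]:
            if tag.startswith("HP:i:"):
                haplotype = int(tag[len("HP:i:"):])
                break
        groups[haplotype].append(fields[0])
    return dict(groups)


# Hypothetical records, invented for this example.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tHP:i:1\tPS:i:100",
    "read2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGT\tIIII\tHP:i:2\tPS:i:100",
    "read3\t0\tchr1\t140\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

by_haplotype = split_by_haplotype(example)
```

Each resulting group could then be handed to an assembler separately, which is the part that becomes difficult at low read depth.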

mtva0001 commented 2 years ago

Hi!

We have the same issue using the latest version (1.0.2):

(svim-asm) b-an01 [/proj/nobackup/snic2022-6-27/Kesava/Mutation]$ svim-asm haploid . Week29PCG1_sorted.bam W0barcode5consensus.FASTA.fasta
2022-09-16 15:35:18,765 [INFO ] ** Start SVIM-asm, version 1.0.2 **
2022-09-16 15:35:18,768 [INFO ] CMD: python3 /pfs/stor10/users/home/k/kesava03/Public/svim-asm/bin/svim-asm haploid . Week29PCG1_sorted.bam W0barcode5consensus.FASTA.fasta
2022-09-16 15:35:18,768 [INFO ] WORKING DIR: /pfs/proj/nobackup/fs/projnb10/snic2022-6-27/Kesava/Mutation
2022-09-16 15:35:18,768 [INFO ] PARAMETER: sub, VALUE: haploid
2022-09-16 15:35:18,768 [INFO ] PARAMETER: working_dir, VALUE: /pfs/proj/nobackup/fs/projnb10/snic2022-6-27/Kesava/Mutation
2022-09-16 15:35:18,768 [INFO ] PARAMETER: bam_file, VALUE: Week29PCG1_sorted.bam
2022-09-16 15:35:18,768 [INFO ] PARAMETER: genome, VALUE: W0barcode5consensus.FASTA.fasta
2022-09-16 15:35:18,768 [INFO ] PARAMETER: verbose, VALUE: False
2022-09-16 15:35:18,768 [INFO ] PARAMETER: min_mapq, VALUE: 20
2022-09-16 15:35:18,768 [INFO ] PARAMETER: min_sv_size, VALUE: 40
2022-09-16 15:35:18,768 [INFO ] PARAMETER: max_sv_size, VALUE: 100000
2022-09-16 15:35:18,768 [INFO ] PARAMETER: query_gap_tolerance, VALUE: 50
2022-09-16 15:35:18,768 [INFO ] PARAMETER: query_overlap_tolerance, VALUE: 50
2022-09-16 15:35:18,769 [INFO ] PARAMETER: reference_gap_tolerance, VALUE: 50
2022-09-16 15:35:18,769 [INFO ] PARAMETER: reference_overlap_tolerance, VALUE: 50
2022-09-16 15:35:18,769 [INFO ] PARAMETER: sample, VALUE: Sample
2022-09-16 15:35:18,769 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP:TANDEM,DUP:INT,BND
2022-09-16 15:35:18,769 [INFO ] PARAMETER: symbolic_alleles, VALUE: False
2022-09-16 15:35:18,769 [INFO ] PARAMETER: tandem_duplications_as_insertions, VALUE: False
2022-09-16 15:35:18,769 [INFO ] PARAMETER: interspersed_duplications_as_insertions, VALUE: False
2022-09-16 15:35:18,769 [INFO ] PARAMETER: query_names, VALUE: False
2022-09-16 15:35:18,769 [INFO ] ** STEP 1: COLLECT **
2022-09-16 15:35:18,769 [INFO ] MODE: haploid
2022-09-16 15:35:18,769 [INFO ] INPUT: /pfs/proj/nobackup/fs/projnb10/snic2022-6-27/Kesava/Mutation/Week29PCG1_sorted.bam
2022-09-16 15:35:18,838 [INFO ] Processing chromosome utg000001l...
2022-09-16 15:35:18,866 [ERROR ] quality and sequence mismatch: 16427 != 0
Traceback (most recent call last):
  File "/pfs/stor10/users/home/k/kesava03/Public/svim-asm/bin/svim-asm", line 183, in <module>
    sys.exit(main())
  File "/pfs/stor10/users/home/k/kesava03/Public/svim-asm/bin/svim-asm", line 74, in main
    sv_candidates = analyze_alignment_file_coordsorted(aln_file1, options)
  File "/pfs/stor10/users/home/k/kesava03/Public/svim-asm/lib/python3.8/site-packages/svim_asm/SVIM_COLLECT.py", line 72, in analyze_alignment_file_coordsorted
    supplementary_alignments = retrieve_other_alignments(current_alignment, bam)
  File "/pfs/stor10/users/home/k/kesava03/Public/svim-asm/lib/python3.8/site-packages/svim_asm/SVIM_COLLECT.py", line 50, in retrieve_other_alignments
    a.query_qualities = main_alignment.query_qualities
  File "pysam/libcalignedsegment.pyx", line 1514, in pysam.libcalignedsegment.AlignedSegment.query_qualities.set
ValueError: quality and sequence mismatch: 16427 != 0

The fasta file is a genome assembly.
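
The traceback shows the exception being raised when quality values from one alignment are assigned to a record whose sequence length is 0 (`16427 != 0`). The mechanism can be illustrated without pysam by a toy stand-in. The `Record` class and the `copy_qualities` helper below are hypothetical and only mirror the length check reported in the message. They are not pysam's implementation, and the guard is not necessarily how the fix in SVIM-asm works.

```python
class Record:
    """Toy stand-in for an alignment record (hypothetical, not pysam)."""

    def __init__(self, sequence):
        self.sequence = sequence
        self._qualities = None

    @property
    def qualities(self):
        return self._qualities

    @qualities.setter
    def qualities(self, values):
        # Mirror the check from the traceback: the number of quality
        # values must equal the sequence length.
        if values is not None and len(values) != len(self.sequence):
            raise ValueError(
                "quality and sequence mismatch: %d != %d"
                % (len(values), len(self.sequence))
            )
        self._qualities = values


def copy_qualities(source, target):
    """Copy qualities only if the lengths agree; return True on success."""
    quals = source.qualities
    if quals is None or len(quals) != len(target.sequence):
        return False
    target.qualities = quals
    return True


primary = Record("ACGT")
primary.qualities = [30, 30, 30, 30]
other = Record("")  # sequence length 0, as in the reported message

try:
    other.qualities = primary.qualities  # unguarded assignment fails
    message = None
except ValueError as err:
    message = str(err)

copied = copy_qualities(primary, other)  # guarded copy is skipped
```

The unguarded assignment reproduces the shape of the reported message, while the guarded helper simply declines to copy when the lengths differ.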

eldariont commented 2 years ago

Hi mtva0001,

as Jakob wrote above, the latest release (v1.0.2) did not include the bug fix until today. Just now, I created a new release (v1.0.3) and uploaded it to pypi (bioconda following soon). Could you please use this version and report back whether it fixes your problem?

Best, David
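
To check whether an environment already has a release that contains the fix, a small sketch like the following may help. It assumes the distribution is registered under the name `svim-asm` and that version strings are plain dotted numbers such as `1.0.3`. The helper names are made up for this example.

```python
from importlib import metadata

# First release reported in this thread to include the fix.
FIXED_VERSION = (1, 0, 3)


def parse_version(text):
    """Turn a plain dotted version such as '1.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def includes_fix(version_text):
    """Return True if the version is at least FIXED_VERSION."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_version(dist_name="svim-asm"):
    """Return the installed version string, or None if not installed.

    The distribution name is an assumption.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("svim-asm does not appear to be installed in this environment")
else:
    print(version, "includes fix:", includes_fix(version))
```

Pre-release or otherwise non-numeric version strings are not handled by this simple parser.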