Parsoa / SVDSS

Improved structural variant discovery in accurate long reads using sample-specific strings (SFS)
MIT License

CIGAR and query sequence lengths differ for reads in smoothed.selective.bam #16

Open kfletcher88 opened 1 year ago

kfletcher88 commented 1 year ago

Hi, I was trying to use this exciting tool to call SVs using HiFi reads on a genome assembled from those same reads. I can run SVDSS index and SVDSS smooth, and the file smoothed.selective.bam is generated. When I run SVDSS search --assemble, the process completes and generates files called solution_batch_*.assembled.sfs, but the log contains the samtools error [E::bam_read1] CIGAR and query sequence lengths differ for m64069_220917_213940/14419164/ccs multiple times, each time for a different read.

Because of this, I am unable to generate an index for the BAM:

$ samtools index smoothed.selective.bam
[E::bam_read1] CIGAR and query sequence lengths differ for m64069_220917_213940/163645521/ccs
samtools index: failed to create index for "smoothed.selective.bam"

This means I cannot complete the pipeline. I am able to index the BAM input to SVDSS smooth without issue.

I have tried to filter the BAM for primary reads, but I run into the same problem. Please let me know if you have any advice to overcome this issue. Thanks, Kyle
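The samtools message above means that, for a record, the number of read bases implied by its CIGAR string does not match the length of its SEQ field. As a minimal sketch of that check (not part of SVDSS or samtools; the function names are hypothetical, and it assumes tab-separated SAM text as input), the following Python counts the query-consuming CIGAR operations (M, I, S, =, X, as defined in the SAM specification) and reports records whose lengths disagree:

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of read bases implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def is_consistent(cigar, seq):
    """True when CIGAR and SEQ lengths agree, or when either is unavailable ('*')."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)


def inconsistent_reads(sam_lines):
    """Yield (read name, CIGAR-implied length, SEQ length) for records that disagree."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        name, cigar, seq = fields[0], fields[5], fields[9]
        if not is_consistent(cigar, seq):
            yield name, cigar_query_length(cigar), len(seq)
```

The generator can be fed any iterable of SAM lines, for example an open file or sys.stdin. Whether samtools view will write out a broken record before stopping with the error is not something this thread confirms, so the sketch is mainly useful for SAM text that can be obtained another way.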

ldenti commented 1 year ago

Hi Kyle, thanks for trying SVDSS out.

I encountered a similar issue some time ago. I tried to solve it without success and then forgot to go back to it.

Just to be sure that the issue is the same, can you please try the SVDSS version in the lowcov branch? Unfortunately, you'll have to compile it from scratch, since it's a development branch and I haven't produced a binary yet.

I suspect that for those alignments, in the new version you should see a line starting with |. Something like:

| m64069_220917_213940/163645521/ccs xxxxx xxxxxxx

(where xxxxx are numbers).

If this is the case, I'll try to fix it.

If you have any issue, please let me know. Another way to check this (in case you encounter any issue while compiling):

Let me know what you prefer.

Best,

YahGao commented 1 year ago

Hi,

I got the same error. I used pbmm2 to map raw fastq files and used the mapped bam files as input to SVDSS. After using smooth to obtain the smoothed.selective.bam file, I have the same problem. Below are the first 10 lines of smoothed.selective.bam. Do you have any thoughts on this issue?

Thanks, Yh

[2-svdss]$ samtools view smoothed.selective.bam | head | cut -c 1-80
m54337U_221231_061623/144640562/ccs 0 1 32442 46 1685S3=1X6=1X5=2D14=1X2=1X2=1I2
m54337U_221231_061623/40501721/ccs 16 1 34529 47 13S3=1X6=1X1=1X4=1X7=1X1=2X10=2
m54337U_221229_210659/167772847/ccs 0 1 39016 60 16S129=1I224=1X223=1X2=1X401=1X
m54337U_221231_061623/48891338/ccs 16 1 41572 37 15S142=1X102=22I76=1X97=1X119=1
m54337U_221231_061623/55642959/ccs 16 1 43466 41 123S8=1X5=1X4=1X6=1X3=1X6=1X3=1
m54337U_221229_210659/139004586/ccs 16 1 45080 30 62S136=1X373=1X109=1X82=1X59=1
m54337U_221229_210659/15860237/ccs 0 1 49121 60 18S10=1X10=1X2=1X4=1X6=2X1=1X1=1
m54337U_221231_061623/135398753/ccs 2064 1 52978 30 16S5=1X2=1X6=1D1X2=1X19=1X3=
m54337U_221231_061623/54004238/ccs 0 1 54175 38 8058S5=1X4=1I5=2X8=1X9=1I10=1X2=
m54337U_221231_061623/132055825/ccs 16 1 54364 41 16S91=1X64=1X49=1X7=1X213=1X14

kfletcher88 commented 1 year ago

Thanks for the response. I cloned and compiled from the lowcov branch (git clone -b lowcov https://github.com/Parsoa/SVDSS.git) and can confirm the expected output from SVDSS smooth:

[I] Loading first batch..
|m64069_220917_213940/119343055/ccs 6 34509
|m64069_220917_213940/30672650/ccs 4 32868
|m64069_220917_213940/2884910/ccs 36358 36360
|m64069_220917_213940/92735840/ccs 32692 32697
|m64069_220917_213940/137824722/ccs 4 34365
|m64069_220917_213940/75367744/ccs 4 31994
|m64069_220917_213940/67633453/ccs 4 29212
|m64069_220917_213940/130025065/ccs 4 27036
|m64069_220917_213940/153945456/ccs 4 19448
|m64069_220917_213940/172427254/ccs 4 21069
|m64069_220917_213940/19465819/ccs 4 20898
|m64069_220917_213940/3998562/ccs 4|m64069_220917_213940/142673267/ccs 6 33368
 41157
|m64069_220917_213940/105841111/ccs 4 31923
|m64069_220917_213940/61081766/ccs 4 19399
||m64069_220917_213940/139266167/ccs 6 32847
|m64069_220917_213940/93456725/ccs 4 31757
m64069_220917_213940/85919307/ccs 6 34644

samtools index now runs OK on smoothed.selective.bam. SVDSS search --assemble appears to run without issue. SVDSS call fails with an Illegal instruction (core dumped) error.

Thanks
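Because the log lines above are interleaved (several records share a line, and some lose their leading |), collecting the flagged reads by splitting on line breaks is unreliable. A minimal sketch, assuming the read names follow the PacBio movie/ZMW/ccs pattern seen in this thread and that every read name appearing in the log belongs to a flagged read, is to match the names directly. The function name is hypothetical, and the two numbers after each name are ignored because the thread does not say what they mean:

```python
import re

# Read names as they appear in this thread, e.g. m64069_220917_213940/119343055/ccs
READ_NAME = re.compile(r"m\d+U?_\d+_\d+/\d+/ccs")


def flagged_reads(log_text):
    """Return the distinct read names found in the log, in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(READ_NAME.findall(log_text)))
```

Matching on the name pattern rather than on the | prefix keeps the result stable even when output from several threads is mixed on one line.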

ldenti commented 1 year ago

I see. Well, at least we found the problem. I'll fix the smooth and let you know.

In the meantime, can you please try to run the SVDSS call step using the v1.0.4 release? If it crashes, please open a new issue - otherwise it's a bug in the dev branch (for sure I messed something up).

@YahGao I suspect your error is the same, so I'll let you know when it's solved.

Best,

ldenti commented 1 year ago

I think I've fixed this. Please check the new commit on the lowcov branch.

So pull, recompile, and rerun the smoothing. You shouldn't see any warnings anymore (or lines starting with |), and the produced .bam should be correct. Please let me know if this is not the case.

Regarding the SVDSS call issue, let me know how it's going and/or open a new issue.

Best,

ldenti commented 1 year ago

Please check the latest release v1.0.5. This bug should be fixed now.