brentp / smoove

structural variant calling and genotyping with existing tools, but, smoothly.
Apache License 2.0

Smoove population SV calling support for Illumina Dragen SV VCF files #165

Open WimSpee opened 2 years ago

WimSpee commented 2 years ago

Hi Brent,

We have a collection of Structural Variant VCF files produced by Illumina Dragen. These are per sample, not yet in a multi-sample "squared off" format.

I tried to run Smoove population SV calling to create a multi-sample "squared off" SV VCF file.

However, this fails with an error about CIEND being missing for some input SV variants. CIEND and CIPOS do indeed seem to be missing for some input variants. I am not sure how important these INFO fields are to Smoove population SV calling.

The missing CIEND breaks the following VCF parsing line in svtools: https://github.com/hall-lab/svtools/blob/master/svtools/l_bp.py#L136
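To see how many records are affected before merging, a small check can help. The sketch below is not svtools code; it assumes only that the INFO column is turned into a key/value mapping (as the `m['CIEND']` lookup in the traceback suggests). The helper names `parse_info` and `missing_ci_tags` and the example record are hypothetical.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only fields map to True."""
    parsed = {}
    for field in info.split(';'):
        if not field or field == '.':
            continue
        key, sep, value = field.partition('=')
        parsed[key] = value if sep else True
    return parsed


def missing_ci_tags(info, required=('CIPOS', 'CIEND')):
    """Return the required tags that are absent from an INFO string."""
    parsed = parse_info(info)
    return [tag for tag in required if tag not in parsed]


# Made-up INFO string without confidence intervals (illustration only).
example_info = "SVTYPE=DEL;END=2000;SVLEN=-1000"
print(missing_ci_tags(example_info))
```

Any record for which `missing_ci_tags` returns a non-empty list would hit the same `KeyError` in a parser that indexes those tags directly.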

Do you think it would make sense for Smoove to also work with single-sample SV VCF files created by other SV callers (e.g. Dragen), provided the SV VCF files contain a certain set of VCF attributes?

I am not sure whether the missing CIEND or CIPOS attributes/values would be the only issue, or whether Smoove population SV calling is really tied to Lumpy for creating the per-sample SV VCF files.

Thank you for your thoughts on this.

Wim

smoove merge -name merged -f ../../ref/my_genome.fa --outdir ./ *dragen*sv.vcf.gz
[smoove] 2021/07/20 15:30:36 starting with version 0.2.3
[smoove] 2021/07/20 15:30:36 merging 98 files
[smoove] 2021/07/20 15:30:36 finished sorting 98 files; merge starting.
[smoove] 2021/07/20 15:31:06 Traceback (most recent call last):
  File "/Tools/bcbio/1.1.5/anaconda/envs/python2/bin/svtools", line 11, in <module>
    sys.exit(main())
  File "/Tools/bcbio/1.1.5/anaconda/envs/python2/lib/python2.7/site-packages/svtools/cli.py", line 79, in main
[smoove] 2021/07/20 15:31:06     sys.exit(args.entry_point(args))
[smoove] 2021/07/20 15:31:06   File "/Tools/bcbio/1.1.5/anaconda/envs/python2/lib/python2.7/site-packages/svtools/lsort.py", line 123, in run_from_args
[smoove] 2021/07/20 15:31:06     sorter.execute()
[smoove] 2021/07/20 15:31:06   File "/Tools/bcbio/1.1.5/anaconda/envs/python2/lib/python2.7/site-packages/svtools/lsort.py", line 58, in execute
[smoove] 2021/07/20 15:31:06     self.vcf_lines.sort(key=l_bp.vcf_line_key)
  File "/Tools/bcbio/1.1.5/anaconda/envs/python2/lib/python2.7/site-packages/svtools/l_bp.py", line 150, in vcf_line_key
[smoove] 2021/07/20 15:31:06     v1 = split_v(l1)[:8]
  File "/Tools/bcbio/1.1.5/anaconda/envs/python2/lib/python2.7/site-packages/svtools/l_bp.py", line 128, in split_v
[smoove] 2021/07/20 15:31:06     start_r = pos_r + int(m['CIEND'].split(',')[0])
[smoove] 2021/07/20 15:31:06 KeyError: 'CIEND'
2021/07/20 15:31:06 exit status 1

brentp commented 2 years ago

Hi Wim, I wouldn't use it on anything but lumpy/smoove calls. If you want to try it on dragen calls, you could write a quick script to set CIPOS and CIEND to 0,0 for all variants that don't already have them.
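A quick script of that kind might look like the sketch below. It uses only the Python 3 standard library and makes several assumptions: the input is a standard tab-delimited VCF (plain or gzip/bgzip compressed), the header already declares CIPOS and CIEND as INFO fields, and `0,0` is an acceptable placeholder for every record type. The function names `add_default_ci`, `open_text` and `fix_file` are hypothetical, and whether `smoove merge` then gives sensible results on Dragen calls is untested.

```python
import gzip
import sys


def add_default_ci(line, default="0,0"):
    """Return a VCF line with CIPOS/CIEND appended to INFO when absent.

    Header lines, blank lines and malformed records are returned unchanged.
    """
    if line.startswith('#') or not line.strip():
        return line
    newline = '\n' if line.endswith('\n') else ''
    cols = line.rstrip('\n').split('\t')
    if len(cols) < 8:
        return line
    info = cols[7]
    # Collect the INFO keys that are already present.
    keys = {field.split('=', 1)[0] for field in info.split(';')}
    extra = ["%s=%s" % (tag, default)
             for tag in ('CIPOS', 'CIEND') if tag not in keys]
    if extra:
        existing = [info] if info not in ('', '.') else []
        cols[7] = ';'.join(existing + extra)
    return '\t'.join(cols) + newline


def open_text(path):
    """Open a plain or gzip-compressed VCF in text mode."""
    return gzip.open(path, 'rt') if path.endswith('.gz') else open(path)


def fix_file(path, out=sys.stdout):
    """Write a copy of the VCF at `path` to `out` with default CI tags."""
    with open_text(path) as handle:
        for line in handle:
            out.write(add_default_ci(line))
```

Calling `fix_file("sample.dragen.sv.vcf.gz")` (a placeholder file name) writes uncompressed VCF text to standard output, so the result would still need to be compressed again before being passed to `smoove merge` with the `*.vcf.gz` pattern used above.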