fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

Merging UKB SV Files #204

Open GHawkes93 opened 9 months ago

GHawkes93 commented 9 months ago

Hi,

In the recent release of 500,000 genomes, the UKB has provided SV calls, but only as bgzipped sample-level VCF files.

I've tried merging these files in groups to create a pVCF, unzipping each VCF first because SURVIVOR doesn't seem to accept .gz files. However, the group files grow so large that I can't merge the groups with each other (I get a "Killed" error). I also tried using bcftools to trim the VCF files so that the FORMAT field contains only genotypes, but then the merge behaved oddly: when merging two files with 9000 people each, I got only 2 individuals in the output.
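To pin down the "only 2 individuals" result, it may help to count the sample columns in each input and in the merged output. The following is a minimal sketch, not part of SURVIVOR or bcftools: the helper names `open_vcf` and `sample_names` are hypothetical, and it assumes a standard VCF header in which sample names follow the nine fixed columns on the `#CHROM` line. Python's `gzip` module reads bgzipped files, so the inputs do not need to be unzipped first.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF as text."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip and bgzip files both start with the bytes 0x1f 0x8b.
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def sample_names(path):
    """Return the sample names listed on the #CHROM header line."""
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 1-9 are CHROM..FORMAT; samples come after them.
                return fields[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {path}")
```

For example, `len(sample_names("group1.vcf"))` (file name hypothetical) should be about 9000 for each group file, and comparing it with the same count on the merged output shows whether samples were dropped during merging or during the earlier trimming step.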

Do you have any suggestions for how I could perform this analysis?

Cheers, Gareth

GHawkes93 commented 9 months ago

I should add that I'm using a 72-core machine. Each group file (approx. 9k people) is ~270 GB and contains ~0.5M SVs.
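As a rough back-of-envelope check on those numbers, the sketch below estimates how many bytes each sample uses per SV record. It assumes decimal gigabytes and that nearly all of the file size sits in the per-sample columns. The 4-byte figure for a genotype-only entry (for example `0/1` plus a tab) is likewise an assumption, not a measurement.

```python
# Figures quoted in the post (approximate).
GROUP_FILE_BYTES = 270e9   # ~270 GB per group file, decimal GB assumed
SAMPLES_PER_GROUP = 9_000  # ~9k people per group
SVS_PER_GROUP = 500_000    # ~0.5M SVs per group


def bytes_per_entry(file_bytes, n_samples, n_svs):
    """Average bytes used by one sample in one SV record."""
    return file_bytes / (n_samples * n_svs)


def genotype_only_bytes(n_samples, n_svs, bytes_per_gt=4):
    """Estimated file size if each entry held only a genotype.

    bytes_per_gt=4 is an assumption (e.g. "0/1" plus a separator).
    """
    return n_samples * n_svs * bytes_per_gt


per_entry = bytes_per_entry(GROUP_FILE_BYTES, SAMPLES_PER_GROUP, SVS_PER_GROUP)
gt_only = genotype_only_bytes(SAMPLES_PER_GROUP, SVS_PER_GROUP)

print(f"{per_entry:.0f} bytes per sample per SV")        # 60 bytes
print(f"~{gt_only / 1e9:.0f} GB with genotype-only entries")  # ~18 GB
```

Under these assumptions, each sample takes about 60 bytes per SV, so keeping only genotypes would shrink a group file by roughly an order of magnitude.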