fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

Merging multiple samples in one vcf #147

Closed: ashamehta closed this issue 3 years ago

ashamehta commented 3 years ago

Hi, I have a single sorted VCF that contains SV calls from multiple samples. Some of the calls overlap significantly, and I would like to consolidate them. Does SURVIVOR merge support this use case? I'm trying to run ./SURVIVOR merge sample_file.txt 100 0 1 1 30 output_file.vcf, where sample_file.txt contains the path of the single VCF, but I get the help menu as output:

  File with VCF names and paths
  max distance between breakpoints
  Minimum number of supporting caller
  Take the type into account (1==yes, else no)
  Take the strands of SVs into account (1==yes, else no)
  Estimate distance based on the size of SV (1==yes, else no).
  Minimum size of SVs to be taken into account.
  Output VCF filename

I've also tried running ./SURVIVOR merge input_file.vcf 100 0 1 1 30 output_file.vcf, but I get the same output.

Thanks in advance for your help and for your tool!

ashamehta commented 3 years ago

Additionally, is there a way to view which SVs have been merged?
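A possible way to answer this from the merged output, offered as a sketch: my understanding is that SURVIVOR merge writes a SUPP_VEC tag into INFO, a string of 0/1 flags with one character per input file in the order of the sample file. That is an assumption worth checking against your own output header. The helper name merged_inputs below is hypothetical.

```python
def merged_inputs(info: str, input_vcfs: list[str]) -> list[str]:
    """Return the input VCFs that contributed to one merged record.

    Assumes INFO carries SUPP_VEC: one '0'/'1' character per line of the
    sample file, in the same order as input_vcfs.
    """
    # Parse "KEY=VALUE;FLAG;..." into a dict (flags map to an empty string).
    fields = dict(
        kv.split("=", 1) if "=" in kv else (kv, "")
        for kv in info.split(";")
    )
    supp_vec = fields.get("SUPP_VEC", "")
    if len(supp_vec) != len(input_vcfs):
        raise ValueError("SUPP_VEC length does not match the number of inputs")
    # Keep the inputs whose flag is set.
    return [path for path, flag in zip(input_vcfs, supp_vec) if flag == "1"]


if __name__ == "__main__":
    info = "SUPP=2;SUPP_VEC=101;SVLEN=-350;SVTYPE=DEL"
    print(merged_inputs(info, ["a.vcf", "b.vcf", "c.vcf"]))  # ['a.vcf', 'c.vcf']
```

This only says which input files supported a merged call, not which original records were combined.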

fritzsedlazeck commented 3 years ago

Dear @ashamehta ,

You are missing one parameter; for example, ./SURVIVOR merge sample_file.txt 100 0 1 1 0 30 output_file.vcf should work.
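The help menu quoted above lists eight positional arguments after merge, while the original command supplied only seven. As a sketch, a small Python wrapper that names every argument makes that kind of miscount harder. The function survivor_merge_cmd and its parameter names are hypothetical labels of my own; the argument order simply follows the help output in this thread.

```python
import shlex


def survivor_merge_cmd(*, sample_file, max_dist, min_support, use_type,
                       use_strand, estimate_dist, min_size, out_vcf,
                       binary="./SURVIVOR"):
    """Build the argv for `SURVIVOR merge` (hypothetical helper).

    Keyword-only arguments force every positional slot to be named, in the
    order printed by the help menu quoted in this issue.
    """
    args = [binary, "merge", sample_file, max_dist, min_support, use_type,
            use_strand, estimate_dist, min_size, out_vcf]
    return [str(a) for a in args]


if __name__ == "__main__":
    cmd = survivor_merge_cmd(
        sample_file="sample_file.txt", max_dist=100, min_support=0,
        use_type=1, use_strand=1, estimate_dist=0, min_size=30,
        out_vcf="output_file.vcf",
    )
    print(shlex.join(cmd))
    # ./SURVIVOR merge sample_file.txt 100 0 1 1 0 30 output_file.vcf
```

The resulting list can be passed to subprocess.run(cmd, check=True) if you want to call SURVIVOR from Python.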

Unfortunately, SURVIVOR cannot work from a multi-sample file; you need to start from the individual sample files. In the worst case, you could split your current file and feed each sample's VCF file to SURVIVOR.

Hope that helps, Fritz
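For the "split up your current file" route, here is a minimal standard-library sketch. It only applies when the multi-sample VCF has standard FORMAT and per-sample columns with a GT key; a sites-only VCF cannot be split this way. The function names split_by_sample and write_per_sample are hypothetical, and the rule "keep a record for a sample if its GT has any non-reference allele" is my own simplifying choice.

```python
import re


def split_by_sample(vcf_lines):
    """Split a multi-sample VCF (iterable of text lines) into per-sample VCFs.

    Returns {sample: [lines]}. A record is kept for a sample only if that
    sample's GT has at least one non-reference allele (0/1, 1/1, 0|1, ...);
    0/0 and ./. calls are dropped. Assumes a GT key in FORMAT.
    """
    meta, samples, out = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)  # meta-information lines are shared by all outputs
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]
            for s in samples:
                # Fixed columns plus a single sample column.
                out[s] = meta + ["\t".join(cols[:9] + [s])]
        elif line:
            cols = line.split("\t")
            gt_idx = cols[8].split(":").index("GT")
            for s, call in zip(samples, cols[9:]):
                alleles = re.split(r"[/|]", call.split(":")[gt_idx])
                if any(a not in ("0", ".") for a in alleles):
                    out[s].append("\t".join(cols[:9] + [call]))
    return out


def write_per_sample(vcf_in, list_path="sample_file.txt"):
    """Write <sample>.vcf files plus a sample file listing their paths."""
    with open(vcf_in) as fh:
        per_sample = split_by_sample(fh)
    with open(list_path, "w") as listing:
        for sample, lines in per_sample.items():
            path = f"{sample}.vcf"
            with open(path, "w") as out:
                out.write("\n".join(lines) + "\n")
            listing.write(path + "\n")
```

The list file written by write_per_sample can then be given to SURVIVOR merge as the first argument.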

ashamehta commented 3 years ago

Hi Fritz,

Ah, good catch, thanks. Unfortunately, the data that I have was released in aggregate without patient-level information, so I can't split it up.

Appreciate your quick response, Asha

fritzsedlazeck commented 3 years ago

Does it have per-sample information? From what I am reading, it does not, right?

You could, of course, just merge the VCF file with itself. That might be a bit confusing, but it would take care of some of the redundancy.

Cheers, Fritz
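Merging a VCF with itself only requires a sample file that lists the same path twice. A minimal sketch, assuming SURVIVOR's sample file takes one VCF path per line; the helper names are hypothetical. Whether this collapses overlapping calls within the single input as hoped has not been verified here.

```python
from pathlib import Path


def self_merge_listing(vcf_path: str) -> str:
    """Sample-file contents listing one VCF twice (hypothetical helper)."""
    # Assumed format: one VCF path per line.
    return f"{vcf_path}\n{vcf_path}\n"


def write_self_merge_listing(vcf_path: str,
                             list_path: str = "sample_file.txt") -> Path:
    """Write the listing to disk and return its path."""
    out = Path(list_path)
    out.write_text(self_merge_listing(vcf_path))
    return out
```

After writing the list, run SURVIVOR merge on it as usual, e.g. ./SURVIVOR merge sample_file.txt 100 1 1 1 0 30 output_file.vcf.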

ashamehta commented 3 years ago

I'll try that, thank you!

Two more questions:

  1. Is there a way to preserve the INFO field of each SV when they are merged? I have frequencies associated with each variant that I need to add together when variants are merged (e.g., if SV1 and SV2 are merged into SV3, I want to add INFO/HETCOUNT of SV1 and INFO/HETCOUNT of SV2 to get the het count of SV3).
  2. What does the 5th numbered parameter mean (i.e., the 0 in ./SURVIVOR merge sample_file.txt 100 1 1 1 0 30 output_file.vcf)? I see descriptions for the other 5 here (https://github.com/fritzsedlazeck/SURVIVOR/wiki/Methods-and-Parameter), but I'm not sure about the 5th one.
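Since SURVIVOR does not carry INFO fields through a merge, summing a count such as HETCOUNT would need separate post-processing. The sketch below is a hypothetical, deliberately simplified alternative and not SURVIVOR's algorithm: it greedily clusters calls on the same chromosome and SVTYPE whose start positions lie within max_dist of the previous call, ignoring END, size and strand, then sums HETCOUNT per cluster. The names SV, parse_record and collapse_and_sum are my own. Summing is only meaningful if no sample is counted in more than one call of the same cluster.

```python
from collections import namedtuple

# Minimal record holding only the fields this sketch needs.
SV = namedtuple("SV", "chrom pos svtype hetcount")


def parse_record(line):
    """Build an SV from one VCF data line (HETCOUNT is the INFO key from this thread)."""
    cols = line.rstrip("\n").split("\t")
    info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
    return SV(cols[0], int(cols[1]), info.get("SVTYPE", "NA"),
              int(info.get("HETCOUNT", 0)))


def collapse_and_sum(svs, max_dist=100):
    """Greedy single-linkage clustering on (chrom, SVTYPE, start position).

    A call joins the current cluster if it has the same chromosome and type
    and starts within max_dist of the cluster's last call. Each cluster is
    reported at its leftmost call, with HETCOUNT summed over its members.
    """
    clusters = []
    for sv in sorted(svs, key=lambda s: (s.chrom, s.svtype, s.pos)):
        current = clusters[-1] if clusters else None
        if (current
                and current[-1].chrom == sv.chrom
                and current[-1].svtype == sv.svtype
                and sv.pos - current[-1].pos <= max_dist):
            current.append(sv)
        else:
            clusters.append([sv])
    return [SV(c[0].chrom, c[0].pos, c[0].svtype, sum(s.hetcount for s in c))
            for c in clusters]


if __name__ == "__main__":
    lines = [
        "1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;HETCOUNT=3",
        "1\t1050\tsv2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;HETCOUNT=2",
        "1\t5000\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;HETCOUNT=4",
    ]
    for sv in collapse_and_sum(parse_record(ln) for ln in lines):
        print(sv)
```

Here sv1 and sv2 fall into one cluster (HETCOUNT 5), while sv3 stays on its own.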

Thanks again for all of your help, Asha

fritzsedlazeck commented 3 years ago

Ad 1: Sadly, no.
Ad 2: You mean the 0, right? That was a previous parameter that I no longer use or support. I didn't want to break scripts that people already have, so I just kept it.

ashamehta commented 3 years ago

Ah, okay, that makes sense. I really appreciate all the explanations; I feel much more proficient with the tool now.