fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SURVIVOR merge files outputs no sites #66

Closed complexgenome closed 5 years ago

complexgenome commented 5 years ago

Hi, I'm using SURVIVOR version 1.0.6 to merge .vcf files.

~/path/SURVIVOR merge sample_files  1000 2 1 1 0 10 survivor_merged.vcf
merging entries: 8091

It prints a "merging entries" line, but the generated .vcf file contains nothing but the VCF header lines (file format version and metadata). It stops after printing:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 550050_110003

Is it because my VCFs contain the same samples and SURVIVOR is unable to merge them? Or is it because the samples in the two VCFs are listed in a different order?


uname -a
Linux hgrcgridlogin 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

fritzsedlazeck commented 5 years ago

Hi, this is just a guess: SURVIVOR expects one or more file names in sample_files. It seems that just one file name is listed. Usually it reports a "merging entries" line for each VCF file parsed.

Since you require the variants to be supported by at least 2 input files, the output is only the header.

Thanks for reaching out. Fritz
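
One way to test this guess is to count how many entries the list file actually contains and confirm that each one points to a readable file. The sketch below is a hypothetical helper (`check_list` is not part of SURVIVOR) and assumes the list file holds one VCF path per line:

```shell
# Hypothetical helper, not part of SURVIVOR.
# Assumption: the list file contains one VCF path per line.
check_list() {
    list="$1"
    n=0
    missing=0
    # `|| [ -n "$path" ]` keeps a final line that lacks a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # skip blank lines
        n=$((n + 1))
        if [ ! -r "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "listed: $n"
    # Succeed only if every listed file is readable.
    [ "$missing" -eq 0 ]
}

# Usage (file name taken from the command above):
#   check_list sample_files
```

If this reports fewer than 2 entries, a minimum support of 2 cannot be met by any variant, which would be consistent with a header-only output.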

complexgenome commented 5 years ago

Sorry, I'm not following you. How do I get the desired output? There are numerous sites present in both VCF files (i.e., overlapping), so some output should have been generated.

Thank you.

fritzsedlazeck commented 5 years ago

I think you should check your sample_files file. Does it list both files?

Thanks Fritz

fritzsedlazeck commented 5 years ago

Did you have a chance to check this? Was it resolved? Thanks, Fritz

complexgenome commented 5 years ago

Ah, OK. There was a newline character missing in sample_files that caused the error. Thank you for checking. :)
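
The thread does not say where in sample_files the newline was missing. Assuming it was at the end of the last line, a check along these lines would catch it; `ends_with_newline` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the file is non-empty
# and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips trailing newlines, so the result is
    # empty exactly when the last byte is a newline.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage: append the missing newline only when needed.
#   ends_with_newline sample_files || printf '\n' >> sample_files
```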

complexgenome commented 5 years ago

I think I'm still having difficulty with the output. The merged output contains only two samples, but both of my input VCFs have 68 individuals. How do I find out what's going wrong? Would it be possible to write a log file when producing the output? It would be great to see the parameters and other details of the merge. :)

fritzsedlazeck commented 5 years ago

Yeah, SURVIVOR was implemented on the principle that per sample/call set there is 1 individual. Usually we run 3 or more callers per individual, then build a consensus per sample, and then merge across the samples.

So SURVIVOR does not recognize your 68 individuals and thus handles the two VCF files as two samples. Sorry about that. Fritz
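
To see whether an input VCF holds more than one individual before merging, the sample columns in its header can be counted. This is a minimal sketch for uncompressed VCF files; `count_samples` is a hypothetical helper, and it relies only on the standard VCF layout of 9 fixed columns followed by one column per sample:

```shell
# Hypothetical helper: print the number of sample columns in a VCF header.
# Assumption: the file is an uncompressed, tab-separated VCF.
count_samples() {
    # The #CHROM line has 9 fixed columns (CHROM through FORMAT),
    # followed by one column per sample.
    awk -F '\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }' "$1"
}

# Usage: a result above 1 means the file contains several individuals,
# which SURVIVOR would still treat as a single sample.
#   count_samples input.vcf
```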

complexgenome commented 5 years ago

I see. Is SURVIVOR appropriate when working with only two callers?

When I set the minimum support to 2 (supported by 2 callers) and provide a sample file listing two VCFs, the merged file contains sites that aren't present in one VCF or the other.

fritzsedlazeck commented 5 years ago

Yes, no problem. I just wanted to make the point that I do not distinguish between different callers and different samples per file.

Yes, when you do that there should only be entries that are present in both VCF files. In your case, that can mean that only one of the samples in each VCF file has the variant.

The way your input is structured is unfortunately not what I had in mind when implementing SURVIVOR. Sorry for that.
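
One way to check which input files support each merged record is to tally the support vector in the merged output. The sketch below assumes the merged VCF carries a `SUPP_VEC=<bits>` tag in its INFO column (one bit per input file, in list order); this tag is not shown in the thread, so treat it as an assumption to verify against your own output. `tally_supp_vec` is a hypothetical helper:

```shell
# Hypothetical helper: count merged records per SUPP_VEC value.
# Assumption: the INFO column (8th) contains a SUPP_VEC=<bits> tag.
tally_supp_vec() {
    awk -F '\t' '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, tags, ";")
            for (i = 1; i <= n; i++)
                if (tags[i] ~ /^SUPP_VEC=/)
                    count[substr(tags[i], 10)]++   # text after "SUPP_VEC="
        }
        END { for (v in count) print v, count[v] }
    ' "$1" | sort
}

# Usage (file name taken from the command above):
#   tally_supp_vec survivor_merged.vcf
```

With two input files and a minimum support of 2, every record would be expected to show `11` under this assumption; any other value would point to a record supported by only one file.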