blachlylab / mucor3

Parses VCF data into tabular spreadsheets and aggregates data by sample
MIT License

Pivoted dataset can have duplicate variant records #3

Closed charlesgregory closed 5 years ago

charlesgregory commented 5 years ago

When pivoting with metadata columns included, a variant whose metadata values differ between samples is split into separate rows, one per distinct metadata value. Values that differ only in order (BAD;GOOD vs. GOOD;BAD) also split the record:

CHROM  POS  REF  ALT  meta1     sample1  sample2
chr1   1    G    A    BAD       0.1      .
chr1   1    G    A    GOOD      .        0.2
chr1   2    G    T    BAD;GOOD  .        0.1
chr1   2    G    T    GOOD;BAD  0.5      .

Suggested fix: pivot using only CHROM, POS, REF, and ALT as the index, then add the metadata columns back with a left join. The metadata columns in the right-hand dataset of the join should first be merged on CHROM, POS, REF, ALT, so that each variant has a single metadata row.
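A minimal sketch of that approach in plain Python, not the mucor3 implementation. The function name `pivot_then_join`, the long-format record layout, and the merge rule (the sorted union of `;`-separated metadata tokens) are all assumptions made for illustration; the issue does not say how conflicting metadata values should be combined.

```python
from collections import defaultdict

MISSING = "."  # placeholder for a sample with no value, as in the table above

def pivot_then_join(records, samples):
    """Pivot on (CHROM, POS, REF, ALT) only, then left-join merged metadata.

    records: iterable of (chrom, pos, ref, alt, meta1, sample, value) tuples
             (hypothetical long-format layout).
    samples: sample names, in the desired column order.
    """
    pivoted = defaultdict(dict)     # variant key -> {sample: value}
    merged_meta = defaultdict(set)  # variant key -> set of metadata tokens

    for chrom, pos, ref, alt, meta, sample, value in records:
        key = (chrom, pos, ref, alt)
        pivoted[key][sample] = value
        # Assumed merge rule: union of the ";"-separated tokens per variant.
        merged_meta[key].update(meta.split(";"))

    table = []
    for key, by_sample in pivoted.items():
        # Left join: every pivoted variant is kept and receives exactly one
        # metadata value, so metadata can no longer split a variant.
        meta = ";".join(sorted(merged_meta.get(key, ())))
        values = tuple(by_sample.get(s, MISSING) for s in samples)
        table.append(key + (meta,) + values)
    return table

# The four rows from the example above, in long format.
records = [
    ("chr1", 1, "G", "A", "BAD", "sample1", 0.1),
    ("chr1", 1, "G", "A", "GOOD", "sample2", 0.2),
    ("chr1", 2, "G", "T", "BAD;GOOD", "sample2", 0.1),
    ("chr1", 2, "G", "T", "GOOD;BAD", "sample1", 0.5),
]

result = pivot_then_join(records, ["sample1", "sample2"])
for row in result:
    print(*row)
# prints:
# chr1 1 G A BAD;GOOD 0.1 0.2
# chr1 2 G T BAD;GOOD 0.5 0.1
```

Because the pivot index no longer contains the metadata, each variant appears once regardless of how its metadata varies across samples. Sorting the merged tokens also makes BAD;GOOD and GOOD;BAD collapse to the same value.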