Duplicated CYP2D6 diplotypes in matcher JSONs

bug

For some test samples, the Matcher JSON contains duplicate CYP2D6 diplotypes when PharmCAT is run with -research cyp2d6,combinations.

The duplicates appear under CPIC-CYP2D6-diplotypes. The attached mock VCF produces a Matcher JSON with two (2) identical *1/[*3 + *4 + *122] entries.

This can be reproduced by running:

docker run --rm -v ./:/pharmcat/data pgkb/pharmcat pharmcat -vcf data/mock_test.vcf -research combinations,cyp2d6

Expected behavior: the diplotype should be reported only once.

I use the number of unique diplotypes in the Matcher JSON files to decide which fields of the Phenotyper JSON to process.
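
For reference, here is a rough sketch of how the duplicates can be detected and the unique diplotypes counted on my side. It does not rely on the exact Matcher JSON layout: it walks the whole document and inspects every list stored under a key named "diplotypes". Two things are assumptions, not confirmed PharmCAT behavior: that each diplotype entry is an object with a "name" field, and the function names, which are made up for illustration.

```python
import json
from collections import Counter


def iter_diplotype_lists(node, path=()):
    """Yield (path, entries) for every list stored under a key named 'diplotypes'."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "diplotypes" and isinstance(value, list):
                yield path + (key,), value
            else:
                yield from iter_diplotype_lists(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_diplotype_lists(item, path + (str(index),))


def diplotype_label(entry):
    """Return a comparable label for one diplotype entry."""
    # Assumption: entries carry a "name" field such as "*1/[*3 + *4 + *122]".
    if isinstance(entry, dict) and "name" in entry:
        return str(entry["name"])
    # Fallback: compare whole entries via a canonical JSON string.
    return json.dumps(entry, sort_keys=True)


def find_duplicate_diplotypes(document):
    """Map 'path-to-list' -> {label: count} for labels that occur more than once."""
    duplicates = {}
    for path, entries in iter_diplotype_lists(document):
        counts = Counter(diplotype_label(entry) for entry in entries)
        repeated = {label: n for label, n in counts.items() if n > 1}
        if repeated:
            duplicates["-".join(path)] = repeated
    return duplicates


def unique_diplotypes(entries):
    """Return the labels of a diplotype list in first-seen order, without repeats."""
    return list(dict.fromkeys(diplotype_label(entry) for entry in entries))
```

To use it, load the Matcher JSON with `json.load` and pass the resulting object to `find_duplicate_diplotypes`. An empty result means no list contained a repeated diplotype. Until the duplicates are fixed, `unique_diplotypes` gives the deduplicated count I actually need.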

Please tell us about your environment:
- PharmCAT Version: 2.13.0
- JDK Version: openjdk 17.0.6
- Environment: [ Windows | macOS | Linux distro | etc ]
Other information: test_file.zip