broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

VS_1327 ensure sample ids are unique #8818

Closed: gbggrant closed this pull request 1 month ago.

gbggrant commented 1 month ago

This PR adds a task to GvsAssignIds to verify that there are no duplicate sample names in the file provided.
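The PR does not show the body of the new GvsAssignIds task. The idea behind the check is simple, though: count each sample name and fail if any appears more than once. Below is a minimal Python sketch of that idea. It assumes the input file has one sample name per line. The function names `find_duplicate_sample_names` and `check_sample_names_file` are hypothetical and do not come from the GVS codebase.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_sample_names(names: Iterable[str]) -> List[str]:
    """Return each sample name that appears more than once, in first-seen order.

    Hypothetical helper; assumes one sample name per input item/line.
    """
    # Strip trailing newlines/whitespace and skip blank lines, which a
    # one-name-per-line file often ends with.
    cleaned = [n.strip() for n in names if n.strip()]
    counts = Counter(cleaned)
    seen = set()
    duplicates = []
    for name in cleaned:
        if counts[name] > 1 and name not in seen:
            duplicates.append(name)
            seen.add(name)
    return duplicates


def check_sample_names_file(path: str) -> None:
    """Raise ValueError if the file at `path` contains duplicate sample names."""
    with open(path) as f:
        duplicates = find_duplicate_sample_names(f)
    if duplicates:
        raise ValueError("Duplicate sample names found: " + ", ".join(duplicates))
```

If the check raises before any downstream step runs, the workflow fails without leaving partially created state behind.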

Here is an example run of BulkIngest that reproduces the originally reported problem: no sample set is provided, the sample ID column is named something other than `sample_id`, and that column contains a duplicate. Here is an example run where the updated code detects the problem relatively early and reports it without creating database tables that would need to be cleaned up. Here is a normal run that passes (the same basic setup as the original problem, except that I removed the duplicate row from the samples table).
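The reported failure depended on which column held the sample IDs. The duplicate was in a column other than `sample_id`, so checking `sample_id` alone would have missed it. The Python sketch below checks whichever column is configured as the ID column in a tab-separated samples table. The column names (`sample_id`, `research_id`) and the function name `duplicate_ids_in_column` are illustrative assumptions and are not taken from the PR.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO


def duplicate_ids_in_column(tsv: TextIO, id_column: str) -> List[str]:
    """Return sorted values of `id_column` that occur more than once in a TSV table.

    Hypothetical helper; the caller chooses which column holds the sample IDs.
    """
    reader = csv.DictReader(tsv, delimiter="\t")
    # Accessing fieldnames reads the header row; it is None for an empty file.
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"Column {id_column!r} not found in header {reader.fieldnames}")
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Illustrative table: `sample_id` is unique, but the configured ID column is not.
TABLE = (
    "sample_id\tresearch_id\n"
    "s1\tR1\n"
    "s2\tR2\n"
    "s3\tR1\n"
)

print(duplicate_ids_in_column(io.StringIO(TABLE), "sample_id"))    # prints []
print(duplicate_ids_in_column(io.StringIO(TABLE), "research_id"))  # prints ['R1']
```

Checking only `sample_id` here reports no problem. Checking the column actually used as the sample identifier finds the duplicate. Running such a check before any table-creation step is what lets the failure surface without leaving tables to clean up.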

Here is a passing integration test.