broadinstitute / gatk-sv

A structural variation pipeline for short-read sequencing
BSD 3-Clause "New" or "Revised" License

svtk vcfcluster creates lists of INFO values when Number = 1 #659

Open epiercehoffman opened 3 months ago

epiercehoffman commented 3 months ago

When svtk vcfcluster merges records, it builds a list of INFO values from the member records, even when the header defines Number=1 for that INFO key (link). The resulting values don't match the declared Number, which can cause problems downstream, so this behavior should be fixed.
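
To make the mismatch concrete, here is a minimal sketch that uses plain Python dicts rather than svtk or pysam objects. The function name find_number_mismatches, the header mapping, and the INFO keys SCORE and SOURCES are all hypothetical and chosen only for illustration; this is not the svtk code.

```python
def find_number_mismatches(header_numbers, info):
    """Return the INFO keys declared Number=1 whose value is a list or tuple."""
    return sorted(
        key
        for key, value in info.items()
        if header_numbers.get(key) == "1" and isinstance(value, (list, tuple))
    )


# Hypothetical header declarations: INFO key -> Number
header_numbers = {"SCORE": "1", "SOURCES": "."}

# Hypothetical INFO of a merged record in which every key was collected into a list
merged_info = {"SCORE": [10, 12], "SOURCES": ["a", "b"]}

mismatches = find_number_mismatches(header_numbers, merged_info)
print(mismatches)
```

A check along these lines could flag affected records before they reach downstream tools, assuming the header's Number declarations are available as a simple mapping.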

mwalker174 commented 3 weeks ago

I think the best options are:

  1. Choose a representative record and fill unrecognized fields with its values, or
  2. Use the empty value "." (the VCF missing value).

The latter is a bit "safer" in that it avoids possibly propagating bad information. This issue could take significant time and testing to ensure the change doesn't cause any problems downstream of CombineBatches.
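
The two options above could be sketched roughly as follows. This is an illustration on plain Python dicts, not the svtk implementation: merge_info, the strategy names, the INFO keys SCORE and SOURCES, the use of None to stand in for ".", and the choice of the first record as the representative are all assumptions. Keeping a value that every member record agrees on is an extra assumption beyond what is proposed in this thread.

```python
MISSING = None  # stands in for the VCF missing value "."


def merge_info(header_numbers, records, strategy="missing"):
    """Merge the INFO dicts of clustered records (hypothetical helper).

    Keys declared Number=1 receive a single value; all other keys
    collect the member values into a list.
    """
    # Collect keys in first-seen order across all member records.
    keys = []
    for rec in records:
        for key in rec:
            if key not in keys:
                keys.append(key)

    representative = records[0]  # assumption: the first record represents the cluster
    merged = {}
    for key in keys:
        values = [rec[key] for rec in records if key in rec]
        if header_numbers.get(key) != "1":
            merged[key] = values  # a list is acceptable for this key
        elif len(set(values)) == 1:
            merged[key] = values[0]  # assumption: keep a value all members agree on
        elif strategy == "representative":
            merged[key] = representative.get(key, MISSING)  # option 1
        else:
            merged[key] = MISSING  # option 2
    return merged


header_numbers = {"SCORE": "1", "SOURCES": "."}
records = [{"SCORE": 10, "SOURCES": "a"}, {"SCORE": 12, "SOURCES": "b"}]

by_representative = merge_info(header_numbers, records, strategy="representative")
by_missing = merge_info(header_numbers, records, strategy="missing")
```

With the representative strategy the merged record carries SCORE from the first member, which may not describe the whole cluster. With the missing strategy, conflicting SCORE values are dropped rather than propagated.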