bioinfo-chru-strasbourg / howard

Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
GNU Affero General Public License v3.0

Sample columns disappear #261

Closed JbaptisteLam closed 2 months ago

JbaptisteLam commented 2 months ago

HOWARD (latest devel version) installed with pip in a conda env

I'm trying to annotate a multi-sample VCF from a WGS analysis with a huge Parquet database. Annotation goes well, but the output file no longer has any sample columns (the last of the mandatory columns is FORMAT). Moreover, I can't reproduce this behaviour with only a few variants from my cohort...

Best,

Jean-Baptiste

antonylebechec commented 2 months ago

This is probably due to malformed genotypes in the sample columns. Indeed, the input file may contain a mix of columns and need not be in VCF format (it can be a TSV file, for example), so it may include columns that are annotations (or anything else). Identifying sample columns (those with genotypes) is therefore tricky. Moreover, to ensure correct calculations on genotypes (e.g. trio calculation), genotypes should be well-formed.
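For a strict VCF, the expected sample columns can be read straight from the `#CHROM` header line, which is handy for comparing input and output. The sketch below is only an illustration under that assumption (tab-separated header, nine fixed columns ending with FORMAT); `sample_columns` is a hypothetical helper, not HOWARD's own detection logic.

```python
from typing import Iterable, List

# The VCF specification defines nine fixed columns, FORMAT being the last one.
VCF_FIXED_COLUMNS = 9


def sample_columns(lines: Iterable[str]) -> List[str]:
    """Return the column names that follow FORMAT on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[VCF_FIXED_COLUMNS:]
        if not line.startswith("#"):
            break  # data lines reached without finding a header
    raise ValueError("no #CHROM header line found")
```

Pass any iterable of text lines, for example an open file handle (`open("input.vcf")`, or `gzip.open("input.vcf.gz", "rt")` for a compressed file), and compare the result for the input and the annotated output.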

A fix has been made (#263) to be more flexible about genotype format, and an option has been added to ensure that the list of samples is well-defined.

Try:

  1. First, check your input file. You can also reformat it with bcftools (e.g. bcftools view input.vcf -o input.checked.vcf).
  2. Then, rerun your command on both input files (the original, to test the new code, and the one reformatted with bcftools), and check that genotypes are well-formed and that all samples are present.
  3. If that does not resolve the issue, add the sample list to the param.json file (see docs/help.param.md) to ensure these columns are kept.
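To help with the first check, here is a minimal sketch that scans VCF data lines and reports genotype values that do not match the GT syntax of the VCF specification (allele indexes or `.` separated by `/` or `|`). The exact criterion HOWARD applies is not stated in this thread, so the pattern and the `malformed_genotypes` helper are assumptions for illustration only.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed GT syntax (VCF specification): allele indexes or "." joined by "/" or "|".
GT_PATTERN = re.compile(r"(\.|\d+)([/|](\.|\d+))*")


def malformed_genotypes(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, sample_index, value) for each GT that fails the pattern."""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT and sample columns on this line
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # nothing to validate without a GT key
        gt_index = format_keys.index("GT")
        for sample_index, sample in enumerate(fields[9:]):
            values = sample.split(":")
            gt = values[gt_index] if gt_index < len(values) else ""
            if not GT_PATTERN.fullmatch(gt):
                yield line_number, sample_index, gt
```

Used on an open file handle, for example `for hit in malformed_genotypes(open("input.vcf")): print(hit)`, it prints the 1-based line number, the 0-based sample index and the offending value, which may help locate the variants that trigger the problem in a large cohort.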