Closed MattWellie closed 2 years ago
I had a better look at the code and instead reheadered with
##INFO=<ID=vep_gene_ids,Number=1,Type=String,Description="Format: GENE">
and used CSQ_FIELD="vep_gene_ids". This was slightly more successful, and the gene field was recognised, but the process still failed:
[slivar] unable to find consequence field in vep_gene_ids
So it looks like if a custom field is being used as a custom CSQ field, it must contain at least one GENE, CONSEQUENCE, and TRANSCRIPT value in order for parsing to succeed, meaning that the process can't currently accept single-string Gene value fields.
Looking into this a bit more, the gene/transcript/consequence all factor into the decision about whether to retain a variant as a candidate for the C-H process, so it's probably asking for a bit much to allow for all/some of these fields to be accessed in novel locations.
Yes, you're correct that it needs all 3. You could construct a field that has:
MYCSQ=$GENE|$TRANSCRIPT|$IMPACT
and then make the appropriate header with Description field.
Beyond that, I think it will be hard to do what you want with slivar
.
Thanks! I think it should be possible to do that, so I'll have a crack tomorrow.
ok. then in the header, you'd need something like:
##INFO=<ID=CSQ,Number=.,Type=String,Description="MyAnno: 'Gene | Transcript | Consequence' ">
so slivar knows how to parse it.
@brentp (and any else who might find this), this worked for me:
COMPOUND_CSQ=ENSG00000103197|ENST00000219476|missense_variant
bcftools view -h "${file_in}" | \
sed 's/##INFO=<ID=COMPOUND_CSQ,Number=1,Type=String,Description="">/##INFO=<ID=COMPOUND_CSQ,Number=1,Type=String,Description="Format: 'Gene\|Transcript\|Consequence'">/' > new_header
bcftools reheader -h new_header --threads 4 -o "${file_out}" "${file_in}"
- ran a compound het check using the custom field
`CSQ_FIELD="COMPOUND_CSQ" slivar compound-hets -p INPUT.ped -o OUT.vcf -v INPUT.vcf.bgz -a`
One note, this compound field can't be used for `impactful`? As far as I can see in the src there is no equivalent to `for f in ["ANN", "CSQ", "BCSQ", getEnv("CSQ_FIELD")]:` used in getting the Consequence in the main `expr` mode; this only exists for compound-hets. Then again this is the first time I've ever really read JS, so I could be wrong there.
Thanks for the help!
I have a VCF which was annotated in Hail using VEP, and select annotation/CSQ fields were exported as independent INFO fields. Header extract (
vep_gene_ids
is a String field representing a single GeneID (ENSG), the name shouldn't have been a plural):I would like to use the comp-het functionality of Slivar, pointing at the
vep_gene_ids
as the Gene ID. I was flicking through the Slivar code and found the use ofgetEnv('CSQ_FIELD')
to indicate a custom CSQ field. I re-headered the VCF with##INFO=<ID=vep_gene_ids,Number=1,Type=String,Description="GENE">
and used
export CSQ_FIELD="vep_gene_ids"
to see if Slivar could find the field and treat the single entry as if it were a single CSQ field, but no luck.Do you know a way to support the use of a custom field as a comp-het Gene field? I guess the same manual selection would also apply to using
bcftools +split-vep -d -c Consequence
To break out the CSQ mono-string into separate consequences, that could all be filtered individually before being run through comp-het.
An example use-case I have is exploding the rows based on consequence, and latching onto only those consequences which are against MANE transcripts. In this situation I don't think
impactful
is a sufficient representation of the clinical relevance of the variant, so splitting the single CSQ block out into individual consequences, and removing non-MANE CSQs is required.