Closed vladsavelyev closed 7 years ago
this seems like a good idea. I think if the conf has names=["ID"]
then it could set the ID column. I'll try to get to it this week.
Sounds great!
A follow-up question (somewhat related to #19), I wonder what would be the behaviour in case of multiple ID annotations, like:
[[annotation]]
file="dbsnp.vcf.gz"
fields=["ID"]
names=["ID"]
ops=["self"]
[[annotation]]
file="cosmic.vcf.gz"
fields=["ID"]
names=["ID"]
ops=["self"]
With the implementation in your mind, would be it replace it or append? It's not uncommon to see VCF records with IDs like rs763287238,rs864622601;COSM1721152
after annotation in my experience. It's okay if it will require a temporary field and an extra [[postannotation]] section to concatenate and remove, but maybe the ID is that kind of field that should always be appended by default (dropping any duplicates thought)?
Hi @vladsaveliev I just pushed a fix for this. It only allow setting ID in [[postannotation]]
by using name="ID". You can grab the IDs from e.g. dbsnp and cosmic as in your example above by the usual means in [[annotation]] blocks and then concat them or whatever in the [[postannotation]]
This is out in the latest release. Please give it a try and let me know any issues.
Thanks Brent,
I tried the following conf:
[[annotation]]
file="/home/vsaveliev/bcbio/genomes/Hsapiens/hg19/variation/dbsnp-147.vcf.gz"
fields=["ID"]
names=["rs_ids"]
ops=["concat"]
[[postannotation]]
name="ID"
fields=["rs_ids", "ID"]
op="lua:rs_ids .. ';' .. ID"
type="String"
And ran like this:
vcfanno ann.toml vars.vcf > ann.vcf
It complains about the missing -lua
flag:
vcfanno.go:105: ERROR: requested lua op without specifying -lua flag
Not clear what should be provided inside the lua script? All ops look built-in.
yeah, it thinks you need a lua script since you're using lua. Just add -lua <(echo "")
or an empty file.
Thanks!
When it appends to an empty ID (with empty content represented by .
), the values ends up rs369203887;.
. I guess some cool lua code can be set to erase dots, but I'm thinking it might be reasonable to make it a default behaviour? I might be wrong.
Also the postannotation section complains about the records that do not have rsids:
api.go:566: ERROR: in lua postannotation at chr21:10614769 for ID.
<string>:1: cannot perform concat operation between nil and string
stack traceback:
<string>:1: in main chunk
[G]: ?
empty values were: [rs_ids]
values were: [.]
code is: rs_ids .. ';' .. ID
vcfanno.go:166: Info Error: rs_ids not found in INFO >> this error/warning may occur many times.
That's fine and it proceeds with annotation correctly when there is only dbSNP to annotate. However when I add COSMIC, it fails to process those record that are found in only one source our of 2:
[[annotation]]
file="/home/vsaveliev/bcbio/genomes/Hsapiens/hg19/variation/dbsnp-147.vcf.gz"
fields=["ID"]
names=["rs_ids"]
ops=["self"]
[[annotation]]
file="/home/vsaveliev/bcbio/genomes/Hsapiens/hg19/variation/CosmicCodingMuts_v77.vcf.gz"
fields=["ID"]
names=["cosmic_ids"]
ops=["self"]
[[postannotation]]
name="ID"
fields=["rs_ids", "cosmic_ids", "ID"]
op="lua:rs_ids .. ';' .. cosmic_ids .. ';' .. ID"
type="String"
Only records that are found both in dbSNP and COSMIC are processed correctly:
chr21 15599144 . T C 142.0 REJECT;REJECT STATUS=Germline;SAMPLE=syn3-tumor;TYPE=SNV;SHIFT3=0;MSI=8;MSILEN=1;SSF=0.58433;SOR=1;LSEQ=TAATACATGTAAACTGTTAA;RSEQ=TTTTTTTCCTTAACCTTTTG;rs_ids=rs2822444
chr21 15599466 rs7280643;COSM149211;. A G 140.0 REJECT;REJECT STATUS=Germline;SAMPLE=syn3-tumor;TYPE=SNV;SHIFT3=0;MSI=2;MSILEN=1;SSF=0.36514;SOR=1.24714;LSEQ=CAGCTCATATAAATGGACTC;RSEQ=CCAACAACCAAGTGACTCTG;rs_ids=rs7280643;cosmic_ids=COSM149211
I guess a more complex lua script will help?
with stuff like this, you have full control over what happens in lua. To test, I just write a lua function and test it using any lua interpreter. here's what I came up with:
function setid(...)
local t = {...}
local res = {}
for i, v in ipairs(t) do
if v ~= nil and v ~= "" then
res[#res+1] = v
end
end
return table.concat(res, ";")
end
that will work for any number of fields. You'll use it like:
[[postannotation]]
name="ID"
fields=["rs_ids", "cosmic_ids", "ID"]
op="lua:setid(rs_ids, cosmic_ids, ID)"
type="String"
then you'll put the lua code into some.lua
and pass it to vcfanno as -lua some.lua
Great, thanks! Hoped to avoid diving into lua, but I guess I gotta - and that's cool, it looks powerful.
Thanks again for adding all new features Brent!
sure. let me know any other issues. If you are hitting them the others probably are as well. I'll add that function to the examples/custom.lua in the repo.
One small edit to handle the empty values (empty IDs are represented by dots .
by standard):
function setid(...)
local t = {...}
local res = {}
for i, v in ipairs(t) do
if v ~= "." and v ~= nil and v ~= "" then
res[#res+1] = v
end
end
return table.concat(res, ";")
end
However it doesn't work because rs_ids and cosmic_ids are optional in INFO, so when vcfanno doesn't find them, it errors out with Info Error: cosmic_ids not found in INFO >> this error/warning may occur many times. reporting once here...
, and it doesn't even call the lua script for such variants. I guess some extra step is needed, by I didn't yet figured out which. I'll dig into it some other day if don't have a quick solution in your mind already.
can you send an example VCF and conf that illustrates the problem? This could be a bug. Wne you see the error, it's actually just a wrning, but you should stall have the function-call if any of the arguments were present.
Sure, here is a small example: example.zip
vcfanno -lua some.lua small.toml small.vcf.gz
Only chr21:10910311 is found in COSMIC, and both chr21:9483406 and chr21:10910311 are found in dbSNP. However, vcfanno skips chr21:9483406 and doesn't set anything into ID for that variant.
Thanks very much for the nice test-case. I have made a new release with the fix here: https://github.com/brentp/vcfanno/releases/tag/v0.2.1
I also added your example to the test-suite. Thanks again.
Brilliant, thanks!
Brent, sorry to bother you again - but one more potential issue.
[[annotation]]
file="dbsnp.small.vcf.gz"
fields=["ID", "CAF"]
names=["rs_ids", "CAF"]
ops=["self", "self"]
If I add an optional INFO field to this config, vcfanno would refuse to run lua for the records missing this field (the same way it was with ID before your fix). A small example to reproduce: example.zip
Shoot. I only delayed the problem in the last "fix". I think it's properly resolved here: https://github.com/brentp/vcfanno/releases/tag/v0.2.2 thanks again for the test-case.
Great! And thanks for the delete
operation, with that I have all I want
I just put the answer here, because you didn't give it explicitly
[[annotation]]
file="dbsnp_146.hg38.vcf.gz"
fields = ["ID"]
ops=["self"]
names=["rsid"]
[[postannotation]]
name="ID"
fields=["rsid"]
op="setid"
type="String"
Hi Brent,
Thank you for vcfanno, it's really great to be able to delegate nearly all VCF post-processing job to one fast tool.
I wonder if there a way to put an annotation value into the ID column in VCF. I used to add the rsIDs and COSMIC ids into the ID column, and I don't seem to find a way to configure vcfanno do that instead of appending to INFO.
As a quick solution, I could just utilise bcftools to remap the INFO field over to ID. But it would be much better to keep all annotation within on vcfanno call with a proper config file.
What do you think about this?