nickhir closed this issue 3 years ago.
Hi @nickhir
Thanks for your interest in the software and your questions.
Sorry about the homepage being out of date; we messed up. I will fix that. The reference manual contains the same material and is up-to-date. The function documentation in R itself (via `?`, e.g. `?garnish_affinity`) is also always up-to-date.
> e.g. annotation with vep or SnpEff
Nothing fancy; here is an example. We use the default parameters for SnpEff.
> save the final result as a `.tsv` file for example
`garnish_affinity` returns a data frame that can be saved from R via your favorite method; `rio` is nice.
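A minimal sketch of saving the result as a `.tsv`, assuming `result` holds the table returned by `garnish_affinity()`. Here `result` is a small hypothetical stand-in so the snippet runs on its own; base R's `write.table()` needs no extra packages, and the `rio` and `data.table` alternatives are shown as comments.

```r
# Hypothetical stand-in for the table returned by garnish_affinity();
# the real output has many more columns.
result <- data.frame(
  sample_id = c("test", "test"),
  transcript_id = c("ENST00000128119.1", "ENST00000128119.1"),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "garnish_affinity_result.tsv")

# Base R: tab-separated, no quoting, no row names.
write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Equivalent alternatives, if the packages are installed:
# rio::export(result, out_file)                    # format taken from the .tsv extension
# data.table::fwrite(result, out_file, sep = "\t")

# Read the file back to confirm the round trip.
check <- read.delim(out_file, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(check, result)))
```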
> Furthermore, I was wondering if I can also use your tool if my VCF file was created using the GRCh37 reference genome.
GRCh37 will work if the transcript IDs you used for annotations (e.g. via SnpEff) are in the custom transcript DB that `antigen.garnish` uses to determine amino acid sequences for predictions. The transcript DB file (`GRChm38_meta.RDS`) is in the data directory that `antigen.garnish` downloads during installation. The default location is `"$HOME/antigen.garnish"`. You can re-download the data files if needed.
Could you load the transcript DB into R and check if your annotations are included?
```r
db <- readRDS("GRChm38_meta.RDS")
str(db)
```
If they are not, re-annotating should not be a major hurdle.
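To run that check for a whole VCF, a sketch like the following could help. It assumes SnpEff's standard `ANN` INFO field, where the transcript ID (Feature_ID) is the 7th `|`-separated field of each comma-separated annotation, and that the DB has a `transcript_id` column (as noted later in this thread). The helper names, the toy ANN strings, and the DB IDs are hypothetical; the commented lines show how real data might be plugged in with vcfR.

```r
# Pull transcript IDs (Feature_ID, the 7th '|'-separated field of each
# comma-separated ANN entry) out of SnpEff ANN values.
ann_transcripts <- function(ann) {
  entries <- unlist(strsplit(ann, ",", fixed = TRUE))
  fields <- strsplit(entries, "|", fixed = TRUE)
  ids <- vapply(fields,
                function(f) if (length(f) >= 7) f[7] else NA_character_,
                character(1))
  unique(ids[!is.na(ids) & nzchar(ids)])
}

# Report which transcript IDs are (not) present in the DB's IDs.
check_transcripts <- function(ids, db_ids) {
  data.frame(transcript_id = ids, in_db = ids %in% db_ids,
             stringsAsFactors = FALSE)
}

# Toy data (hypothetical ANN strings and DB IDs):
ann <- c(
  "T|missense_variant|MODERATE|GENE1|ENSG01|transcript|ENST00000376887|protein_coding",
  "A|missense_variant|MODERATE|GENE2|ENSG02|transcript|ENST00000128119.1|protein_coding"
)
db_ids <- c("ENST00000128119.1", "ENST00000376887.4")
res <- check_transcripts(ann_transcripts(ann), db_ids)
print(res)

# Real data (assumed paths; the DB file may sit in a subfolder of the data dir):
# vcf <- vcfR::read.vcfR("my_sample.snpeff.vcf")
# ids <- ann_transcripts(vcfR::extract.info(vcf, "ANN"))
# db  <- readRDS("GRChm38_meta.RDS")
# check_transcripts(ids, db$transcript_id)
```

Rows with `in_db = FALSE` are the annotations that would need re-annotating.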
Ping me here if you run into trouble, I'm happy to help further.
Thanks
Andrew
Thank you so much for the fast and detailed reply. It is much appreciated!
Currently I have annotated my VCF files with `vep`, but now I will use `SnpEff` instead and then check if the transcript IDs are present in the `GRChm38_meta.RDS` file.
On a different note: does `garnish_variants` expect additional information on top of the `SnpEff` annotation? By that I mean information such as GT, AF, DP, ... in the FORMAT field of the VCF file.
And lastly: am I correct in assuming that the "usual" workflow of antigen.garnish is `garnish_variants` -> `garnish_affinity` -> `garnish_antigens`?
Thank you very much! Nick
`garnish_variants` does not need additional information, but the VCF must meet the spec, or vcfR will likely fail to parse the VCF file.
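Since parsing goes through vcfR, one way to rule out spec problems is to read the VCF with vcfR on its own first. A minimal sketch, assuming the vcfR package is installed; the helper name `check_vcf` and the file path are hypothetical.

```r
# Check that vcfR can parse a VCF before passing it to garnish_variants().
# Returns TRUE/FALSE invisibly and prints a short summary.
check_vcf <- function(path) {
  if (!requireNamespace("vcfR", quietly = TRUE)) {
    stop("the vcfR package is required: install.packages(\"vcfR\")")
  }
  vcf <- tryCatch(
    vcfR::read.vcfR(path, verbose = FALSE),
    error = function(e) {
      message("vcfR could not parse ", path, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(vcf)) return(invisible(FALSE))
  samples <- colnames(vcf@gt)[-1]   # first gt column is FORMAT
  message(path, ": ", nrow(vcf@fix), " variant(s); samples: ",
          paste(samples, collapse = ", "))
  invisible(TRUE)
}

# Hypothetical usage:
# check_vcf("my_sample.snpeff.vcf")
```

The sample names in the summary also show at a glance whether a file is paired (e.g. NORMAL and TUMOR columns).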
That is the standard workflow.
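The standard workflow can be sketched as a single R function. This is a sketch under assumptions, not a verified recipe: antigen.garnish 2.0.0 and its data directory are installed, the input VCFs are SnpEff-annotated, every variant gets the same MHC alleles (formatted like the MHC column in the table-input example in this thread), and `garnish_antigens()` accepts the output of `garnish_affinity()` directly; check `?garnish_antigens` for its actual arguments. `run_garnish` is a hypothetical name.

```r
# Sketch of the standard antigen.garnish workflow:
# garnish_variants -> garnish_affinity -> garnish_antigens.
run_garnish <- function(vcfs, mhc) {
  # 1. Extract variants from SnpEff-annotated VCF file(s).
  dt <- antigen.garnish::garnish_variants(vcfs)

  # 2. Add MHC alleles as one space-separated string. Here every row gets
  #    the same alleles; with several samples, set MHC per sample_id instead.
  dt$MHC <- mhc

  # 3. Predict binding affinities for the resulting peptides.
  dt <- antigen.garnish::garnish_affinity(dt)

  # 4. Summarize candidate antigens (see ?garnish_antigens for options).
  antigen.garnish::garnish_antigens(dt)
}

# Hypothetical usage:
# antigens <- run_garnish("my_sample.snpeff.vcf", "HLA-A*02:01 HLA-E*01:03")
```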
(Sent by email in reply to nickhir's message of Dec 20, 2020, 18:20.)
If `garnish_variants` does not need additional information, why does the VCF have to be a paired tumor-normal VCF file? Wouldn't it be enough to simply record the tumor mutations in the "TUMOR" column?
> Wouldn't it be enough to simply record the tumor mutations in the "TUMOR" column?
This is correct. I just briefly reviewed the code and I am not certain that a paired VCF file is actually required. However, our test VCFs are all paired, so I am not certain a non-paired VCF will work. I can look into this further if it is a work-stopping issue for you.
Maybe it is simpler for you to use a table as input? You can pass a data frame directly to `garnish_affinity`. Four columns are required. Here is an example (also pasted below).
```
> str(dt)
Classes ‘data.table’ and 'data.frame': 2 obs. of 4 variables:
$ sample_id : chr "test" "test"
$ transcript_id: chr "ENST00000128119.1" "ENST00000128119.1"
$ cDNA_change : chr "c.4988C>T" "c.4988C>T"
$ MHC : chr "HLA-A*02:01 HLA-E*01:03" "HLA-DQA10402-DQB10511"
> dt
sample_id transcript_id cDNA_change MHC
1: test ENST00000128119.1 c.4988C>T HLA-A*02:01 HLA-E*01:03
2: test ENST00000128119.1 c.4988C>T HLA-DQA10402-DQB10511
```
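For reference, the same four-column table can be built in plain R as a `data.frame`, since a data frame is what `garnish_affinity` is described as accepting. The values are copied from the example; the column check is a hypothetical convenience, not part of antigen.garnish.

```r
# Build the four-column input table for garnish_affinity() in base R.
# Values copied from the example; replace them with your own variants.
dt <- data.frame(
  sample_id     = c("test", "test"),
  transcript_id = c("ENST00000128119.1", "ENST00000128119.1"),  # versioned IDs
  cDNA_change   = c("c.4988C>T", "c.4988C>T"),                  # HGVS cDNA notation
  MHC           = c("HLA-A*02:01 HLA-E*01:03",                  # space-separated alleles
                    "HLA-DQA10402-DQB10511"),
  stringsAsFactors = FALSE
)

# Hypothetical sanity check that the required columns are present.
required <- c("sample_id", "transcript_id", "cDNA_change", "MHC")
absent <- setdiff(required, names(dt))
if (length(absent) > 0) stop("missing columns: ", paste(absent, collapse = ", "))

str(dt)

# With antigen.garnish installed:
# result <- antigen.garnish::garnish_affinity(dt)
```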
Sorry to ask yet another somewhat unrelated question, but it seems like `curl -fsSL "http://get.rech.io/antigen.garnish-2.0.0.tar.gz" | tar -xvz` doesn't work anymore, because the URL is returning an error. I wanted to re-download the test data because I changed some things, and ran into this error:

```
curl: (22) The requested URL returned error: 403
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
```
Regarding the transcript IDs on GRCh37:
If I use the default SnpEff command, my variants are annotated only with the unversioned transcript ID (ENST00000376887), not with the exact version (ENST00000376887.4).
I saw that the `transcript_id` column of `GRChm38_meta.RDS` uses transcript IDs with exact version numbers. Furthermore, in the example VCF files, the transcripts were also annotated with exact version numbers. I guess that this will cause problems in the actual analysis.
Do you have an idea what I can do to prevent this? With `vep` I can request versioned transcripts (`--transcript_version`), but I didn't find a similar option for `SnpEff`.
I think you tried downloading this file while I was re-uploading it. Can you try again?
> I guess that this will cause problems in the actual analysis.

Correct. cDNA sequences change over time; without knowing which versions were used, we can't know the sequences.
Not sure how to get versioned transcripts on GRCh37, sorry.
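One hypothetical way to gauge how large the problem is: list, for each unversioned ID, which versions the DB contains. This is a diagnostic sketch only. It does not reveal which version SnpEff's GRCh37 database used, so it cannot safely pick a version; the function name and IDs are made up for illustration.

```r
# For each unversioned transcript ID, list the versioned IDs in the DB that
# share the same base ID. Diagnostic only: a shared base ID does not imply a
# shared cDNA sequence.
versions_in_db <- function(unversioned_ids, db_ids) {
  base_ids <- sub("\\.[0-9]+$", "", db_ids)   # strip the ".N" version suffix
  ids <- unique(unversioned_ids)
  stats::setNames(lapply(ids, function(id) db_ids[base_ids == id]), ids)
}

# Hypothetical DB IDs; in practice: db_ids <- readRDS("GRChm38_meta.RDS")$transcript_id
db_ids <- c("ENST00000376887.4", "ENST00000376887.5", "ENST00000128119.1")
out <- versions_in_db(c("ENST00000376887", "ENST00000999999"), db_ids)
print(out)
# An ID with zero matches is absent; one with several versions is ambiguous.
```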
Hello,
I was wondering if you plan on extending the documentation for antigen.garnish 2.0.0. If I am not mistaken, the documentation on the homepage is a little outdated. For example, the command `antigen.summary()` does not exist in my version (`packageVersion("antigen.garnish")` returns `'2.0.0'`). It would be extremely helpful to see the preprocessing steps that you performed, e.g. annotation with `vep` or `SnpEff`, and also how to properly save the final result, as a `.tsv` file for example. Furthermore, I was wondering if I can also use your tool if my VCF file was created using the GRCh37 reference genome.
Cheers!