andrewrech / antigen.garnish

garnish_affinity: data.table error; function breaks after it starts extracting cDNA changes #141

Closed: ARChakravarthy closed this issue 3 years ago

ARChakravarthy commented 3 years ago

garnish_affinity keeps failing with the following error, both for a VCF parsed with garnish_variants and for a table containing sample_id, transcript, cDNA change, and HLA annotations. For context, the pipeline works fine with the test data included in the package.

"Error in `[.data.table`(dt, , `:=`(cDNA_type, cDNA_change %>% stringr::str_extract_all("[a-z]{3}|>") %>% : The items in the 'by' or 'keyby' list are length(s) (2). Each must be length 0; the same length as there are rows in x (after subsetting if i is provided)."

dput() output for the input objects follows.

SNV2 <- structure(list(CHROM = c("chr1", "chr1", "chr1", "chr1", "chr1"), POS = c("2146093", "2146093", "2146093", "2146093", "2146093"), ID = c("chr1:2077532_G/A", "chr1:2077532_G/A", "chr1:2077532_G/A", "chr1:2077532_G/A", "chr1:2077532_G/A"), REF = c("G", "G", "G", "G", "G"), ALT = c("A", "A", "A", "A", "A"), QUAL = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), FILTER = c("PASS", "PASS", "PASS", "PASS", "PASS"), INFO = c(
"AS_FilterStatus=SITE;AS_SB_TABLE=126;DP=286;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=117,118;MMQ=60,60;MPOS=30;NALOD=1.9;NLOD=23.48;POPAF=6;ROQ=61;TLOD=37.43;ANN=A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470596|protein_coding|4/6|c.70G>A|p.Glu24Lys|564/788|70/294|24/97||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000496325|protein_coding|4/6|c.70G>A|p.Glu24Lys|421/563|70/212|24/69||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000482686|protein_coding|4/6|c.70G>A|p.Glu24Lys|465/614|70/219|24/72||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400920|protein_coding|4/15|c.70G>A|p.Glu24Lys|465/2009|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000486681|protein_coding|5/8|c.58G>A|p.Glu20Lys|616/966|58/408|20/135||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470986|protein_coding|4/6|c.70G>A|p.Glu24Lys|449/706|70/327|24/108||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470511|protein_coding|4/6|c.70G>A|p.Glu24Lys|391/567|70/246|24/81||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000471018|protein_coding|4/5|c.70G>A|p.Glu24Lys|648/716|70/138|24/45||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000466352|protein_coding|4/4|c.70G>A|p.Glu24Lys|559/564|70/75|24/24||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000497183|protein_coding|5/7|c.58G>A|p.Glu20Lys|505/763|58/316|20/104||WARNING_TRANSCRIPT_INCOMPLETE,A|3_prime_UTR_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A|||||10750|,A|upstream_gene_variant|MODIFIER|RP5-892K4.1|ENSG00000271806|transcript|ENST00000606533|antisense||n.-1C>T|||||814|,A|downstream_gene_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000495347|protein_coding||c.698G>A|||||7|WARNING_TRANSCRIPT_NO_STOP_CODON,A|intron_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000419838|processed_transcript|2/3|n.218+1752G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000479263|processed_transcript|2/13|n.201G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000503672|retained_intron|2/5|n.222G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000478770|nonsense_mediated_decay|2/14|n.58G>A||||||",
"AS_FilterStatus=SITE;AS_SB_TABLE=126;DP=286;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=117,118;MMQ=60,60;MPOS=30;NALOD=1.9;NLOD=23.48;POPAF=6;ROQ=61;TLOD=37.43;ANN=A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470596|protein_coding|4/6|c.70G>A|p.Glu24Lys|564/788|70/294|24/97||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000496325|protein_coding|4/6|c.70G>A|p.Glu24Lys|421/563|70/212|24/69||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000482686|protein_coding|4/6|c.70G>A|p.Glu24Lys|465/614|70/219|24/72||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400920|protein_coding|4/15|c.70G>A|p.Glu24Lys|465/2009|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000486681|protein_coding|5/8|c.58G>A|p.Glu20Lys|616/966|58/408|20/135||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470986|protein_coding|4/6|c.70G>A|p.Glu24Lys|449/706|70/327|24/108||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470511|protein_coding|4/6|c.70G>A|p.Glu24Lys|391/567|70/246|24/81||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000471018|protein_coding|4/5|c.70G>A|p.Glu24Lys|648/716|70/138|24/45||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000466352|protein_coding|4/4|c.70G>A|p.Glu24Lys|559/564|70/75|24/24||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000497183|protein_coding|5/7|c.58G>A|p.Glu20Lys|505/763|58/316|20/104||WARNING_TRANSCRIPT_INCOMPLETE,A|3_prime_UTR_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A|||||10750|,A|upstream_gene_variant|MODIFIER|RP5-892K4.1|ENSG00000271806|transcript|ENST00000606533|antisense||n.-1C>T|||||814|,A|downstream_gene_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000495347|protein_coding||c.698G>A|||||7|WARNING_TRANSCRIPT_NO_STOP_CODON,A|intron_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000419838|processed_transcript|2/3|n.218+1752G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000479263|processed_transcript|2/13|n.201G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000503672|retained_intron|2/5|n.222G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000478770|nonsense_mediated_decay|2/14|n.58G>A||||||",
"AS_FilterStatus=SITE;AS_SB_TABLE=126;DP=286;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=117,118;MMQ=60,60;MPOS=30;NALOD=1.9;NLOD=23.48;POPAF=6;ROQ=61;TLOD=37.43;ANN=A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470596|protein_coding|4/6|c.70G>A|p.Glu24Lys|564/788|70/294|24/97||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000496325|protein_coding|4/6|c.70G>A|p.Glu24Lys|421/563|70/212|24/69||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000482686|protein_coding|4/6|c.70G>A|p.Glu24Lys|465/614|70/219|24/72||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400920|protein_coding|4/15|c.70G>A|p.Glu24Lys|465/2009|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000486681|protein_coding|5/8|c.58G>A|p.Glu20Lys|616/966|58/408|20/135||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470986|protein_coding|4/6|c.70G>A|p.Glu24Lys|449/706|70/327|24/108||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470511|protein_coding|4/6|c.70G>A|p.Glu24Lys|391/567|70/246|24/81||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000471018|protein_coding|4/5|c.70G>A|p.Glu24Lys|648/716|70/138|24/45||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000466352|protein_coding|4/4|c.70G>A|p.Glu24Lys|559/564|70/75|24/24||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000497183|protein_coding|5/7|c.58G>A|p.Glu20Lys|505/763|58/316|20/104||WARNING_TRANSCRIPT_INCOMPLETE,A|3_prime_UTR_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A|||||10750|,A|upstream_gene_variant|MODIFIER|RP5-892K4.1|ENSG00000271806|transcript|ENST00000606533|antisense||n.-1C>T|||||814|,A|downstream_gene_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000495347|protein_coding||c.698G>A|||||7|WARNING_TRANSCRIPT_NO_STOP_CODON,A|intron_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000419838|processed_transcript|2/3|n.218+1752G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000479263|processed_transcript|2/13|n.201G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000503672|retained_intron|2/5|n.222G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000478770|nonsense_mediated_decay|2/14|n.58G>A||||||",
"AS_FilterStatus=SITE;AS_SB_TABLE=126;DP=286;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=117,118;MMQ=60,60;MPOS=30;NALOD=1.9;NLOD=23.48;POPAF=6;ROQ=61;TLOD=37.43;ANN=A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470596|protein_coding|4/6|c.70G>A|p.Glu24Lys|564/788|70/294|24/97||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000496325|protein_coding|4/6|c.70G>A|p.Glu24Lys|421/563|70/212|24/69||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000482686|protein_coding|4/6|c.70G>A|p.Glu24Lys|465/614|70/219|24/72||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400920|protein_coding|4/15|c.70G>A|p.Glu24Lys|465/2009|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000486681|protein_coding|5/8|c.58G>A|p.Glu20Lys|616/966|58/408|20/135||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470986|protein_coding|4/6|c.70G>A|p.Glu24Lys|449/706|70/327|24/108||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470511|protein_coding|4/6|c.70G>A|p.Glu24Lys|391/567|70/246|24/81||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000471018|protein_coding|4/5|c.70G>A|p.Glu24Lys|648/716|70/138|24/45||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000466352|protein_coding|4/4|c.70G>A|p.Glu24Lys|559/564|70/75|24/24||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000497183|protein_coding|5/7|c.58G>A|p.Glu20Lys|505/763|58/316|20/104||WARNING_TRANSCRIPT_INCOMPLETE,A|3_prime_UTR_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A|||||10750|,A|upstream_gene_variant|MODIFIER|RP5-892K4.1|ENSG00000271806|transcript|ENST00000606533|antisense||n.-1C>T|||||814|,A|downstream_gene_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000495347|protein_coding||c.698G>A|||||7|WARNING_TRANSCRIPT_NO_STOP_CODON,A|intron_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000419838|processed_transcript|2/3|n.218+1752G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000479263|processed_transcript|2/13|n.201G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000503672|retained_intron|2/5|n.222G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000478770|nonsense_mediated_decay|2/14|n.58G>A||||||",
"AS_FilterStatus=SITE;AS_SB_TABLE=126;DP=286;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=117,118;MMQ=60,60;MPOS=30;NALOD=1.9;NLOD=23.48;POPAF=6;ROQ=61;TLOD=37.43;ANN=A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470596|protein_coding|4/6|c.70G>A|p.Glu24Lys|564/788|70/294|24/97||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000496325|protein_coding|4/6|c.70G>A|p.Glu24Lys|421/563|70/212|24/69||WARNING_TRANSCRIPT_INCOMPLETE,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000482686|protein_coding|4/6|c.70G>A|p.Glu24Lys|465/614|70/219|24/72||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400920|protein_coding|4/15|c.70G>A|p.Glu24Lys|465/2009|70/1230|24/409||,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000486681|protein_coding|5/8|c.58G>A|p.Glu20Lys|616/966|58/408|20/135||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470986|protein_coding|4/6|c.70G>A|p.Glu24Lys|449/706|70/327|24/108||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000470511|protein_coding|4/6|c.70G>A|p.Glu24Lys|391/567|70/246|24/81||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000471018|protein_coding|4/5|c.70G>A|p.Glu24Lys|648/716|70/138|24/45||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000466352|protein_coding|4/4|c.70G>A|p.Glu24Lys|559/564|70/75|24/24||WARNING_TRANSCRIPT_NO_STOP_CODON,A|missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000497183|protein_coding|5/7|c.58G>A|p.Glu20Lys|505/763|58/316|20/104||WARNING_TRANSCRIPT_INCOMPLETE,A|3_prime_UTR_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A|||||10750|,A|upstream_gene_variant|MODIFIER|RP5-892K4.1|ENSG00000271806|transcript|ENST00000606533|antisense||n.-1C>T|||||814|,A|downstream_gene_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000495347|protein_coding||c.698G>A|||||7|WARNING_TRANSCRIPT_NO_STOP_CODON,A|intron_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000419838|processed_transcript|2/3|n.218+1752G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000481140|nonsense_mediated_decay|4/6|n.203G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000479263|processed_transcript|2/13|n.201G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000503672|retained_intron|2/5|n.222G>A||||||,A|non_coding_exon_variant|MODIFIER|PRKCZ|ENSG00000067606|transcript|ENST00000478770|nonsense_mediated_decay|2/14|n.58G>A||||||"),
AS_FilterStatus = c("SITE", "SITE", "SITE", "SITE", "SITE"), AS_SB_TABLE = c("=126", "=126", "=126", "=126", "=126"), DP = c("=286", "=286", "=286", "=286", "=286"), ECNT = c("=1", "=1", "=1", "=1", "=1"), GERMQ = c("=93", "=93", "=93", "=93", "=93"), MBQ = c("=20,20", "=20,20", "=20,20", "=20,20", "=20,20"), MFRL = c("=117,118", "=117,118", "=117,118", "=117,118", "=117,118"), MMQ = c("=60,60", "=60,60", "=60,60", "=60,60", "=60,60"), MPOS = c("=30", "=30", "=30", "=30", "=30"), NALOD = c("=1.9", "=1.9", "=1.9", "=1.9", "=1.9"), NLOD = c("=23.48", "=23.48", "=23.48", "=23.48", "=23.48"), POPAF = c("=6", "=6", "=6", "=6", "=6"), ROQ = c("=61", "=61", "=61", "=61", "=61"), TLOD = c("=37.43", "=37.43", "=37.43", "=37.43", "=37.43"),
ANN = c("missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000378567|protein_coding|7/18|c.619G>A|p.Glu207Lys|780/2326|619/1779|207/592||", "missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000468310|protein_coding|6/6|c.529G>A|p.Glu177Lys|675/690|529/544|177/180||WARNING_TRANSCRIPT_INCOMPLETE", "missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000400921|protein_coding|4/15|c.70G>A|p.Glu24Lys|753/2294|70/1230|24/409||", "missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461106|protein_coding|4/15|c.307G>A|p.Glu103Lys|575/1916|307/1467|103/488||", "missense_variant|MODERATE|PRKCZ|ENSG00000067606|transcript|ENST00000461465|protein_coding|4/6|c.70G>A|p.Glu24Lys|365/480|70/185|24/60||WARNING_TRANSCRIPT_INCOMPLETE"),
LOF = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), NMD = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), RPA = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), RU = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), STR = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), STRQ = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), ReverseComplementedAlleles = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), FORMAT = c("GT:AD:AF:DP:F1R2:F2R1:SB", "GT:AD:AF:DP:F1R2:F2R1:SB", "GT:AD:AF:DP:F1R2:F2R1:SB", "GT:AD:AF:DP:F1R2:F2R1:SB", "GT:AD:AF:DP:F1R2:F2R1:SB"),
Seq10E = c("0/1:116,21:0.164:137:50,12:63,9:57,59,11,10", "0/1:116,21:0.164:137:50,12:63,9:57,59,11,10", "0/1:116,21:0.164:137:50,12:63,9:57,59,11,10", "0/1:116,21:0.164:137:50,12:63,9:57,59,11,10", "0/1:116,21:0.164:137:50,12:63,9:57,59,11,10"), Seq10NT = c("0/0:145,0:0.012:145:64,0:80,0:69,76,0,0", "0/0:145,0:0.012:145:64,0:80,0:69,76,0,0", "0/0:145,0:0.012:145:64,0:80,0:69,76,0,0", "0/0:145,0:0.012:145:64,0:80,0:69,76,0,0", "0/0:145,0:0.012:145:64,0:80,0:69,76,0,0"), Seq10E_GT = c("0/1", "0/1", "0/1", "0/1", "0/1"), Seq10E_AF = c("0.164", "0.164", "0.164", "0.164", "0.164"), Seq10E_DP = c("137", "137", "137", "137", "137"), Seq10E_F1R2 = c("50,12", "50,12", "50,12", "50,12", "50,12"), Seq10E_F2R1 = c("63,9", "63,9", "63,9", "63,9", "63,9"), Seq10E_SB = c("57,59,11,10", "57,59,11,10", "57,59,11,10", "57,59,11,10", "57,59,11,10"), Seq10E_PGT = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10E_PID = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10E_PS = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10NT_GT = c("0/0", "0/0", "0/0", "0/0", "0/0"), Seq10NT_AF = c("0.012", "0.012", "0.012", "0.012", "0.012"), Seq10NT_DP = c("145", "145", "145", "145", "145"), Seq10NT_F1R2 = c("64,0", "64,0", "64,0", "64,0", "64,0"), Seq10NT_F2R1 = c("80,0", "80,0", "80,0", "80,0", "80,0"), Seq10NT_SB = c("69,76,0,0", "69,76,0,0", "69,76,0,0", "69,76,0,0", "69,76,0,0"), Seq10NT_PGT = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10NT_PID = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10NT_PS = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Seq10E_AD_ref = c("116", "116", "116", "116", "116"), Seq10E_AD_alt = c("21", "21", "21", "21", "21"), Seq10NT_AD_ref = c("145", "145", "145", "145", "145"), Seq10NT_AD_alt = c("0", "0", "0", "0", "0"),
sample_id = c("10E.vcfhg38_SnpEff_out.vcf", "10E.vcfhg38_SnpEff_out.vcf", "10E.vcfhg38_SnpEff_out.vcf", "10E.vcfhg38_SnpEff_out.vcf", "10E.vcfhg38_SnpEff_out.vcf"), vcf_type = c("Mutect", "Mutect", "Mutect", "Mutect", "Mutect"), snpeff_uuid = c("df44d06d-6233-48be-bb38-093962ba8b2b", "df44d06d-6233-48be-bb38-093962ba8b2b", "df44d06d-6233-48be-bb38-093962ba8b2b", "df44d06d-6233-48be-bb38-093962ba8b2b", "df44d06d-6233-48be-bb38-093962ba8b2b"),
transcript_id = c("ENST00000378567", "ENST00000468310", "ENST00000400921", "ENST00000461106", "ENST00000461465"), effect_type = c("missense_variant", "missense_variant", "missense_variant", "missense_variant", "missense_variant"), putative_impact = c("MODERATE", "MODERATE", "MODERATE", "MODERATE", "MODERATE"), gene = c("PRKCZ", "PRKCZ", "PRKCZ", "PRKCZ", "PRKCZ"), gene_id = c("ENSG00000067606", "ENSG00000067606", "ENSG00000067606", "ENSG00000067606", "ENSG00000067606"), feature_type = c("transcript", "transcript", "transcript", "transcript", "transcript"), feature_id = c("ENST00000378567", "ENST00000468310", "ENST00000400921", "ENST00000461106", "ENST00000461465"), transcript_bioptype = c("protein_coding", "protein_coding", "protein_coding", "protein_coding", "protein_coding"), exon_intron_rank = c("7/18", "6/6", "4/15", "4/15", "4/6"), cDNA_change = c("c.619G>A", "c.529G>A", "c.70G>A", "c.307G>A", "c.70G>A"), protein_change = c("p.Glu207Lys", "p.Glu177Lys", "p.Glu24Lys", "p.Glu103Lys", "p.Glu24Lys"), cDNA_position_cDNA_len = c("780/2326", "675/690", "753/2294", "575/1916", "365/480"), CDS_position_CDS_len = c("619/1779", "529/544", "70/1230", "307/1467", "70/185"), Protein_position_Protein_len = c("207/592", "177/180", "24/409", "103/488", "24/60"), Distance_to_feature = c("", "", "", "", ""), ERRORS_WARNINGS_INFO = c(NA, "WARNING_TRANSCRIPT_INCOMPLETE", NA, NA, "WARNING_TRANSCRIPT_INCOMPLETE"), protein_coding = c(TRUE, TRUE, TRUE, TRUE, TRUE), MHC = c("HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02"), frameshift = c(FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

Tabular input was then generated from the parsed VCF:

SNV3 <- SNV2 %>% mutate(sample_id = "S1") %>% select(sample_id, transcript_id = feature_id, cDNA_change, MHC)

dput() output for SNV3:

structure(list(sample_id = c("S1", "S1", "S1", "S1", "S1"), transcript_id = c("ENST00000378567", "ENST00000468310", "ENST00000400921", "ENST00000461106", "ENST00000461465"), cDNA_change = c("c.619G>A", "c.529G>A", "c.70G>A", "c.307G>A", "c.70G>A"), MHC = c("HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02", "HLA-A*03:01 HLA-A*33:01 HLA-B*07:02 HLA-B*14:02 HLA-C*07:02 HLA-C*08:02")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

Calls:

x <- garnish_affinity(SNV2)
y <- garnish_affinity(dt = data.table(SNV3))

Both calls produce the error message quoted at the start.

sessionInfo():

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
[6] LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] magrittr_2.0.1 dplyr_1.0.7 data.table_1.14.0 antigen.garnish_2.2.0

loaded via a namespace (and not attached):
 [1] zoo_1.8-9 tidyselect_1.1.1 xfun_0.24 memuse_4.1-0 purrr_0.3.4 splines_4.0.4
[7] lattice_0.20-41 vctrs_0.3.8 generics_0.1.0 testthat_3.0.4 htmltools_0.5.1.1 stats4_4.0.4
[13] viridisLite_0.4.0 vcfR_1.12.0 yaml_2.2.1 mgcv_1.8-34 utf8_1.2.2 rlang_0.4.11
[19] pillar_1.6.1 glue_1.4.2 BiocGenerics_0.36.1 uuid_0.1-4 lifecycle_1.0.0 stringr_1.4.0
[25] zlibbioc_1.36.0 Biostrings_2.58.0 evaluate_0.14 knitr_1.33 permute_0.9-5 IRanges_2.24.1
[31] parallel_4.0.4 fansi_0.5.0 Rcpp_1.0.7 pinfsc50_1.2.0 vegan_2.5-7 S4Vectors_0.28.1
[37] XVector_0.30.0 digest_0.6.27 stringi_1.7.3 rbibutils_2.2.1 grid_4.0.4 cli_3.0.1
[43] Rdpack_2.1.2 tools_4.0.4 tibble_3.1.3 cluster_2.1.1 crayon_1.4.1 ape_5.5
[49] tidyr_1.1.3 pkgconfig_2.0.3 ellipsis_0.3.2 MASS_7.3-53.1 Matrix_1.3-2 rmarkdown_2.9
[55] rstudioapi_0.13 R6_2.5.0 mclust_5.4.7 nlme_3.1-152 compiler_4.0.4

ARChakravarthy commented 3 years ago

For additional context: mutations were called with Mutect2 on hg19, lifted over to hg38 with Picard, and annotated with SnpEff using its cancer-specific annotation (with a pedigree file).

leeprichman commented 3 years ago

It looks like the MHC alleles are malformed. Rows with malformed MHC alleles are removed from the table, which leaves an empty data.table; a data.table `by` operation over the vector 1:0 is then attempted on that empty table, and that is what throws the error. Please add an asterisk after A, B, or C in all of your MHC alleles, for example HLA-A*03:01.
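A base-R sketch of the allele check suggested above, plus a demonstration of the 1:0 pitfall. The helper names (`is_wellformed_hla`, `add_hla_asterisk`) and the regular expression are hypothetical and assume two-field HLA class I alleles; antigen.garnish's own allele validation may be stricter or looser. The last lines show why an empty table produces "length(s) (2)" in the error: `1:0` is a length-2 vector, whereas `seq_len(0)` is empty.

```r
# Hypothetical helpers, not part of antigen.garnish. The regex assumes
# two-field HLA class I alleles (A, B or C), e.g. "HLA-A*03:01".
is_wellformed_hla <- function(alleles) {
  grepl("^HLA-[ABC]\\*[0-9]{2}:[0-9]{2}$", alleles)
}

# Insert a missing asterisk after the gene letter ("HLA-A03:01" -> "HLA-A*03:01");
# alleles that already contain it are returned unchanged.
add_hla_asterisk <- function(alleles) {
  sub("^(HLA-[ABC])([0-9])", "\\1*\\2", alleles)
}

# The MHC column holds space-separated alleles, so split before checking.
alleles <- strsplit("HLA-A03:01 HLA-A*33:01", " ", fixed = TRUE)[[1]]
is_wellformed_hla(alleles)                    # FALSE  TRUE
is_wellformed_hla(add_hla_asterisk(alleles))  # TRUE  TRUE

# Why an empty table yields "length(s) (2)": 1:nrow() on zero rows is 1:0.
empty <- data.frame(cDNA_change = character(0))
length(1:nrow(empty))          # 2, because 1:0 is c(1L, 0L)
length(seq_len(nrow(empty)))   # 0, which matches nrow(empty)
```

Using `seq_len(nrow(x))` instead of `1:nrow(x)` is the usual R idiom for loops or groupings that must also behave on zero-row input.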

ARChakravarthy commented 3 years ago

The HLA alleles in the objects I called dput() on are correctly formed, and the error persists regardless. Please see the attached screenshot.

[Screenshot: error message]

leeprichman commented 3 years ago

Looks like it was GitHub markdown messing up the MHC alleles, my apologies!

I took a look at your SNV3 table using the 2.1.1 Docker image. Unfortunately, the issue appears to be that no matches are found in the metadata file because your transcript IDs do not have version numbers appended to them. The resulting empty data.table at the extraction step is what throws the error. You will need to obtain the version numbers of all transcripts derived from your variants.
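A small pre-flight check along these lines can catch unversioned IDs before `garnish_affinity()` is called. It is a sketch, not antigen.garnish code: `has_version()` and `check_transcript_versions()` are hypothetical names, and the pattern assumes standard human Ensembl transcript IDs (`ENST` followed by 11 digits) with a `.N` version suffix.

```r
# Hypothetical pre-flight check; assumes human Ensembl IDs of the form
# "ENST" + 11 digits, followed by a ".<version>" suffix.
has_version <- function(ids) {
  grepl("^ENST[0-9]{11}\\.[0-9]+$", as.character(ids))
}

# Stop with an informative message instead of letting an empty table
# surface later as a data.table 'by' error.
check_transcript_versions <- function(dt) {
  ids <- as.character(dt$transcript_id)
  bad <- ids[!has_version(ids)]
  if (length(bad) > 0) {
    stop(sprintf("%d transcript ID(s) lack a version suffix, e.g. %s",
                 length(bad), bad[[1]]), call. = FALSE)
  }
  invisible(TRUE)
}

# The IDs from the SNV3 table are unversioned:
has_version(c("ENST00000378567", "ENST00000468310"))  # FALSE FALSE
```

Run on the SNV3 table from this issue, `check_transcript_versions(SNV3)` would stop with "5 transcript ID(s) lack a version suffix, e.g. ENST00000378567".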

@andrewrech Can you update the README example table and the documentation to reflect the need for transcript version numbers?

ARChakravarthy commented 3 years ago

Hmm, since SnpEff does not always append version numbers to its output, would it be worth my writing a small helper function for the package that left-joins unversioned ENST identifiers to versioned ones? I haven't contributed code to GitHub repositories before, but I'm happy to email any code I develop so it can be added as a helper function.

leeprichman commented 3 years ago

Simply joining the transcript_id to every possible transcript version would not work, because the cDNA change can differ between transcripts. This must be done at the variant annotation step, since the same alternate allele can have different effects on the coding sequence of different transcript versions. I'm not sure whether the lack of versions has something to do with the hg19 liftover; @andrewrech might have some insight.
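The ambiguity of a plain join can be made concrete with a toy lookup table. The version suffixes below are invented for illustration only (they are not the real Ensembl versions of these transcripts), and `strip_version()` is a hypothetical helper. Any unversioned ID with more than one candidate cannot be resolved by a join alone, because its cDNA change is only valid for the version SnpEff annotated against.

```r
# Toy versioned IDs; the ".7", ".8" and ".1" suffixes are made up for illustration.
versioned <- c("ENST00000378567.7", "ENST00000378567.8", "ENST00000468310.1")

# Drop a trailing ".<digits>" version suffix.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Group versioned IDs by their unversioned base ID and count the candidates.
candidates <- split(versioned, strip_version(versioned))
lengths(candidates)
# ENST00000378567 ENST00000468310
#               2               1

# Base IDs that a left join could not map to a single version:
names(candidates)[lengths(candidates) > 1]  # "ENST00000378567"
```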

ARChakravarthy commented 3 years ago

That comes down to the SnpEff output rather than the variant caller per se, because the caller's VCFs contain no transcript information, only the coordinates and the sequences at them. I will have to check how to get SnpEff to emit versioned annotations with its preconfigured hg38 database; since antigen.garnish depends on SnpEff-annotated VCFs, this may be a critical issue. My understanding is that different transcripts of a gene (ENSG) have different identifiers (ENST), and that Ensembl annotation, at least as implemented in SnpEff, gives unversioned output.

leeprichman commented 3 years ago

Right, I'm referring to SnpEff, the variant annotator, not a variant caller. We have not historically had this problem when calling variants against hg38 and annotating with SnpEff, but we usually work with hg38-aligned data from the start.

andrewrech commented 3 years ago

SnpEff gives versioned output; that is why we use it. The version is critical because ~5% of transcript cDNAs change across versions. This may or may not be an acceptable potential error rate.

andrewrech commented 3 years ago

Whoops, sorry, did not mean to close this.

ARChakravarthy commented 3 years ago

Can you tell me which versions of SnpEff and the SnpEff databases you've been using, and whether you use the default standard VCF annotation (which prioritises canonical transcripts when reporting variants) or the traditional EFF format (-formatEff)?

I've been using SnpEff 4.1 with the pre-built SnpEff database for GRCh38.103 as the reference.

andrewrech commented 3 years ago

Maybe that is too old? We test against this version from a few years ago, and AFAIK anything newer also gives versioned output.

https://github.com/andrewrech/antigen.garnish/blob/main/inst/extdata/testdata/antigen.garnish_test.vcf#L3420-L3421

ARChakravarthy commented 3 years ago

Yup, tested with 4.3 and I can confirm that the output is now versioned by default, unlike with 4.1. The documentation might benefit from an explicit SnpEff version requirement (>= 4.3).
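One way to enforce that requirement locally is to read the SnpEff version from the annotated VCF header before running antigen.garnish. This sketch assumes the VCF contains the `##SnpEffVersion=` meta line that SnpEff normally writes; the function names and the file path are hypothetical, and only the major.minor part of the version string is compared.

```r
# Return the SnpEff major.minor version from VCF meta lines, or NA if absent.
snpeff_version <- function(header_lines) {
  line <- grep("^##SnpEffVersion=", header_lines, value = TRUE)
  if (length(line) == 0) return(NA_character_)
  m <- regmatches(line[[1]], regexpr("[0-9]+\\.[0-9]+", line[[1]]))
  if (length(m) == 0) NA_character_ else m
}

# TRUE only if a version is found and it is at least `min_version`.
snpeff_at_least <- function(header_lines, min_version = "4.3") {
  v <- snpeff_version(header_lines)
  !is.na(v) && compareVersion(v, min_version) >= 0
}

# Usage ("annotated.vcf" is a placeholder path):
# hdr <- grep("^##", readLines("annotated.vcf"), value = TRUE)
# if (!snpeff_at_least(hdr)) warning("SnpEff < 4.3 may emit unversioned transcript IDs")
```

A header without a SnpEff version line is treated as failing the check, which errs on the side of flagging the file for inspection.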