rjcorb commented 1 year ago

The output of 01-annotate_variants_*_input.R currently contains duplicated columns. When the intervar and multianno files are merged, columns present in both are kept twice, with .x and .y suffixes (e.g. Func.refGene.x and Func.refGene.y):

> names(clinvar_anno_intervar_vcf_df)
 [1] "Ref.Gene"                        "Func.refGene.x"                  "ExonicFunc.refGene.x"            "Gene.ensGene.x"                  "avsnp147.x"                     
 [6] "AAChange.ensGene.x"              "AAChange.refGene.x"              "InterVar: InterVar and Evidence" "Interpro_domain.x"               "AAChange.knownGene.x"           
[11] "Otherinfo"                       "var_id"                          "Start"                           "Func.refGene.y"                  "Gene.refGene"                   
[16] "GeneDetail.refGene"              "ExonicFunc.refGene.y"            "AAChange.refGene.y"              "esp6500siv2_all"                 "1000g2015aug_all"               
[21] "avsnp147.y"                      "Aloft_Confidence"                "integrated_confidence_value"     "LINSIGHT"                        "GERP++_NR"                      
[26] "GERP++_RS"                       "SiPhy_29way_logOdds"             "Interpro_domain.y"               "rmsk"                            "Func.ensGene"                   
[31] "Gene.ensGene.y"                  "GeneDetail.ensGene"              "ExonicFunc.ensGene"              "AAChange.ensGene.y"              "Func.knownGene"                 
[36] "Gene.knownGene"                  "GeneDetail.knownGene"            "ExonicFunc.knownGene"            "AAChange.knownGene.y"            "vcf_id"                         
[41] "evidencePVS1"                    "evidenceBA1"                     "evidencePS"                      "evidencePM"                      "evidencePP"                     
[46] "evidenceBS"                      "evidenceBP"                      "CHROM"                           "START"                           "ID"                             
[51] "REF"                             "ALT"                             "QUAL"                            "FILTER"                          "INFO"                           
[56] "FORMAT"                          "Sample"                          "Stars"                           "final_call"   

This was likely overlooked when we changed how the two data frames are merged.
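
One possible fix, sketched on toy data below, is to drop the shared non-key columns from one side before joining. This assumes the merge is a dplyr join keyed on var_id and that the shared columns carry the same values in both inputs. The object names (intervar_df, multianno_df) and the values are hypothetical stand-ins, not the script's actual objects.

```r
library(dplyr)

# Toy stand-ins for the two inputs (hypothetical names and values).
intervar_df <- tibble(
  var_id       = c("chr1:100:A:G", "chr2:200:C:T"),
  Func.refGene = c("exonic", "intronic"),
  avsnp147     = c("rs1", "rs2"),
  evidencePVS1 = c(0, 1)
)
multianno_df <- tibble(
  var_id       = c("chr1:100:A:G", "chr2:200:C:T"),
  Func.refGene = c("exonic", "intronic"),
  avsnp147     = c("rs1", "rs2"),
  LINSIGHT     = c(0.1, 0.9)
)

# Joining on var_id alone reproduces the problem: every other shared
# column is kept twice with .x / .y suffixes.
dup_df <- inner_join(intervar_df, multianno_df, by = "var_id")

# Columns present in both inputs, excluding the join key.
shared_cols <- setdiff(
  intersect(names(intervar_df), names(multianno_df)),
  "var_id"
)

# Drop the shared columns from one side, then join.
dedup_df <- intervar_df %>%
  select(-all_of(shared_cols)) %>%
  inner_join(multianno_df, by = "var_id")

# List any suffixed columns that remain (none expected here).
grep("\\.(x|y)$", names(dedup_df), value = TRUE)
```

If the shared columns can differ between the two files, dropping one copy would silently discard information, so it may be safer to compare them first or to keep both under explicit names via the suffix argument of the join.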

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vroom_1.6.0     optparse_1.7.3  lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.1     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] pillar_1.9.0     compiler_4.2.3   tools_4.2.3      bit_4.0.5        lifecycle_1.0.3  gtable_0.3.3     timechange_0.2.0 pkgconfig_2.0.3  rlang_1.1.0      cli_3.6.1        rstudioapi_0.14  parallel_4.2.3  
[13] withr_2.5.0      generics_0.1.3   vctrs_0.6.2      hms_1.1.3        getopt_1.20.3    rprojroot_2.0.3  bit64_4.0.5      grid_4.2.3       tidyselect_1.2.0 glue_1.6.2       R6_2.5.1         fansi_1.0.4     
[25] tzdb_0.3.0       magrittr_2.0.3   scales_1.2.1     colorspace_2.1-0 utf8_1.2.3       stringi_1.7.12   munsell_0.5.0    crayon_1.5.2