Richard-Packer / DeepPheWAS

An R package for phenotype generation and association testing for phenome-wide association studies (PheWAS)
GNU General Public License v3.0

Handling duplicated columns in tab data files #2

Closed mikkmart closed 1 year ago

mikkmart commented 1 year ago

Our tab data is spread across multiple files, and some columns appear in more than one file. Currently, minimum_data_R() combines the files with dplyr::full_join(), which renames each duplicated column by appending the suffixes .x and .y. Downstream code expects the original, unsuffixed column names, so those columns are effectively missing.
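A minimal sketch of the renaming, using toy data rather than real tab files. The key column name `eid` and the field names `x21022`, `x31` and `x50` are assumptions for illustration only, and the snippet calls dplyr::full_join() directly rather than going through minimum_data_R().

```r
library(dplyr)

# Two toy tables that share the key `eid` (assumed name) and the column `x21022`.
file_a <- tibble(eid = c(1, 2), x21022 = c(50, 61), x31 = c(0, 1))
file_b <- tibble(eid = c(2, 3), x21022 = c(61, 47), x50 = c(170, 182))

# Joining on the key alone leaves `x21022` present in both inputs,
# so dplyr disambiguates the two copies with the default suffixes.
joined <- full_join(file_a, file_b, by = "eid")

names(joined)
"x21022" %in% names(joined)  # FALSE: only x21022.x and x21022.y exist
```

Any later step that selects `x21022` by name would not find it in `joined`.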

Could minimum_data_R() be updated to handle this more gracefully? For example, when a column appears in more than one file, it could use the copy from the file specified last in the list.
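One possible way to get the "last file wins" behaviour is sketched below. This is only an illustration, not how minimum_data_R() is implemented: `join_last_wins()` is a hypothetical helper, and the key column name `eid` and the toy field names are assumptions.

```r
library(dplyr)

# Hypothetical helper (not part of DeepPheWAS): full-join a list of tables on
# `key`, keeping the copy of any duplicated column from the later table.
join_last_wins <- function(tables, key = "eid") {
  Reduce(function(acc, nxt) {
    # Non-key columns that both the accumulated table and the next table have.
    shared <- setdiff(intersect(names(acc), names(nxt)), key)
    # Drop the earlier copies so the join produces no .x/.y suffixes.
    acc <- acc[setdiff(names(acc), shared)]
    full_join(acc, nxt, by = key)
  }, tables)
}

# Toy data with one overlapping column, `x21022`.
file_a <- tibble(eid = c(1, 2), x21022 = c(50, 61), x31 = c(0, 1))
file_b <- tibble(eid = c(2, 3), x21022 = c(61, 47), x50 = c(170, 182))

merged <- join_last_wins(list(file_a, file_b))
names(merged)
```

A caveat of this simple version: rows that exist only in an earlier file get NA in the overlapping columns, because the earlier copy is dropped entirely. If that matters, the two copies could instead be merged with dplyr::coalesce(), preferring the later file's value where it is not missing.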