Closed alex-berry closed 2 years ago
Please supply the list of UK Biobank field IDs from your phenofile, i.e. the header line. Also, please supply the header line for the trait of interest file.
Also, it looks like you are using a custom variable information file (variable_list_file.txt). You should not need to create your own, as one is included with PHESANT [variable-info/outcome-info.tsv], although you are of course free to do so if you want something more custom. If you do need help running PHESANT with your custom variable info file, please also supply this file. The same applies to the data coding information file.
Is there any content in the results-log-all.txt file that PHESANT creates in the results directory?
If you are able to supply a minimal working example (with made up data) that would really help.
BW, Louise
louise.millard@bristol.ac.uk
Thanks for the quick reply. This was an issue with my using a small test dataset that only contained continuous variables. This caused PHESANT to output all blank files except for results-linear-all, which did contain the results. The additional packages weren't loaded because they weren't necessary to process the continuous variables. Sorry for the confusion.
The reason I have been using a custom variable information file is because the column names in my phenotype file are not formatted the same as the example by default. For example, where your age column is 'x21022_0_0', mine is 'age_at_recruitment_f21022_0_0', which is incompatible with the provided variable info file.
I'm curious if you have any suggestions on how to format the raw data to follow the x##### format. I use the R package 'ukbtools' to process the raw data from UKB, which gives the variables the names I show above, but even the raw .csv files provided by the UKB have a format different from the example here.
Thanks for your help and for making this great package.
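For ukbtools-style column names like the ones described above, one possible approach is to rewrite only the header line so that each `<description>_f<field>_<instance>_<array>` name becomes `x<field>_<instance>_<array>`. The sketch below is not part of PHESANT or ukbtools; the function name `ukbtools_to_phesant_header` and the sample header are made up for illustration, and it assumes every phenotype column ends in `_f<digits>_<digits>_<digits>`. Columns that do not match (such as `eid`) are left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PHESANT or ukbtools): rename ukbtools-style
# columns, e.g. age_at_recruitment_f21022_0_0, to PHESANT-style x21022_0_0.
# Reads one comma-separated header line on stdin and writes the renamed header.
ukbtools_to_phesant_header() {
  tr ',' '\n' \
    | sed -E 's/^("?)[A-Za-z0-9_.]*_f([0-9]+_[0-9]+_[0-9]+)("?)$/\1x\2\3/' \
    | paste -sd ',' -
}

# Made-up example header in the ukbtools naming style
printf '%s\n' 'eid,sex_f31_0_0,age_at_recruitment_f21022_0_0,body_mass_index_bmi_f21001_0_0' \
  | ukbtools_to_phesant_header
# prints eid,x31_0_0,x21022_0_0,x21001_0_0
```

The data rows would then be appended unchanged beneath the new header. The trait of interest file and the `--traitofinterest` value would presumably need to use the renamed form as well.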
To convert the UKB data into PHESANT format, extract it from UKB as a CSV file. The following example commands (from https://github.com/MRCIEU/PHESANT-MR-pheWAS-BMI/) then convert the headers:
head -n 1 ${origdir}data.21753.csv | sed 's/,"/,"x/g' | sed 's/-/_/g' | sed 's/\./_/g' > ${datadir}data.21753-phesant_header.csv
awk '(NR>1) {print $0}' ${origdir}data.21753.csv >> ${datadir}data.21753-phesant_header.csv
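As a quick check of what the header conversion is meant to produce, here is a minimal sketch run on a made-up header line. It assumes the raw UKB CSV header uses quoted names of the form `"<field>-<instance>.<array>"` and that the target is the `x<field>_<instance>_<array>` form used in the PHESANT example data. The function name `convert_ukb_header` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the header conversion: prefix each field name
# after the first column with "x", then turn "-" and "." into "_".
convert_ukb_header() {
  sed 's/,"/,"x/g' | sed 's/-/_/g' | sed 's/\./_/g'
}

# Made-up header in the raw UK Biobank CSV style
header='"eid","31-0.0","21022-0.0","21001-0.0"'

printf '%s\n' "$header" | convert_ukb_header
# prints "eid","x31_0_0","x21022_0_0","x21001_0_0"
```

The first column (`eid`) is not preceded by a comma, so it keeps its name and can be passed to PHESANT via `--userId="eid"`.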
The phenomeScan.R test dataset works fine for me, but when trying on my own dataset it runs without errors but only gets to "LOADING DONE" and then ends, outputting empty results files without ever loading the packages (MASS, lmtest, etc.) or running the stats. Has anyone run into this before?
Here is my complete output:
Rscript phenomeScan.r \
  --phenofile="${workingDir}test_data.csv" \
  --traitofinterestfile="${workingDir}test_data_bmi.csv" \
  --variablelistfile="${workingDir}variable_list_file.txt" \
  --traitofinterest="body_mass_index_bmi_f21001_0_0" \
  --datacodingfile="${workingDir}data_coding_file.csv" \
  --resDir="${workingDir}results/" \
  --userId="eid" \
  --genetic=FALSE
[1] "Running with all traits in phenotype file: /path/to/PHESANT/test_data.csv"
[1] "Validating phenotype data ..."
[1] "Number of columns in phenotype file: 7"
[1] "Phenotype file validated"
[1] "Validating trait of interest data ..."
[1] "Number of columns in trait of interest file:2"
[1] "Trait of interest file validated"
[1] "Loading phenotypes ..."
[1] "Loading trait of interest file ..."
[1] "Loading confounders from phenotypes file ..."
[1] "Adjusting for age and sex"
[1] "Number of rows in confounder data: 197932"
[1] "Number of INCOMPLETE rows removed from confounder data: 0"
[1] "Number of rows in confounder data: 197932"
[1] "Confounder columns:"
[1] "userID" "x21022_0_0" "x31_0_0"
[1] "Phenotype file has 197932 rows with 197705 not NA for trait of interest (body_mass_index_bmi_f21001_0_0)."
[1] "Phenotype and trait of interest data files merged, with 197705 examples"
[1] "Loading indicator fields from phenotypes file ..."
[1] "No required related variables."
[1] "LOADING DONE"