Open mightyphil2000 opened 3 years ago
Hi,
I have the same issue: all(is.numeric(out$pval)) is not TRUE
This is how I specify the columns:
x$determine_columns(list(chr_col="CHR",
snp_col="SNP",
pos_col="BP",
oa_col="ALLELE0",
ea_col="ALLELE1",
eaf_col="A1FREQ",
beta_col="BETA",
se_col="SE",
pval_col="P_BOLT_LMM_INF"))
Here it seems to be assigning the columns correctly:
Checking alleles are in A/C/T/G/D/I
0 variants with disallowed characters
Is this how the dataset should look?
tibble [100 × 9] (S3: tbl_df/tbl/data.frame)
$ chr : int [1:100] 1 1 1 1 1 1 1 1 1 1 ...
$ pos : int [1:100] 10177 10352 11008 11012 13110 13116 13118 13273 14464 14599 ...
$ ea : chr [1:100] "A" "T" "C" "C" ...
$ oa : chr [1:100] "AC" "TA" "G" "G" ...
$ beta: num [1:100] 0.003867 -0.000167 -0.003125 -0.003125 -0.001727 ...
$ se : num [1:100] 0.00408 0.00419 0.00701 0.00701 0.00929 ...
$ pval: num [1:100] 0.34 0.97 0.66 0.66 0.85 0.79 0.79 0.12 0.79 0.95 ...
$ snp : chr [1:100] "rs367896724" "rs201106462" "rs575272151" "rs544419019" ...
$ eaf : num [1:100] 0.602 0.607 0.914 0.914 0.941 ...
NULL
I think something is happening with the column order in the format function. In my file, the column order is not the same as the order of the input arguments in determine_columns (understandably), so I specify them by column name (as above). This leads to the error.
However, if I re-order the columns in my original file to match the order of the arguments in the format_dataset function, save it as a new file, and then run format_dataset on this file, it works fine.
column order in the original file:
"CHR" , "BP", "SNP" , "BETA" , "SE" , "ALLELE1" , "ALLELE0", "A1FREQ" , "P_BOLT_LMM_INF"
reordered:
"CHR", "SNP", "BP", "ALLELE0", "ALLELE1", "A1FREQ", "BETA", "SE", "P_BOLT_LMM_INF"
For both, I run the same x$determine_columns call as above.
So reordering the file before trying to upload is a workaround for now.
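For anyone who wants to script this workaround, here is a minimal base R sketch. The two-row data frame `dat` is only a stand-in built from the first values in the str() output above, and the tab-delimited temporary output file is an assumption; swap in however you normally read and write the real file.

```r
# Stand-in for the original file, with columns in their original order.
# Values are the first two rows shown in the str() output; illustrative only.
dat <- data.frame(
  CHR = c(1L, 1L),
  BP = c(10177L, 10352L),
  SNP = c("rs367896724", "rs201106462"),
  BETA = c(0.003867, -0.000167),
  SE = c(0.00408, 0.00419),
  ALLELE1 = c("A", "T"),
  ALLELE0 = c("AC", "TA"),
  A1FREQ = c(0.602, 0.607),
  P_BOLT_LMM_INF = c(0.34, 0.97),
  stringsAsFactors = FALSE
)

# Column order matching the argument order used in determine_columns above.
target_order <- c("CHR", "SNP", "BP", "ALLELE0", "ALLELE1",
                  "A1FREQ", "BETA", "SE", "P_BOLT_LMM_INF")

# Fail early if the file does not contain exactly the expected columns.
stopifnot(setequal(names(dat), target_order))

# Reorder by name and save as a new file (tab-delimited is an assumption).
reordered <- dat[, target_order]
out_file <- tempfile(fileext = ".txt")
write.table(reordered, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Selecting by name rather than by position means the same snippet works whatever order the columns arrive in.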
Hi @explodecomputer,
I think I found what is causing this issue (ignore my above investigation).
In determine_columns(), files in the format of IEU GWAS pipeline output are being read okay when rows=100 is specified (example 1).
However, when rows=Inf (inside the format_dataset() function), it reads the pval column as <chr>, not as <dbl> (example 2). I'm not sure why this happens.
(example1) $ P_BOLT_LMM <dbl> 0.400, 0.940, 0.740, 0.740, 0.790, 0.960,
(example2) $ P_BOLT_LMM <chr> "4.0E-01", "9.4E-01", "7.4E-01", "7.4E-01"
So the is.numeric() check fails.
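The failure can be reproduced in isolation with base R. This is only a sketch: `pval_dbl` and `pval_chr` are hypothetical stand-ins holding the values printed in the two examples, not the package's internal objects. It also shows that coercing the character version with as.numeric() recovers the same numbers, and how to count any values that could not be parsed.

```r
# Stand-ins for the p-value column as read in the two cases (illustrative).
pval_dbl <- c(0.400, 0.940, 0.740, 0.740)
pval_chr <- c("4.0E-01", "9.4E-01", "7.4E-01", "7.4E-01")

# The numeric column passes the check; the character column does not.
is.numeric(pval_dbl)
is.numeric(pval_chr)

# Coercing the character vector parses the scientific notation.
pval_fixed <- as.numeric(pval_chr)

# as.numeric() returns NA (with a warning) for strings it cannot parse,
# so count how many values became NA during the conversion.
n_failed <- sum(is.na(pval_fixed) & !is.na(pval_chr))

# After coercion the check passes and the values match the numeric read.
is.numeric(pval_fixed)
isTRUE(all.equal(pval_fixed, pval_dbl))
```

If `n_failed` is greater than zero on the real data, the unparsed entries are a likely place to look for why the full file was read as character.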
My suggestions:
pval=as.numeric(a[[params$pval_col]])
or
I'm getting this error message when I try to specify columns in the dataset:
I've checked that the beta column is definitely all numeric.
If I specify the columns using column position I get a different error: