egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/

Error "p values must all be between 0 and 1" when using scientific notation #25

Closed Adrianzo closed 4 years ago

Adrianzo commented 4 years ago

Describe the bug

I have a data frame with p values in scientific notation, like "1e-3".

To Reproduce

run_pathfindR(data.frame(chromosome = c(1,2), 
                         position = c(100,200), 
                         ref = c("G","T"), 
                         alt = c("A","C"), 
                         P.Value = c(1e-3, 1e-5)))

Throws the error:

## Testing input
Error in pathfindR::input_testing(input, p_val_threshold) : 
  p values must all be between 0 and 1

Expected behavior

No error.

R Session Information:

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /home/username/.conda/envs/jupy/lib/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1

Additional context

❯ java -version
openjdk version "11.0.1-internal" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 11.0.1-internal+0-adhoc..src, mixed mode)

egeulgen commented 4 years ago

Currently, pathfindR only accepts genes with associated p values (and, optionally, change values) as input.

As stated in README (the main page of this repo):

This workflow takes in a data frame consisting of “gene symbols”, “change values” (optional) and “associated p values”:

Gene_symbol  logFC  FDR_p
FAM110A      -0.69  3.4e-06
RNASE2        1.35  1.0e-05
S100A8        1.54  3.5e-05
S100A9        1.03  2.3e-04
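
For reference, the README table above could be built as an R data frame roughly like this. This is only a sketch: the values are copied from the table, the object name `input_df` is an arbitrary choice, and the `run_pathfindR()` call is left commented out because it needs pathfindR (and Java) installed.

```r
# Gene-level input in the three-column layout quoted from the README.
# `input_df` is an illustrative name, not something pathfindR requires.
input_df <- data.frame(
  Gene_symbol = c("FAM110A", "RNASE2", "S100A8", "S100A9"),
  logFC       = c(-0.69, 1.35, 1.54, 1.03),
  FDR_p       = c(3.4e-06, 1.0e-05, 3.5e-05, 2.3e-04),
  stringsAsFactors = FALSE
)

# Scientific notation is ordinary numeric input in R.
is.numeric(input_df$FDR_p)                        # TRUE
all(input_df$FDR_p >= 0 & input_df$FDR_p <= 1)    # TRUE

# output_df <- run_pathfindR(input_df)  # requires pathfindR to be installed
```

Note that the p values sit in the last column and the gene symbols in the first, which is the layout the README describes.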

With the data frame you provided as input (I'm assuming SNP data), pathfindR checks the second column (this is actually a minor bug, which was fixed thanks to this issue), which here is c(100, 200), and complains that these values are not in [0, 1]. The scientific notation itself is not the problem.
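
To make this concrete, here is a rough illustration using the data frame from the report. It is not pathfindR's actual code: `in_unit_interval()` is a hypothetical helper written only for this example, and it simply shows that a [0, 1] range check fails on the second column but passes on the P.Value column.

```r
# The data frame from the bug report.
snp_df <- data.frame(chromosome = c(1, 2),
                     position = c(100, 200),
                     ref = c("G", "T"),
                     alt = c("A", "C"),
                     P.Value = c(1e-3, 1e-5))

# Hypothetical helper (not part of pathfindR): TRUE only if x is numeric
# and every value lies in [0, 1].
in_unit_interval <- function(x) is.numeric(x) && all(x >= 0 & x <= 1)

in_unit_interval(snp_df[, 2])     # FALSE: the positions 100 and 200
in_unit_interval(snp_df$P.Value)  # TRUE: 1e-3 and 1e-5 are valid p values
```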

To sum up, pathfindR cannot use chromosomal position data as input (at least not yet), so please use the gene table format shown above.