WangLab-MSSM / DreamAI

Imputation of missing values of a matrix or data.frame using iterative prediction model
Apache License 2.0

ImputeADMIN #3

Open medibuntu opened 3 years ago

medibuntu commented 3 years ago

Hi,

first, thank you for providing this great tool!

I have an LFQ dataset of 175 samples x 5787 proteins: log2-transformed protein abundances with ~6% missing values (I've filtered out proteins with missingness > 50% and samples with missingness > 60%).
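The filtering described above can be sketched in base R. This is only an illustration: the helper `filter_missing`, its argument names, the proteins-in-rows orientation, and the order of the two filters (proteins first, then samples) are all assumptions, not something stated in the thread.

```r
# Hypothetical helper: drop proteins (rows) and then samples (columns)
# whose fraction of missing values exceeds the given thresholds.
filter_missing <- function(x, max_protein_na = 0.5, max_sample_na = 0.6) {
  keep_proteins <- rowMeans(is.na(x)) <= max_protein_na
  x <- x[keep_proteins, , drop = FALSE]
  keep_samples <- colMeans(is.na(x)) <= max_sample_na
  x[, keep_samples, drop = FALSE]
}

# Toy matrix, assumed layout: proteins in rows, samples in columns.
set.seed(1)
x <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("prot", 1:8), paste0("s", 1:5)))
x[1, 1:4] <- NA   # prot1 is 80% missing, so it is dropped
x[2:7, 5] <- NA   # s5 is then 6/7 missing, so it is dropped

filtered <- filter_missing(x)
dim(filtered)           # 7 proteins x 4 samples remain
mean(is.na(filtered))   # overall missing fraction after filtering
```

Printing `mean(is.na(x))` before and after filtering is a quick way to report the overall missingness figure.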

When I try to run the imputation, ADMIN will always crash with

Error in while ((iter < maxiter_ADMIN) & (diff > tol)) { : missing value where TRUE/FALSE needed

I use

`dreamaiimpute <- DreamAI_Bagging(protein_quant_unimputed_wide_mat, SamplesPerBatch = 1, n.bag = 3, method = c("ADMIN"), gamma_ADMIN = NA, maxiter_ADMIN = 100)`
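For context, "missing value where TRUE/FALSE needed" is the generic message R raises when the condition of a `while` or `if` evaluates to `NA`. The sketch below reproduces that with stand-in variables named after those in the error message. How `diff` becomes `NA` or `NaN` inside ADMIN is not shown in this thread, so the `log(0) - log(0)` line is purely an assumed example of one way a `NaN` can arise.

```r
# Stand-ins for the loop variables named in the error message (toy values).
iter <- 0
maxiter_ADMIN <- 100
tol <- 1e-3
diff <- log(0) - log(0)   # -Inf - (-Inf) is NaN; an assumed trigger only

# A comparison against NaN yields NA, and TRUE & NA is NA.
cond <- (iter < maxiter_ADMIN) & (diff > tol)
is.na(cond)

# Using an NA condition in while() raises the error reported above.
msg <- tryCatch({
  while (cond) {}
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# A defensive variant: isTRUE() turns NA into FALSE, so the loop simply stops.
guarded <- isTRUE((iter < maxiter_ADMIN) & (diff > tol))
```

Checking `anyNA()` or `is.nan()` on intermediate values is usually the quickest way to find where such a value first appears.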

annapamma commented 3 years ago

Hi,

Thank you for using this tool and reporting this issue! I've been able to reproduce this issue and we are investigating it! I will keep the issue open in the meantime.

annapamma commented 3 years ago

Hi @medibuntu -- are you using log ratio abundance data? If so, can you test this with gamma_ADMIN=0?

Thank you!

medibuntu commented 3 years ago

Hi,

thank you for your support. I tested both parameter settings and both threw the same error. I then did a bagged run without ADMIN and, after a fairly long processing time, I got this:

Error in sprintf("%03d", ProcessNum) : argument "ProcessNum" is missing, with no default
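This message is what R reports when a function argument has no default and was not supplied by the caller; because arguments are evaluated lazily, the error only surfaces when the argument is first used. Which DreamAI function was called without `ProcessNum` cannot be determined from this thread. The helper `make_label` below is hypothetical and only reproduces the same kind of error.

```r
# Hypothetical helper, not DreamAI code: ProcessNum has no default value.
make_label <- function(ProcessNum) sprintf("%03d", ProcessNum)

# Supplying the argument works and zero-pads the number to width 3.
label <- make_label(7L)

# Omitting it fails only when sprintf() first touches ProcessNum.
msg <- tryCatch(make_label(), error = function(e) conditionMessage(e))
print(msg)
```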

I run the analyses on an AWS t3.micro instance.

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] impute_1.64.0 Rcpp_1.0.5 DreamAI_0.1.0 glmnet_4.1
[5] Matrix_1.2-18 missForest_1.4 itertools_0.1-3 iterators_1.0.12
[9] foreach_1.5.0 randomForest_4.6-14 survival_3.1-12 cluster_2.1.0
[13] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.1 purrr_0.3.4
[17] readr_1.3.1 tidyr_1.1.1 tibble_3.0.3 ggplot2_3.3.2
[21] tidyverse_1.3.0

loaded via a namespace (and not attached):
[1] shape_1.4.5 tidyselect_1.1.0 splines_4.0.2 haven_2.3.1 lattice_0.20-41
[6] colorspace_1.4-1 vctrs_0.3.2 generics_0.0.2 blob_1.2.1 rlang_0.4.7
[11] pillar_1.4.6 glue_1.4.1 withr_2.2.0 DBI_1.1.0 dbplyr_1.4.4
[16] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0
[21] cellranger_1.1.0 rvest_0.3.6 codetools_0.2-16 parallel_4.0.2 fansi_0.4.1
[26] broom_0.7.0 scales_1.1.1 backports_1.1.8 jsonlite_1.7.0 fs_1.5.0
[31] hms_0.5.3 stringi_1.4.6 grid_4.0.2 cli_2.0.2 tools_4.0.2
[36] magrittr_1.5 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1 xml2_1.3.2
[41] reprex_0.3.0 lubridate_1.7.9 assertthat_0.2.1 httr_1.4.2 rstudioapi_0.11
[46] R6_2.4.1 compiler_4.0.2

verheytb commented 3 years ago

Hi,

Any updates or workarounds on this issue? I am having the same problem. The other algorithms run fine; only ADMIN throws this error. I am using a TMT proteomics dataset without any log-transformation, subset to proteins with <50% missing values.

annapamma commented 3 years ago

Hi @verheytb, does your data have both positive and negative values (e.g. ratio data)? If so, can you try setting gamma_ADMIN = 0 and let us know how that goes?

verheytb commented 3 years ago

I am using summed SNR/intensity data at the protein level. I have also independently confirmed that the data is all non-negative.

pacificma commented 3 years ago

What is the overall scale of your data, and what proportion of the values are 0? Did you get the same error when setting gamma_ADMIN = 0?

verheytb commented 3 years ago

I was able to resolve the issue by removing some values that were 0 (presumably from rounding error). I also had to remove the bridge samples from TMT data where expression was set to 1.
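A pre-check along these lines might look like the sketch below. It is only a rough illustration on toy data: the matrix layout (proteins in rows, samples in columns), the column name `bridge`, and the choice to recode zeros as `NA` rather than drop them are assumptions, not details from the actual dataset.

```r
# Toy data, assumed layout: proteins in rows, samples in columns.
x <- matrix(c(10.2, 0,   8.7, NA,    # s1 contains one exact zero
              1,    1,   1,   1,     # bridge channel fixed at 1
              5.5,  6.1, NA,  4.9),  # s3
            nrow = 4,
            dimnames = list(paste0("prot", 1:4), c("s1", "bridge", "s3")))

# Count exact zeros and negative values, ignoring existing NAs.
n_zero <- sum(x == 0, na.rm = TRUE)
n_neg  <- sum(x < 0, na.rm = TRUE)

# Flag samples whose observed values are all identical (e.g. a bridge channel).
is_const <- apply(x, 2, function(v) {
  v <- v[!is.na(v)]
  length(v) > 0 && length(unique(v)) == 1
})

# Treat zeros as missing and drop the constant samples before imputing.
x_clean <- x
x_clean[which(x_clean == 0)] <- NA
x_clean <- x_clean[, !is_const, drop = FALSE]

print(c(zeros = n_zero, negatives = n_neg))
print(names(which(is_const)))
```

Whether zeros should be recoded as missing or whether the affected proteins should be removed depends on how the zeros arose, so this choice is worth checking against the quantification pipeline.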

pacificma commented 3 years ago

> I was able to resolve the issue by removing some values that were 0 (presumably from rounding error). I also had to remove the bridge samples from TMT data where expression was set to 1.

Did you try 'gamma_ADMIN = 0' yet? Also, a large number of 0s might be an issue, since that is not a typical pattern in proteomics data.

verheytb commented 3 years ago

It was only a handful of zeros. I have not tried the parameter you are suggesting, but I will if I get this error again.