UofUEpiBio / PHS7045-advanced-programming

https://uofuepibio.github.io/PHS7045-advanced-programming
4 stars 5 forks source link

Final (May 2nd, 2023) #20

Closed gvegayon closed 3 months ago

gvegayon commented 1 year ago

Dear @UofUEpiBio/phs7045-2023, the instructions for the final are now posted here (source here). Like the midterm, the final will have two parts: A presentation and an R package. The deadlines are the following:

There won't be classes during the week previous to the presentation. The extension to the R packages has to be agreed during this week (April 17th).

gvegayon commented 1 year ago

Hey @UofUEpiBio/phs7045-2023, if your final is about a simulation study to analyze the properties of an estimator, section 5 of this paper may be useful for you. This is a large simulation study I ran using slurmR.

gvegayon commented 1 year ago

Example picking a parameter for simulation

We need to find coefficients in a logistic regression to match a desired proportion. Here is how we do it:

# Target proportion
target_prop <- .05

set.seed(3123)
n <- 1e4
X <- cbind(rnorm(n))

# Objective function (loss function)
fun <- function(x) {
  Y <- rbinom(100000, 1, prob = plogis(x[1] + X*x[2]))
  (mean(Y) - target_prop)^2
}

# Finding the closest value
res <- optim(par = c(0, 0), fn = fun)
res
#> $par
#> [1] -4.472115  1.985507
#> 
#> $value
#> [1] 9e-10
#> 
#> $counts
#> function gradient 
#>       71       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

# Are we getting there?
message(
  "Target is     : ", target_prop, "\n",
  "Proposed value: ", paste(res$par, collapse = ", "), "\n",
  "We got        : ", mean(rbinom(100000, 1, prob = plogis(res$par[1] + res$par[2] * X)))
)
#> Target is     : 0.05
#> Proposed value: -4.47211456298828, 1.98550653457642
#> We got        : 0.05035

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Advanced example with classes

data("USArrests")

mymodel <- function(...) {
  res <- glm(...)
  structure(
    res,
    class = c("mymodel", class(res))
    )
}

ans <- mymodel(Murder ~ Assault + Rape, data = USArrests)

# There's no print method for "mymodel", but there is one for the next class
# which is "glm"
print(ans)
#> 
#> Call:  glm(formula = ..1, data = ..2)
#> 
#> Coefficients:
#> (Intercept)      Assault         Rape  
#>     0.41886      0.04003      0.02514  
#> 
#> Degrees of Freedom: 49 Total (i.e. Null);  47 Residual
#> Null Deviance:       929.6 
#> Residual Deviance: 330.3     AIC: 244.3

summary.mymodel <- function(object, ...) {

  ans <- stats::summary.glm(object)$coefficients

  structure(
    ans,
    message = "mymodel summary:\n",
    class = c("mymodel_summary", class(ans))
  )

}

print.mymodel_summary <- function(x, ...) {

  cat(attr(x, "message"))
  y <- x
  attr(y, "message") <- NULL
  class(y) <- class(x)[2]

  # Printing the coefficient only
  print(y)

  # But still returning invisibly the mymodel_summary object
  invisible(x)

}

# This calls summary.mymodel
ans_summary <- summary(ans)

# This calls print.mymodel_summary
ans_summary
#> mymodel summary:
#>               Estimate  Std. Error   t value     Pr(>|t|)
#> (Intercept) 0.41886275 0.976184443 0.4290816 6.698243e-01
#> Assault     0.04002885 0.006086735 6.5764086 3.593623e-08
#> Rape        0.02514178 0.054156886 0.4642398 6.446192e-01

# Since there's not subset method for mymodel_summary, it defaults to the
# next class, which is matrix:
ans_summary[,1,drop=FALSE]
#>               Estimate
#> (Intercept) 0.41886275
#> Assault     0.04002885
#> Rape        0.02514178

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Example with Rcpp sugar and clone

Rcpp::sourceCpp(code = "
  #include <Rcpp.h>

  using namespace Rcpp;

  // [[Rcpp::export]]
  NumericMatrix sigmoid_cpp(NumericMatrix x) {

    // Notice I am using Rcpp::clone. With this, I am making sure that I get
    // a full copy of x. If I used this instead:
      //   NumericMatrix ans(x)
    // I would be getting a pointer to x.
    NumericMatrix ans = Rcpp::clone(x);
    for (size_t i = 0u; i < x.ncol(); ++i)
      ans.column(i) = 1.0/(1.0 + exp(-ans.column(i)));

    return ans;

  }
")

set.seed(223)
x <- matrix(rnorm(9), ncol = 3)
(sigmoid_cpp(x) - plogis(x))
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
#> [2,]    0    0    0
#> [3,]    0    0    0

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Example with RcppArmadillo

Rcpp::sourceCpp(code = "
  #include <RcppArmadillo.h>

  // [[Rcpp::depends(RcppArmadillo)]]

  using namespace Rcpp;

  // [[Rcpp::export]]
  arma::mat sigmoid_cpp(const arma::mat x) {

    return 1.0/(1.0 + arma::exp(-x));

  }
")

set.seed(223)
x <- matrix(rnorm(9), ncol = 3)
(sigmoid_cpp(x) - plogis(x))
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
#> [2,]    0    0    0
#> [3,]    0    0    0

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Example with RppArmadillo and Lists

Rcpp::sourceCpp(code = "
  #include <RcppArmadillo.h>

  // [[Rcpp::depends(RcppArmadillo)]]

  using namespace Rcpp;

  // [[Rcpp::export]]
  List sigmoid_cpp(const arma::mat x) {

    List res = List::create(
      _[\"ans\"] = 1.0/(1.0 + arma::exp(-x))
    );

    return res;

  }
")

set.seed(223)
x <- matrix(rnorm(9), ncol = 3)
(sigmoid_cpp(x)$ans - plogis(x))
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
#> [2,]    0    0    0
#> [3,]    0    0    0

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Example: Using system.file() to access files in the inst folder of the package

# Getting the full path of the myfile.slurm file, which is
# included in the R package under:
#   inst/slurm_scripts/myfile.slurm
fn <- system.file(
  file.path("slurm_scripts", "myfile.slurm"),
  package = "epiworldR"
  )

# fn now has the FULL PATH to the myfile.slurm file.
fn
#> [1] "/home/george/R/x86_64-pc-linux-gnu-library/4.2/epiworldR/slurm_scripts/myfile.slurm"

# And thus, we can read it!
readLines(fn)
#> [1] "#!/bin/sh"                     "#SBATCH --job-name=sapply"    
#> [3] "#SBATCH --time=00:10:00"       ""                             
#> [5] "module load R/4.2.2"           "Rscript --vanilla 01-sapply.R"

Created on 2023-04-27 with reprex v2.0.2

gvegayon commented 1 year ago

Example: Random number generation using OpenMP

The problem with OpenMP is that we can't generate random numbers as easily as we would do using R. Since R is not multi-threaded, we need to use other resources. In this example, we generate n uniformly distributed random numbers using OpenMP.

#include <Rcpp.h>
#include <random>

#ifdef _OPENMP
#include <omp.h>
#endif

using namespace Rcpp;

// [[Rcpp::plugins(openmp)]]

// [[Rcpp::export]]
std::vector< std::vector<double> > parallel_rand(
    int n,
    int ncores,
    int seed
) {

#ifdef _OPENMP
  omp_set_num_threads(ncores);
#endif

  // Setting the parallel engine -----------------------------------------------
  std::vector< std::mt19937 > engines;
  std::uniform_real_distribution<> dunif(0.0, 1.0);

  engines.push_back(std::mt19937());
  engines[0].seed(seed);

  // We use the #ifdef _OPENMP to check whether OMP is supported
#ifdef _OPENMP
  int effective_ncores = ncores;
#else
  int effective_ncores = 1;
#endif

  // Setting one seed per engine (based on the original)
  for (int i = 1; i < effective_ncores; ++i) {

    engines.push_back(std::mt19937());
    size_t seed_i =
      static_cast<size_t>(
        std::floor(
          static_cast<double>(std::numeric_limits<size_t>::max()) *
            dunif(engines[0])
        )
      );

    engines[0+1].seed(seed_i);

  }

  // Figuring out how many replicates ------------------------------------------
  std::vector< size_t > nreplicates(effective_ncores, 0);
  std::vector< size_t > nreplicates_csum(effective_ncores, 0);
  size_t sums = 0u;
  for (int i = 0; i < effective_ncores; ++i)
  {
    nreplicates[i] = static_cast<size_t>(
      std::floor(n/effective_ncores)
    );

    // This takes the cumsum
    nreplicates_csum[i] = sums;

    sums += nreplicates[i];

  }

  if (sums < n)
    nreplicates[effective_ncores - 1] += (n - sums);

  // Making room ---------------------------------------------------------------
  std::vector< std::vector< double > > res(effective_ncores);
  for (int i = 0; i < effective_ncores; ++i)
    res[i].resize(nreplicates[i]);

  // Running the experiment ----------------------------------------------------
#pragma omp parallel shared(res, engines, nreplicates) firstprivate(dunif) default(none)
  {

    // Getting the thread number
    auto iam = omp_get_thread_num();

    for (int i = 0; i < nreplicates[iam]; i++)
      res[iam][i] = dunif(engines[iam]);

  }

  return res;

}

/***R

ans <- parallel_rand(1000, 4, 22)

res <- microbenchmark::microbenchmark(
  parallel_rand(100000, 4, 22),
  parallel_rand(100000, 2, 22),
  parallel_rand(100000, 1, 22)
)

print(res, unit = "relative")

*/

This is the expected outcome:

> print(res, unit = "relative")
Unit: relative
                        expr      min       lq     mean   median       uq       max neval cld
 parallel_rand(1e+05, 4, 22) 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100  a 
 parallel_rand(1e+05, 2, 22) 1.164803 1.092562 1.127475 1.131114 1.226660 0.6121462   100  a 
 parallel_rand(1e+05, 1, 22) 2.158915 1.977000 2.185225 2.145995 2.454765 1.1397473   100   b

cc @KPDuBose @UofUEpiBio/phs7045-2023

SinghRavin commented 1 year ago

@gvegayon @chipmanj Is this the issue where we post our final R-package?

gvegayon commented 1 year ago

Yes

George G. Vega Yon, Ph.D. (from mobile) https://ggvy.cl Pronouns: he/him/él

On Fri, Apr 28, 2023, 13:54 SinghRavin @.***> wrote:

@gvegayon https://github.com/gvegayon @chipmanj https://github.com/chipmanj Is this the issue where we post our final R-package?

— Reply to this email directly, view it on GitHub https://github.com/UofUEpiBio/PHS7045-advanced-programming/issues/20#issuecomment-1528020541, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG2FM7SUFWJWH37JTWJQPDXDQN73ANCNFSM6AAAAAAXBNIQMA . You are receiving this because you were mentioned.Message ID: @.***>

gvegayon commented 1 year ago

Hey, @UofUEpiBio/phs7045-2023, if you are having trouble with your C++ code, I've written a little bit about how to debug your Rcpp code using Valgrind and GDB: https://book-hpc.ggvy.cl/rcpp-debugging.html

rberch commented 1 year ago

Hi @gvegayon, I added a function to my package that uses the c++ function I created. However, when I try to check and build the package, I get this error;

── R CMD build ─────────────────────────────────────────── ✔ checking for file 'C:\Users\u1302088\OneDrive - University of Utah\SCHOOL WORK\PHS CLASSES\SPRING 2023\R Class\VIFpkg/DESCRIPTION' ... Warning in file.copy(pkgname, Tdir, recursive = TRUE, copy.date = TRUE) : problem copying .\VIFpkg.Rproj.user\shared\notebooks\06F85F2D-README\1\s to C:\Users\u1302088\AppData\Local\Temp\RtmpiCaXg4\Rbuild31a85c4c690\VIFpkg.Rproj.user\shared\notebooks\06F85F2D-README\1\s: Permission denied ERROR copying to build directory failed Error in (function (command = NULL, args = character(), error_on_status = TRUE, …: ! System command 'Rcmd.exe' failed

Exit status: 1 stdout & stderr:

Backtrace:

  1. devtools::check(, check_dir = dirname(getwd()))
  2. withr::with_envvar(pkgbuild::compiler_flags(FALSE), …
  3. base::force(code)
  4. pkgbuild::build(pkg$path, tempdir(), args = build_ar…
  5. withr::with_temp_libpaths(rcmd_build_tools(options$c…
  6. base::force(code)
  7. pkgbuild::rcmd_build_tools(options$cmd, c(options$path, option…
  8. pkgbuild::with_build_tools({ …
  9. withr::with_path(rtools_path(), code)
    1. base::force(code)
    2. base::withCallingHandlers(callr::rcmd_safe(..., env = env,…
    3. callr::rcmd_safe(..., env = env, spinner = FALSE, sh…
    4. callr:::run_r(options)
    5. base::with(options, with_envvar(env, do.call(processx::run…
    6. base::with.default(options, with_envvar(env, do.call(proce…
    7. base::eval(substitute(expr), data, enclos = parent.frame())
    8. base::eval(substitute(expr), data, enclos = parent.frame())
    9. callr:::with_envvar(env, do.call(processx::run, c(list(bin, …
    10. base::force(code)
    11. base::do.call(processx::run, c(list(bin, args = real_cmdar…
    12. (function (command = NULL, args = character(), error…
    13. base::throw(new_process_error(res, call = sys.call(), echo…
    14. | base::signalCondition(cond)
    15. (function (e) …
    16. asNamespace("callr")$err$throw(e) Execution halted

Exited with status 1.

Could you help me identify the problem and how to resolve it?

Thanks! Ransmond

gvegayon commented 1 year ago

What's the link to your repo? Are you getting the same error with GitHub Actions?

George G. Vega Yon, Ph.D. +1 (626) 381 8171 https://ggvy.cl Pronouns https://www.aclunc.org/article/frequently-asked-questions-whats-pronoun: he/him/él

On Tue, May 2, 2023 at 10:13 AM rberch @.***> wrote:

Hi @gvegayon https://github.com/gvegayon, I added a function to my package that uses the c++ function I created. However, when I try to check and build the package, I get this error; ── R CMD build ─────────────────────────────────────────── ✔ checking for file 'C:\Users\u1302088\OneDrive - University of Utah\SCHOOL WORK\PHS CLASSES\SPRING 2023\R Class\VIFpkg/DESCRIPTION' ... Warning in file.copy(pkgname, Tdir, recursive = TRUE, copy.date = TRUE) : problem copying .\VIFpkg.Rproj.user\shared\notebooks\06F85F2D-README\1\s to C:\Users\u1302088\AppData\Local\Temp\RtmpiCaXg4\Rbuild31a85c4c690\VIFpkg.Rproj.user\shared\notebooks\06F85F2D-README\1\s: Permission denied ERROR copying to build directory failed Error in (function (command = NULL, args = character(), error_on_status = TRUE, …: ! System command 'Rcmd.exe' failed Exit status: 1 stdout & stderr:

Backtrace:

  1. devtools::check(, check_dir = dirname(getwd()))
  2. withr::with_envvar(pkgbuild::compiler_flags(FALSE), …
  3. base::force(code)
  4. pkgbuild::build(pkg$path, tempdir(), args = build_ar…
  5. withr::with_temp_libpaths(rcmd_build_tools(options$c…
  6. base::force(code)
  7. pkgbuild::rcmd_build_tools(options$cmd, c(options$path, option…
  8. pkgbuild::with_build_tools({ …
  9. withr::with_path(rtools_path(), code)
  10. base::force(code)
  11. base::withCallingHandlers(callr::rcmd_safe(..., env = env,…
  12. callr::rcmd_safe(..., env = env, spinner = FALSE, sh…
  13. callr:::run_r(options)
  14. base::with(options, with_envvar(env, do.call(processx::run…
  15. base::with.default(options, with_envvar(env, do.call(proce…
  16. base::eval(substitute(expr), data, enclos = parent.frame())
  17. base::eval(substitute(expr), data, enclos = parent.frame())
  18. callr:::with_envvar(env, do.call(processx::run, c(list(bin, …
  19. base::force(code)
  20. base::do.call(processx::run, c(list(bin, args = real_cmdar…
  21. (function (command = NULL, args = character(), error…
  22. base::throw(new_process_error(res, call = sys.call(), echo…
  23. | base::signalCondition(cond)
  24. (function (e) …
  25. asNamespace("callr")$err$throw(e) Execution halted

Exited with status 1.

Could you help me identify the problem and how to resolve it?

Thanks! Ransmond

— Reply to this email directly, view it on GitHub https://github.com/UofUEpiBio/PHS7045-advanced-programming/issues/20#issuecomment-1531748988, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG2FM4LS2LW3HAJMWT4ZWDXEEXDFANCNFSM6AAAAAAXBNIQMA . You are receiving this because you were mentioned.Message ID: @.***>