Rethink and centralize how we deal with logging, warning and error handling in our packages

jdhoffa commented 1 year ago

Context

Many people use R (and the functions that we design for PACTA) interactively. R is developed with this in mind, and tidyverse facilitates this very well.

However, we also have several Docker images and workflows that combine functions and data in particular ways to produce outputs. Two of the largest are: https://github.com/RMI-PACTA/workflow.data.preparation and https://github.com/RMI-PACTA/workflow.transition.monitor

In the case of workflows, it is often useful to run through many functions and decide how to handle errors from each of them (and where to log them) on a case by case. For the transition monitor, we may want to write the logs to an error_messages.txt file. For data preparation, we may want to write the logs to a manifest.json, or some other log output. (We may also want to align the behaviours of these two, and handle logging in a natural and consistent way for both? Relates to RMI-PACTA/pacta.portfolio.audit#2)

Benefits of separating the error handling from the function

When writing unit tests, we often just want to know "Is the data being processed in the right way?", ie.

Is TDM being calculated correctly?
Is target market share being calculated correctly, and giving informative errors when it doesn't have correct data?
If a function errors, it should be because the input data didn't have the correct names, columns, values, etc.

We don't necessarily want to care about the broader behavior of "should it fail if input data came from a step before that had Equity data but no Bonds data, or how do we handle data prep input that is missing some particular scenario that is crucial to one particular metric.

We want to be able to test the precise functionality of the function in the context of the output it produces in isolation.

How functions get combined downstream, and how they should behave when we are trying to generate a report is a separate (but equally important) concept.

Contrary to that, we have a logging function in this package: https://github.com/RMI-PACTA/pacta.portfolio.analysis/blob/main/R/write_log.R

Which is called in various contexts (and often many times) to log different aspects within different functions: https://github.com/RMI-PACTA/pacta.portfolio.analysis/blob/2d322adf82103cab16cc04ffc5e89421b300a431/R/get_input_files.R#L28-L34

This is a bit of an awkward implementation since it makes it difficult to run anything interactively (you would need to have a local logging path). You also need to decide WHICH messages to log (wouldn't it be easier to just automatically log ALL error and warning messages, with appropriate tags?). We are combining the "WHAT" and the "HOW" and it makes it difficult to test.

Extracting the logging and handling functionality would make it much easier to unit_test the actual purpose of functions in pacta.portfolio.analysis, while leaving error handling and logging to separate functions, that we can (for example) wrap function calls in when running it in a server or docker context. See a toy example of this functionality in the reprex below:

Reprex

# write log message to a file_path
write_log <- function(msg, file_path) {
  composed <- paste(
    as.character(Sys.time()),
    as.character(msg)
  )
  if (!dir.exists(file_path)) {
    dir.create(file_path, recursive = TRUE)
  }
  write(composed, file = file.path(file_path, "error_messages.txt"), append = TRUE)
}

# toy function to demonstrate different error and warning behaviours
foo <- function(bar) {
  # some error condition
  if (!is.numeric(bar)) stop("Argument `bar` must be numeric.")

  # some warning condition
  if (!is.integer(bar)) warning("Argument `bar` should be an integer.")
}

foo_handler <- function(log_dir, ...) {
  tryCatch({
    write_log("Running foo...", log_dir)
    out <- foo(...)
  }, warning = function(war) {
    write_log(war, log_dir)
  }, error = function(err) {
    write_log(err, log_dir)
    stop(err)
  }, finally = {
    write_log("Moving to the next step.", log_dir)
  })
}

# IN THE INTERACTIVE CONTEXT, we would run functions like this:

# passes with no message
foo(as.integer(1))

# passes with a warning
foo(1)
#> Warning in foo(1): Argument `bar` should be an integer.

# fails with an error message
foo("baz")
#> Error in foo("baz"): Argument `bar` must be numeric.

# ON THE SERVER OR IN A DOCKER IMAGE, we would run functions like this:

# create a temp dir for the log file
log_dir <- tempdir()
foo_handler(log_dir, as.integer(1))

readLines(file.path(log_dir, "error_messages.txt"))
#> [1] "2023-04-13 13:07:05 Running foo..."          
#> [2] "2023-04-13 13:07:05 Moving to the next step."
file.remove(file.path(log_dir, "error_messages.txt"))
#> [1] TRUE

foo_handler(log_dir, 1)

readLines(file.path(log_dir, "error_messages.txt"))
#> [1] "2023-04-13 13:07:05 Running foo..."                                                 
#> [2] "2023-04-13 13:07:05 simpleWarning in foo(...): Argument `bar` should be an integer."
#> [3] ""                                                                                   
#> [4] "2023-04-13 13:07:05 Moving to the next step."
file.remove(file.path(log_dir, "error_messages.txt"))
#> [1] TRUE

foo_handler(log_dir, "baz")
#> Error in foo(...): Argument `bar` must be numeric.

readLines(file.path(log_dir, "error_messages.txt"))
#> [1] "2023-04-13 13:07:05 Running foo..."                                    
#> [2] "2023-04-13 13:07:05 Error in foo(...): Argument `bar` must be numeric."
#> [3] ""                                                                      
#> [4] "2023-04-13 13:07:05 Moving to the next step."
file.remove(file.path(log_dir, "error_messages.txt"))
#> [1] TRUE

^{Created on 2023-04-13 with reprex v2.0.2}

FYI. @cjyetman @AlexAxthelm

cjyetman commented 8 months ago

This is a rather meta issue that probably belongs somewhere else. @jdhoffa? related issue for specifically this repo is here RMI-PACTA/workflow.transition.monitor#70

AlexAxthelm commented 8 months ago

See also thread in RMI-PACTA/workflow.transition.monitor#252

AlexAxthelm commented 8 months ago

Relates to: https://github.com/RMI-PACTA/workflow.data.preparation/issues/17

AlexAxthelm commented 8 months ago

Putting my thoughts on the "how" of logging here, since this seems to be the most cohesive thread at the moment.

Standard utilities

First: there should not be any notable difference between running something interactively and running it via a script. The function should be the function, and any behavior changes should be brought about by arguments. This includes the need to make things like the foo_handler() above, which introduces a key split between what is used in dev and PROD.

This means that we need to write our functions to be largely ignorant of their contexts, except what is defined by arguments, or shared namespace (functional programming 🎉 ). Therefore, logging should always go to the one place that we know exists: stdout (or stderr). Any additional logging apparatus should be handled by the logger not by the calling function.

logger does this automatically, so I would consider re-writing the code example you provided with something closer to this:

# add appender so that reprex captures the log output. Also visible in the stdout dropdown below
logger::log_appender(logger::appender_stdout, index = 2)

foo <- function(bar) {
  # some error condition
  if (!is.numeric(bar)) {
    logger::log_error("Argument `bar` must be numeric, not {class(bar)}.")
    stop("Argument `bar` must be numeric.")
  }
  # some warning condition
  if (!is.integer(bar)) {
    logger::log_warn(
      "Argument `bar` is recommended to be integer, not {class(bar)}."
    )
    warning("Argument `bar` should be integer")
  }
  return(bar)
}

foo("bar")
#> ERROR [2024-03-13 17:23:41] Argument `bar` must be numeric, not character.
#> Error in foo("bar"): Argument `bar` must be numeric.
foo(1.5)
#> WARN [2024-03-13 17:23:41] Argument `bar` is recommended to be integer, not numeric.
#> Warning in foo(1.5): Argument `bar` should be integer
#> [1] 1.5
foo(1L)
#> [1] 1

^{Created on 2024-03-13 with reprex v2.0.2}

Standard output and standard error

``` sh ERROR [2024-03-13 17:23:41] Argument `bar` must be numeric, not character. WARN [2024-03-13 17:23:41] Argument `bar` is recommended to be integer, not numeric. ```

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.2 (2023-10-31) #> os macOS Sonoma 14.2 #> system aarch64, darwin23.0.0 #> ui unknown #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Belgrade #> date 2024-03-13 #> pandoc 3.1.7 @ /opt/homebrew/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.2) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1) #> logger 0.3.0 2024-03-05 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> #> [1] /opt/homebrew/lib/R/4.3/site-library #> [2] /opt/homebrew/Cellar/r/4.3.2/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

Multiple output streams

Awesome. so that covers logging to stdout, which is ephemeral. Now we have the question of how to persist those logs appropriately. Fortunately, we have some options here:

Capture stdout in some way. this could be as simple as tee, or something more complicated like attaching the environment to an Azure Log Workspace (which we are doing with our Azure containers).
Redirect logs to another venue. This has the advantage that you can differentiate logs between output streams, so you have the option of capturing things that aren't emitted to stdout (which may be useful if you're dealing with sensitive logs). The downside here is that you need to handle this setup in the application flow.

Thankfully again, logger makes this easy! Remember the log_appender() call in the reprex above? That's basically saying "please also write the log message to this destination." So we can consider a variant on the above:

# add appender so that reprex captures the log output
logger::log_appender(logger::appender_stdout, index = 2)

# add file outputs to capture logs
logger::log_appender(logger::appender_file("foo.log"), index = 3)

# and because this is a demo, add one that captures in different level and format
logger::log_appender(logger::appender_file("bar.json"), index = 4)
logger::log_threshold("TRACE", index = 4)
logger::log_layout(logger::layout_json(), index = 4)

foo <- function(bar) {
  # Add a trace log
  logger::log_trace("Entering foo()")
  logger::log_trace("Argument `bar` is {bar}")
  # some error condition
  if (!is.numeric(bar)) {
    logger::log_error("Argument `bar` must be numeric, not {class(bar)}.")
    stop("Argument `bar` must be numeric.")
  }
  # some warning condition
  if (!is.integer(bar)) {
    logger::log_warn(
      "Argument `bar` is recommended to be integer, not {class(bar)}."
    )
    warning("Argument `bar` should be integer")
  }
  return(bar)
}

foo("some string")
#> ERROR [2024-03-13 18:43:49] Argument `bar` must be numeric, not character.
#> Error in foo("some string"): Argument `bar` must be numeric.
foo(1.5)
#> WARN [2024-03-13 18:43:49] Argument `bar` is recommended to be integer, not numeric.
#> Warning in foo(1.5): Argument `bar` should be integer
#> [1] 1.5
foo(1L)
#> [1] 1

readLines("foo.log")
#> [1] "ERROR [2024-03-13 18:43:49] Argument `bar` must be numeric, not character."          
#> [2] "WARN [2024-03-13 18:43:49] Argument `bar` is recommended to be integer, not numeric."
readLines("bar.json")
#> [1] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Entering foo()\"}"                                          
#> [2] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Argument `bar` is some string\"}"                           
#> [3] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"ERROR\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Argument `bar` must be numeric, not character.\"}"          
#> [4] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Entering foo()\"}"                                          
#> [5] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Argument `bar` is 1.5\"}"                                   
#> [6] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"WARN\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Argument `bar` is recommended to be integer, not numeric.\"}"
#> [7] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Entering foo()\"}"                                          
#> [8] "{\"time\":\"2024-03-13 18:43:49\",\"level\":\"TRACE\",\"ns\":\"global\",\"ans\":\"global\",\"topenv\":\"R_GlobalEnv\",\"fn\":\"foo\",\"node\":\"MACCM424GV711\",\"arch\":\"arm64\",\"os_name\":\"Darwin\",\"os_release\":\"23.2.0\",\"os_version\":\"Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112\",\"pid\":8395,\"user\":\"aaxthelm\",\"msg\":\"Argument `bar` is 1\"}"

^{Created on 2024-03-13 with reprex v2.0.2}

Standard output and standard error

``` sh ERROR [2024-03-13 18:43:49] Argument `bar` must be numeric, not character. WARN [2024-03-13 18:43:49] Argument `bar` is recommended to be integer, not numeric. ```

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.2 (2023-10-31) #> os macOS Sonoma 14.2 #> system aarch64, darwin23.0.0 #> ui unknown #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Belgrade #> date 2024-03-13 #> pandoc 3.1.7 @ /opt/homebrew/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.2) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1) #> logger 0.3.0 2024-03-05 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> #> [1] /opt/homebrew/lib/R/4.3/site-library #> [2] /opt/homebrew/Cellar/r/4.3.2/lib/R/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

Note here that we're not capturing the messages from stop and warning, which is why I've been an advocate for using both logger::log_warn and warning together, or log_error and stop (use logger for messages, use warning/stop to control application flow).

But in any case, this gives us what I think we're looking for: a function implemented once, and the logging behavior controlled external to the function, but consistent. By default, if a user runs this function interactively, they'll get the useful messages on stdout. If we're planning on putting this into a script, I would consider doing something like the following, which would include the log messages along with the outputs

# run_pacta.R

output_dir <- sys.getEnv("OUTPUT_DIR")

# include logs along with outputs
logger::log_appender(logger::appender_file(file.path(output_dir, "log.txt"), index = 2)

#write traces to separate dir, so they can be removed/cleaned regularly
logger::log_appender(logger::appender_file(file.path(traces_dir, "log.txt"), index = 3)
logger::log_threshold("TRACE", index = 3)

main(write_to = output_dir)

jdhoffa commented 8 months ago

@AlexAxthelm this sounds very cool!!

NIT: In general, it sounds like @cjyetman doesn't want this discussion to happen on this particular repo that he maintains, so maybe we move this discussion to https://github.com/RMI-PACTA/workflow.pacta ?

AlexAxthelm commented 8 months ago

moved.

AlexAxthelm commented 5 months ago

@cjyetman @jdhoffa picking up this thread again, I think what we have in pacta.workflow.utils is a pretty good approximation for what I'd hope for:

Basically, we import log_* into the called packages, and write some log_info() statements from the top-level functions there, and lower level ones get log_debug and log_trace as appropriate.

I think that adding some log_* calls in pacta.portfolio.import, pacta.portfolio.audit and pacta.portfolio.allocate would be pretty useful overall.

AlexAxthelm commented 3 months ago

@jdhoffa @cjyetman vote to close this issue, unless we think we need more discussion?

jdhoffa commented 3 months ago

Happy to close it!

RMI-PACTA / workflow.pacta