RMI-PACTA / actions

Actions for GitHub workflows
4 stars 0 forks source link

feature: `docker-check-R-sysdeps.yml` should only check sys deps of dependencies #79

Open jdhoffa opened 4 months ago

jdhoffa commented 4 months ago

confirmed that going with this is probably more ideal (and doesn't have the same problem)

req_r_pkgs <- sort(unique(pak::pkg_deps(pkg = ".")$package))
pak::sysreqs_check_installed(packages = req_r_pkgs)

Originally posted by @cjyetman in https://github.com/RMI-PACTA/workflow.data.preparation/issues/221#issuecomment-2074560917

AlexAxthelm commented 4 months ago

Hmm. Even upon first glance, this looked like a non-trivial problem to solve, but as I'm looking at it more closely, it's getting trickier.

I think a reasonable approach would be to only check sysdeps for packages in IMPORTS (not SUGGESTS) of some vector (possibly length 1) of packages (such as just the workflow.* package in those repos), and on down through the IMPORTS tree. Generating that vector procedurally (possibly with some manual override) seems finicky, but doable.

The harder part here is in determining what sysreqs are actually relevant to include in an image. For example, the DESCRIPTION for arrow (proximal cause for this issue) has:

SystemRequirements: C++17; for AWS S3 support on Linux, libcurl and openssl (optional);
    cmake >= 3.16 (build-time only, and only for full source build)

which means that although there's no actual Requirements (and some are only needed at build-time), they're all getting picked up by pak as requirements.

The best solution would be if R had some way to distinguish between these, but I don't think that's coming any time soon, so maybe a better option would be to have an ignorelist, along the lines of

"sysreqs_ignore": [
  "arrow": ["C++17", "cmake"]
]

But overall, I'm wondering if the better option is to drop this check, and have practical (in situ) testing check for missing dependencies instead. That would be tougher to set up on the front end, but we'd only get true positives. For example, if we're using RPostgres without libpq installed, we'll get errors. The tricky part of testing anything with an acual sysdep, is that a lot of the time, those are there for communication to an external system, and that means we would need to have integration testing (against an actual PSQL instance, in this eaxmple) that would probably need to run in sequence/parallel with the "unit" tests. For example, the unit tests for workflow.factset use an in-memory SQLite, rather than a Postgres instance for unit tests, so we would need to set up a set of checks that actually test against a Postgres instance in addition to the SQLite to feel confident we've captured the necessary sysreqs.

@jdhoffa @cjyetman