NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/

list of pipeline breaks and needs #341

Open: kyle-messier opened this issue 3 months ago

kyle-messier commented 3 months ago

1) tar_resources

sigmafelix commented 3 months ago

@kyle-messier

#342 handles many of these items:

  1. An update to set_args_calc() sets the number of threads for the relevant targets. For memory settings, we could add more arguments to set_args_calc().
  2. _targets.yaml is integrated into _targets.R (via the store argument of tar_config_set()), and the README includes a brief description of this.
  3. I believe this is what set_args_calc() does. I will probably add set_args_download(), set_args_summary(), and others as needed.
  4. The pipeline can be started from the command line; I added instructions for this to the README.
  5. Both would work, but users will usually install the package with devtools, since they are expected to clone the Git repository anyway.

sigmafelix commented 3 months ago

The largest object in _targets is currently about 1 GB, well below the 5 GB Git LFS file size limit for GitHub Enterprise Cloud (https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage). Once we decide whether to move to gittargets, I will contact OSC to set up git-lfs on the HPC.

sigmafelix commented 2 months ago

Note that a debug session on an interactive node spawns tens of R/node processes at once, and these processes do not appear to close automatically when I lose the connection to the interactive session for some reason (e.g., unplugging the Ethernet cable or disconnecting from the VPN). I suspect this behavior is what blocks me from connecting to these nodes over SSH, due to SSH/system settings. I will keep track of this issue.

sigmafelix commented 2 months ago

I talked to Frank and learned that the login node stops accepting new connections when a heavy workload is running on it. Unless the system automatically terminates the problematic workload, administrators have to restore access manually. This can happen from time to time, so everyone should keep it in mind when troubleshooting connection problems.