Open hcorson-dosch-usgs opened 2 years ago
For the most recent GCM projections work Jordan was using version 3.2.0a3. However, there are no tags or releases on the AquaticEcoDynamics github beyond 3.1. When Jordan is back from leave we'll need to confirm with him how to access that version of GLM.
@jread-usgs - pinging you here now that you're back. Our current shifter image that @jesse-ross created is using the latest AquaticEcoDynamics release. Did you want us to use a more recent version, and if so, how do we access it?
Ok so looks like the latest there is version 3.2.0a6. I believe you tried that version for the initial GCM projections work, but were running into issues, so reverted to 3.2.0a3. Which version would you like us to use for the MN set of projections?
given what we know now, 3.2.0a3 is best to use. But it would be great if our container recipe gave us the flexibility to move to a different version (or commit/tag) for GLM in the future. Unless #15 turns up an issue with this version not recognizing the param...
I should note that for #15 I'm running GLM locally, and using version 3.1.0a4
Good to know. The disable_evap param has been exposed since v3.0, so it should be working the same(?) for any version at or above that. But this is a funky result and I can add some more thoughts on the other issue specific to the evap question.
whoops wrong issue
Rebuilding with 3.2.0a3 is easy to do in theory but I'm running into some snags. That version won't compile because it's trying to use a function from AquaticEcoDynamics/libplot which doesn't yet exist in the version I'm pinned to. The thing is, I don't know why I'm pinned there. I'm blindly following Alison's build script which is based on something of yours, @jread-usgs . So I stopped pinning to the old libplot and built it from the most current sources. Does this seem OK?
In any case, the test script doesn't work with v3.2.0a3, but not because of libplot:
> run_glm(sim_folder)
Cannot open default display
Unknown flag --no-gui
-------------------------------------------------------
| General Lake Model (GLM) Version 3.2.0a3 |
-------------------------------------------------------
The --no-gui flag to the glm command is not present in 3.2.0a3. I'm not sure whether this is important, or just a problem with the test script, which is running a function from GLEON/GLM3r. In any case, I've pushed this to docker hub as jrossusgs/glm3r:v0.6_GLM_3.2.0a3 so it can be pulled into shifter on denali with shifterimg if you want to give it a try @hcorson-dosch ?
That no-gui flag is present again in 3.2.0a8, the most recent version, and the test script runs at this version. So I built an image off of that version too, jrossusgs/glm3r:v0.6_GLM_3.2.0a8.
Let me know if you want 3.2.0a6, happy to build that too, should only take a sec. The --no-gui flag which was missing from 3.2.0a3 is back by then, so the test from GLEON/GLM3r should probably work there, but I didn't build it because you had been having trouble with it.
I'll soon be committing the container recipe to this repository along with some instructions on doing the build. It's pretty easy. I am holding off on doing that for a few days, because I think we may be on the verge of a breakthrough in how we can manage containers and I may want to change the process a bit. But I'm happy to push instructions up as-is if it would be useful, i.e. if you want to build the container yourselves and don't mind the risk that the process will change.
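On the request for a recipe that can move to a different GLM version, commit, or tag: one option is to pass the version into the build as a build argument. This is a rough sketch only. It assumes the Dockerfile declares a hypothetical `ARG GLM_REF` and checks out that ref before compiling, which the current recipe may not do. The tag pattern follows the images named above. The function just prints the command so it can be reviewed before anything is built.

```shell
#!/bin/sh
# Print the docker build command for a given GLM ref (tag or commit).
# GLM_REF is a hypothetical build argument: the Dockerfile would need a
# matching `ARG GLM_REF` line and a checkout step for it to have any effect.
glm_build_cmd() {
  glm_ref="$1"
  recipe_version="${2:-v0.6}"   # recipe version prefix used in the image tags
  printf 'docker build --build-arg GLM_REF=%s -t jrossusgs/glm3r:%s_GLM_%s .\n' \
    "$glm_ref" "$recipe_version" "$glm_ref"
}

glm_build_cmd 3.2.0a8
```

Keeping the ref in the image tag means the tag alone records which GLM source went into the image.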
Thanks Jesse - my build script is similar to that, but I wasn't pinning the version of libplot. I agree that building from current sources is probably our best option at this point since there isn't a pattern of tagging in those repos.
Great - thank you for all this work, Jesse. I'll test out those 3.2.0a3 and 3.2.0a8 shifter images on Denali with our workflow.
I think it's fine to hold off on committing the container recipe for a few days while you're reviewing best practices for managing our containers.
Alright - so far, the model runs on Denali are failing with the 3.2.0a3 shifter image (Jordan, the glm_code is 0, so the error function of the tryCatch is being triggered by the max_output_date parameter, which is returning as NA which means it can't be extracted), but running fine with the 3.2.0a8 image. I'll try to dig more into why the 3.2.0a3 runs are failing. One note - I did have to bring over my temporary fix to the rain/snow units in order to get the 3.2.0a8 runs to succeed, which wasn't the case with version 3.1.
Okay if I remove the line to delete the simulation directories and then manually try to run a model for one of the simulation directories, I get this error (looks like what Jesse was getting):
> GLM3r::run_glm('2_run/tmp/nhdhr_77358110_MRI_2080_2099', verbose = TRUE)
Cannot open default display
Unknown flag --no-gui
-------------------------------------------------------
| General Lake Model (GLM) Version 3.2.0a3 |
-------------------------------------------------------
glm built using gcc version 9.3.0
--help : show this blurb
--nml <nmlfile> : get parameters from nmlfile
--xdisp : display temp/salt and selected others in x-window
--xdisp <plotsfile> : like --xdisp, but use <plotsfile> instead of plots.nml
--saveall : save plots to png files
--save-all-in-one : save all plots to png file
--save-all-in-one <destfile> : save all plots to png file <destfile>
--quiet : less messages
--quiet <level> : set quiet level (1-10)
[1] 0
Warning message:
In glm.systemcall(sim_folder, glm_path, verbose, system.args) :
Custom path to GLM executable set via 'GLM_PATH' environment variable as: /usr/local/bin/GLM/glm
I think the model can still run without that flag, but I could be wrong. The PATH warning though makes me wonder if you are working off of a version of GLM3r prior to this PR: https://github.com/GLEON/GLM3r/pull/20. If you have pkg version 3.1.18 for GLM3r, then you are current and would include that update.
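Whether a given GLM build accepts a flag such as --no-gui can be checked from its usage text before running anything. A small sketch, assuming the usage text lists one flag per line in the format shown in the output above. The function name `help_lists_flag` is made up for illustration, and it only reads text on stdin.

```shell
#!/bin/sh
# Succeed if the usage text on stdin has a line whose first word is the
# given flag. This only inspects text; it never runs glm itself.
help_lists_flag() {
  awk -v flag="$1" '$1 == flag { found = 1 } END { exit !found }'
}

# A few lines copied from the 3.2.0a3 usage text above.
help_text='--help : show this blurb
--nml <nmlfile> : get parameters from nmlfile
--quiet : less messages'

if printf '%s\n' "$help_text" | help_lists_flag --no-gui; then
  echo "--no-gui is listed"
else
  echo "--no-gui is not listed"
fi
```

Against a real image the input could come from something like `/usr/local/bin/GLM/glm --help 2>&1 | help_lists_flag --no-gui`. The `2>&1` is there because it is not clear from this thread which stream the usage text is written to.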
I think the version of GLM3r must be = 3.1.18 because I was able to run the command GLM3r::glm_version(as_char = TRUE) in both the 3.2.0a3 and 3.2.0a8 shifter images. I've been waiting for an allocation to check for >1.5 hours, so will confirm when I get that allocation.
Yes, it's GLM3r 3.1.18 for both of those images (if you have docker installed, this can be tested locally without needing to wait for HPC resources).
jross@IGSARMEWLTJROS:~$ docker run -it jrossusgs/glm3r:v0.6_GLM_3.2.0a3 Rscript -e 'packageVersion("GLM3r")'
[1] ‘3.1.18’
jross@IGSARMEWLTJROS:~$ docker run -it jrossusgs/glm3r:v0.6_GLM_3.2.0a8 Rscript -e 'packageVersion("GLM3r")'
[1] ‘3.1.18’
ahh, yes. I see Jesse's dockerfile is building from the current/canonical GLM3r repo while Alison's was installing from a fork. All good!
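If this kind of version check ends up in a build or test script, a plain string comparison is not enough (3.1.2 would sort after 3.1.18). A minimal sketch using `sort -V` from GNU coreutils. The helper name `version_ge` is made up for illustration, and it has only been thought through for plain dotted versions like the GLM3r ones, not the alpha-style GLM versions.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

installed="3.1.18"   # as reported by packageVersion("GLM3r") above
if version_ge "$installed" 3.1.18; then
  echo "GLM3r $installed is at least 3.1.18"
else
  echo "GLM3r $installed is older than 3.1.18"
fi
```

Note that the R output above wraps the version in curly quotes, so those would need stripping before the comparison.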
Quick update here. @jesse-ross just pushed what is hopefully a fixed version of GLM 3.2.0a3 to docker - it is called jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix. Thanks, Jesse! I'll give it a try when I next get an allocation on Denali.
For the record, that bugfix build is defined here. For each of the libraries, I used the most recent commit which was prior to the bugfix commits to GLM and libplot. If we start using the container in production then I think we ought to move the container definitions into the main repo, but since we're still testing things it seems OK for it to stay where it is.
I tried pulling the new image to Denali on 1/27, and got an error:
At the time I was also getting an error trying to pull the docker image for 3.2.0a8, which I had pulled previously:
I tried again today, and still got an error, but was again able to pull the older images:
hcorson-dosch@nid00622:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models> module load shifter
hcorson-dosch@nid00622:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models> shifterimg pull docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix
2022-02-01T11:48:21 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:21 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:22 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:22 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:23 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:25 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:26 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:27 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:27 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:28 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:28 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:29 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:30 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:30 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:32 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:32 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:33 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, stat
2022-02-01T11:48:33 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, status: PULLING
err 28
hcorson-dosch@nid00622:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models> shifterimg pull docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a8
2022-02-01T11:48:59 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a8, status: PUL
2022-02-01T11:48:59 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a8, status: READY
Update - was just able to pull the new image (jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix)!
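Since the same pull failed on some attempts and later succeeded, a small retry wrapper might save some manual re-running. This is a generic sketch, not something tested on Denali. It assumes `shifterimg pull` returns a non-zero exit status when a pull fails, which has not been confirmed here, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first success, or 1 once every attempt has failed.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (not run here; assumes a failed pull exits non-zero):
# retry 5 60 shifterimg pull docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix
```

If the failure turns out to be something persistent rather than transient, retrying will not help and the error itself needs a look.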
Hmm I'm confused. If I try to build the p2_glm_uncalibrated_runs target with 3.2.0a3, the targets all seem to error (targets error, not an error caught by our tryCatch() statements):
hcorson-dosch@nid00393:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models> shifterimg pull docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix
2022-02-07T09:49:41 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfi
2022-02-07T09:49:42 Pulling Image: docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix, status: READY
hcorson-dosch@nid00393:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models>
hcorson-dosch@nid00393:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models> shifter --image=docker:jrossusgs/glm3r:v0.6_GLM_3.2.0a3_bugfix /bin/bash
groups: cannot find name for group ID 1004
groups: cannot find name for group ID 1005
groups: cannot find name for group ID 1098
groups: cannot find name for group ID 5126
bash: /opt/cray/pe/modules/3.2.11.4/bin/modulecmd: No such file or directory
I have no name!@nid00393:/caldera/projects/usgs/water/iidd/datasci/lake-temp/lake-temperature-process-models$ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(targets)
> GLM3r::glm_version(as_char = TRUE)
Error in GLM3r::glm_version(as_char = TRUE) :
"Version" not found in the expected message from GLM, try `as_char = FALSE`
In addition: Warning messages:
1: In glm.systemcall(sim_folder, glm_path, verbose, system.args) :
Custom path to GLM executable set via 'GLM_PATH' environment variable as: /usr/local/bin/GLM/glm
2: In system2(glm_path, wait = TRUE, stdout = TRUE, stderr = NULL, :
running command ''/usr/local/bin/GLM/glm' --help 2>/dev/null' had status 139
> GLM3r::glm_version()
Segmentation fault
[1] 139
Warning message:
In glm.systemcall(sim_folder, glm_path, verbose, system.args) :
Custom path to GLM executable set via 'GLM_PATH' environment variable as: /usr/local/bin/GLM/glm
> Sys.time()
[1] "2022-02-07 16:22:44 UTC"
>
> tar_make_clustermq(p2_glm_uncalibrated_runs, reporter='summary', workers=79)
queue | skip | start | built | error | warn | cancel | time
1 | 4873 | 0 | 0 | 959 | 959 | 0 | 16:27 03.36 Master: [242.1s 103.5% CPU]; Worker: [avg 14.4% CPU, max 922.5 Mb]
0 | 4873 | 0 | 0 | 959 | 959 | 0 | 16:27 03.69
But then I tested a few failed runs directly, and got mixed results. We aren't seeing that Unknown flag --no-gui error we were previously, which is good, but odd that both models did run, yet one returned a seg fault error while one returned a successful code 0:
> GLM3r::run_glm('2_run/tmp/simulations/nhdhr_86443989_MIROC5_2080_2099', verbose=TRUE)
Cannot open default display
-------------------------------------------------------
| General Lake Model (GLM) Version 3.2.0a3 |
-------------------------------------------------------
glm built using gcc version 9.3.0
build date 20220127-2249UTC
Reading configuration from glm3.nml
nDays= 150; timestep= 3600.000000 (s)
NOTE: values for crest_elev not provided, assuming max elevation, H[bsn]
Maximum lake depth is 5.000000
Depth where flow will occur over the crest is 5.000000
VolAtCrest= 3046924.55089; MaxVol= 3046924.55089 (m3)
No 'sediment' section, turning off sediment heating
WARNING: Initial profiles problem - expected 0 wd_init_vals entries but got 12
Wall clock start time : Mon Feb 7 16:37:15 2022
Simulation begins...
Running day 2488257, 100.00% of days complete
Wall clock finish time : Mon Feb 7 16:37:19 2022
Wall clock runtime was 4 seconds : 00:00:04 [hh:mm:ss]
Model Run Complete
-------------------------------------------------------
Segmentation fault
[1] 139
Warning message:
In glm.systemcall(sim_folder, glm_path, verbose, system.args) :
Custom path to GLM executable set via 'GLM_PATH' environment variable as: /usr/local/bin/GLM/glm
> GLM3r::run_glm('2_run/tmp/simulations/nhdhr_114336515_ACCESS_2080_2099', verbose=TRUE)
Cannot open default display
-------------------------------------------------------
| General Lake Model (GLM) Version 3.2.0a3 |
-------------------------------------------------------
glm built using gcc version 9.3.0
build date 20220127-2249UTC
Reading configuration from glm3.nml
nDays= 150; timestep= 3600.000000 (s)
NOTE: values for crest_elev not provided, assuming max elevation, H[bsn]
Maximum lake depth is 1.000000
Depth where flow will occur over the crest is 1.000000
VolAtCrest= 20263.25326; MaxVol= 20263.25326 (m3)
No 'sediment' section, turning off sediment heating
WARNING: Initial profiles problem - expected 0 wd_init_vals entries but got 12
Wall clock start time : Mon Feb 7 16:40:57 2022
Simulation begins...
Running day 2488257, 100.00% of days complete
Wall clock finish time : Mon Feb 7 16:40:59 2022
Wall clock runtime was 2 seconds : 00:00:02 [hh:mm:ss]
Model Run Complete
-------------------------------------------------------
[1] 0
Warning message:
In glm.systemcall(sim_folder, glm_path, verbose, system.args) :
Custom path to GLM executable set via 'GLM_PATH' environment variable as: /usr/local/bin/GLM/glm
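For reading the two return codes above: a status above 128 conventionally means the process was killed by a signal, with the signal number being the status minus 128. So 139 is signal 11 (SIGSEGV), which matches the "Segmentation fault" message. A small sketch of that decoding follows; the function name `describe_status` is made up for illustration.

```shell
#!/bin/sh
# Translate a process exit status into a short description.
# Shells report death by signal N as status 128 + N, so 139 means
# signal 11 (SIGSEGV, a segmentation fault).
describe_status() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with error code $status"
  fi
}

describe_status 139
describe_status 0
```

In the first run above GLM printed "Model Run Complete" before the segfault, so a 139 on its own does not say whether the simulation itself finished.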
@hcorson-dosch Just now seeing this - yuck! I wonder if there might be something wrong with the build. I tried to use the versions of all of the dependencies that would have been current at the time that it was committed, but possibly the person who compiled/tested it might have had some older versions.
I am not sure what to try next. Two thoughts come to mind.
- 3.2.0a8 image. Does it work?
- 3.2.0a3 build you used for the most recent GCM projections? If so, we could get the exact commits you had for the dependencies, and use them.
Yes the 3.2.0a8 image does work, which is great. I think (and @jread-usgs correct me if I'm wrong), we were interested in running 3.2.0a3 a) because Jordan used it for the projections work he did last summer and b) Jordan has previously seen issues with the latest dev versions of GLM, so thought it would be worth testing with the older 3.2.0a3 to see if it resolves any of the run failures we're currently getting with 3.2.0a8.
I was using this image in a model archive example, and to rebuild it I had to roll back a few commits for libaed-water, adding this line to the Dockerfile after the clone:
cd libaed-water && git reset --hard df2f372916f79a3d573b54bb1ece551354c97680 && cd .. && \
I see there are a few releases in that repo now, might be best to just clone one of those if this image is going to be used longer term.
At this point I'm guessing many of those libraries and packages will have moved along in various ways. If we want the build to be stable over time, and not just have a working image as an artifact of the time the build worked on 2022-06-15, we would probably want to fully specify the other stuff in addition to libaed-water, i.e.
- apt-get packages. The base rocker/geospatial:4.1.2 tag was last updated 2022-06-23, which is more recent than when our working image was built, so in theory we might have to downgrade some system packages from the base image to match our state on 2022-06-15. At the very least we'd want to pin the packages we're explicitly installing ourselves.
- AquaticEcoDynamics installs to a suitable commit or tag
- install2.r and as dependencies for remotes::install_github(), which could be done with something like
ARG CRAN_DATE=2022-06-15
ARG CRAN_REPO=https://packagemanager.rstudio.com/cran/__linux__/focal/$CRAN_DATE
RUN echo "options(repos = c(CRAN = '${CRAN_REPO}'))" >> "${R_HOME}/etc/Rprofile.site"
- GLEON and USGS-R packages with remotes::install_github(ref="some_suitable_commit_or_tag")
This might take a few hours to do, because it's tricky to find the right commits for the AquaticEcoDynamics libraries, and I'm not sure whether it's worth it or not.
A simpler solution which might offer better stability would be to just build a more recent version altogether. All of the AquaticEcoDynamics repositories now have a v3.3.0 tag, which suggests a coordinated release at a known-working state. @hcorson-dosch and @lindsayplatt, has the current jrossusgs/glm3r:v0.7.1 image (a.k.a. glm3r_v0.7.1.sif) been working for you, or is it still buggy? Are there features you want in GLM 3.3.0, or known problems which would keep you from using it?
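If the v3.3.0 route is taken, each library could be cloned at the tag rather than from the default branch. A rough sketch, printing the commands instead of running them. Only the two repositories named in this thread are used as examples, and the full list the build needs is not spelled out here. The helper name `aed_clone_cmd` is made up for illustration.

```shell
#!/bin/sh
# Print a shallow clone command for one AquaticEcoDynamics repository at a
# given tag. `git clone --branch` accepts tag names as well as branches.
aed_clone_cmd() {
  repo="$1"
  ref="${2:-v3.3.0}"
  printf 'git clone --depth 1 --branch %s https://github.com/AquaticEcoDynamics/%s.git\n' \
    "$ref" "$repo"
}

for name in libplot libaed-water; do
  aed_clone_cmd "$name"
done
```

This only works for tags and branches. Pinning to a bare commit hash, like the libaed-water rollback above, still needs a full clone followed by a reset or checkout.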
Once we have a functioning workflow on Denali, we'll want to update the shifter image that David and Alison created for the reservoir modeling to the latest version of GLM (it's now version 3.1), or just make a new image, if that is easiest.