Closed adlzanchetta closed 9 months ago
@adlzanchetta Thank you for trying it out and opening an issue. There are a couple of things to verify.
First of all, clear any local docker image you already have. You can do that with the following commands.
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
awiciroh/ciroh-ngen-image latest 6ed0989db281 35 hours ago 2.02GB
Then remove any image with the latest tag, replacing <<IMAGE ID>> with the ID on your machine.
$ docker image rm <<IMAGE ID>>
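As a rough sketch of the two commands above combined, the helper below pulls the IMAGE ID out of the `docker image ls` table for a given tag. The function name `ids_with_tag` is made up for illustration; it only parses text, so it never touches Docker itself.

```shell
# Hypothetical helper: read `docker image ls` table output on stdin and
# print the IMAGE ID column of every row whose TAG column equals $1.
ids_with_tag() {
    awk -v tag="$1" 'NR > 1 && $2 == tag { print $3 }'
}

# Possible usage (assumes docker is installed and on PATH):
#   for id in $(docker image ls | ids_with_tag latest); do
#       docker image rm "$id"
#   done
```

If the listing is empty, the loop body simply never runs, so nothing is removed.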
Run the following command to check that your realization file is correct. You should get the result below; if yours differs, please share your output.
cat /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json | grep sloth_ice_fraction_xinanjiang
"sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
"ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
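The same check can be scripted so it gives a pass/fail answer instead of output to compare by eye. This is only a sketch: `check_realization` is a hypothetical name, and it tests for just the two entries shown above, not for a valid realization file overall.

```shell
# Hypothetical helper: succeed only if the realization file contains both
# the sloth variable definition and the mapping that points at it.
check_realization() {
    file="$1"
    grep -q '"sloth_ice_fraction_xinanjiang(1,double,1,node)"' "$file" &&
    grep -q '"ice_fraction_xinanjiang"[[:space:]]*:[[:space:]]*"sloth_ice_fraction_xinanjiang"' "$file"
}

# Possible usage (the path is elided here exactly as in the thread):
#   check_realization /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json \
#       && echo "realization looks as expected" \
#       || echo "realization differs, please share the grep output"
```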
In step 4, when asked "Select an option (type a number):", choose option 2: 2) Run Nextgen using remote docker image
Finally, I would like to see your uname output. Please paste it in a comment.
$ uname -a
Darwin UA-QG4YJKY 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020 arm64
If you follow all the steps here and still have the issue, please let us know. Thank you.
I just followed the steps, and it runs successfully on my machine (M2 MacBook).
@benlee0423 Thank you for the instructions. I've followed them.
1. My output for $ docker image ls was an empty list:
REPOSITORY TAG IMAGE ID CREATED SIZE
so there was nothing to rm.
2. The output for my:
cat /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json | grep sloth_ice_fraction_xinanjiang
was the same as yours:
"sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
"ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
3. When I selected option 2) Run Nextgen using remote docker image, I got, after a few minutes of downloading:
2) Run Nextgen using remote docker image
#? 2
pulling container and running the model
latest-x86: Pulling from awiciroh/ciroh-ngen-image
9ac1c2cb2b14: Pull complete
51521a94c9a7: Pull complete
b21871739e13: Pull complete
12641844c716: Pull complete
240591867877: Pull complete
11e0cf462e0a: Pull complete
faa537ece78c: Pull complete
ed7b8da30c7c: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:32e89a4382356de47041cba4aaa0899c9e6b2054e9c4d903742a146e4fd83912
Status: Downloaded newer image for awiciroh/ciroh-ngen-image:latest-x86
docker.io/awiciroh/ciroh-ngen-image:latest-x86
What's Next?
Running NextGen docker container... Mounting local host directory /.../NextGen/ngen-data/my_data to /ngen/ngen/data within the container. Working directory is : /ngen/ngen/data
Found these Catchment files in /ngen/ngen/data/: /ngen/ngen/data/config/catchments.geojson
(...)
And the output was the same, indicating the same failure.
4. The output for ```uname -a```:
Linux adlzanchetta-sd 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
5. One extra piece of info: in step 4 I also tried going into the bash shell and searching for where NGen was getting 'ice_fraction_xinan' from, using good old ```grep```:
1) Run NextGen model framework in serial mode    3) Run Bash shell
2) Run NextGen model framework in parallel mode  4) Exit
Select an option (type a number): 3
Starting a shell, simply exit to stop the process.
bash-5.1# grep -Rn '/ngen/ngen/data/config/' -e 'ice_fraction_xinan'
and got
/ngen/ngen/data/config/awi_simplified_realization.json:27: "sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
/ngen/ngen/data/config/awi_simplified_realization.json:49: "ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
bash-5.1#
So I don't know where the 'ice_fraction_xinan' of the error message is coming from.
This is because the latest-x86 image is 2 months old. Would it be possible for you to try on an ARM machine? That works right now. We are working on updating the x86 image to the latest version.
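To make the architecture/tag relationship seen in this thread explicit: the ARM Mac above ended up with the `latest` tag, while the x86_64 Ubuntu machine pulled `latest-x86`. The sketch below encodes just that observation. The function name is hypothetical and this is not taken from guide.sh, so treat the mapping as an assumption drawn from the outputs posted here.

```shell
# Hypothetical helper: map a `uname -m` value to the image tag observed in
# this thread for that architecture (assumption, not read from guide.sh).
image_tag_for_arch() {
    case "$1" in
        x86_64|amd64)  echo "latest-x86" ;;
        arm64|aarch64) echo "latest" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Possible usage:
#   tag=$(image_tag_for_arch "$(uname -m)") &&
#       docker image ls "awiciroh/ciroh-ngen-image:$tag"
```

Checking the CREATED column for that tag then shows how old the local copy is.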
@arpita0911patel Oh, this makes sense. After running the commands, I see that the image I got is almost 2 months old:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
awiciroh/ciroh-ngen-image latest-x86 37f60ff8da24 7 weeks ago 2.21GB
I unfortunately don't have access to an ARM machine, so I should probably go down the path of compiling NextGen directly.
Thanks for the clarification!
Ah, do you prefer that I close this issue now, or leave it open to be closed later, when the x86 image is available and I can give better feedback?
@adlzanchetta Thank you for the feedback. We are working on getting the x86 image ready. So you can leave this open, and close it when the image becomes available.
@adlzanchetta, if you have access to HPC on your side, we would like you to try the Singularity image, as that is available. Please let us know if you would like to try Singularity on HPC.
@arpita0911patel Sorry for the delay in getting back to you. Before answering, I was trying to make sure I would be able to create my own Docker image with everything I needed. Thank you for offering, but for now I think I can go ahead with what I have, so the Singularity image may not be needed. However, I have two questions about it (please let me know if this is not the right place for this discussion):
1. Is the Singularity image already publicly available somewhere, or would you need to make it available for me?
2. Our group created a BMI interface for another existing hydrological model and we are testing how it goes with NextGen. The model is implemented in C and the shared lib with the BMI interface is provided in a .so file. I am not an expert on Docker or Singularity, so my first guess is that we need to have our .so shared library compiled inside the Docker/Singularity image to have it consumed by NextGen. I.e., I would face incompatibility issues if I compile my .so file on my local Ubuntu 22.04 desktop and then bring it into a Rocky-based Docker container by mounting a volume where the .so is present (or the equivalent for a Singularity image), right?
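One common source of the kind of incompatibility asked about above is glibc symbol versioning: a .so built on a newer distribution can require glibc symbol versions that an older container does not provide. A rough way to check, rather than guess, is to compare the highest GLIBC_x.y version the library needs against the glibc inside the container. The helper name and the library path below are hypothetical, and this covers only glibc, not other shared-library dependencies.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the
# highest GLIBC_x.y[.z] symbol version that appears in it.
max_glibc_required() {
    grep -o 'GLIBC_[0-9][0-9.]*' |
        sed 's/^GLIBC_//' |
        sort -u -t. -k1,1n -k2,2n -k3,3n |
        tail -n 1
}

# Possible usage (the library path is a placeholder):
#   on the build host:     objdump -T /path/to/libmymodel.so | max_glibc_required
#   inside the container:  ldd --version | head -n 1
```

If the version printed on the build host is higher than the container's glibc, loading the library in the container is likely to fail, and compiling inside the image is the safer route.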
@adlzanchetta The latest-x86 image is available now. You can follow the NGEN instructions on your end now. If it works for you, please close this ticket.
@adlzanchetta Regarding your first question, the Singularity image is publicly available. Please refer to this repo for more details: https://github.com/CIROH-UA/Ngen-Singularity
Regarding your question #2, @hellkite500 could answer that for you.
@benlee0423 I got it to run successfully here, both in serial and parallel. Thank you!
@arpita0911patel Thank you for the indication!
Current behavior
The NGEN run following the instructions in the main README.md of the main branch (commit 5e1afa1) results in failure.
Expected behavior
An output that indicates a successful run.
Steps to replicate behavior (include URLs)
In an Ubuntu 22.04 environment on a regular laptop (16 GB RAM, AMD quad-core processor):
1. Follow the commands in README.md:
$ mkdir -p NextGen/ngen-data
$ cd NextGen/ngen-data
$ wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-003/AWI_03W_113060_003.tar.gz
$ tar -xf AWI_03W_113060_003.tar.gz
$ mv AWI_03W_113060_003 my_data
$ cd ../..
$ git clone https://github.com/CIROH-UA/NGIAB-CloudInfra.git
$ cd NGIAB-CloudInfra/
$ git checkout main
2. Run ./guide.sh
3. When asked "Enter your input data directory path (use absolute path):", provide: /.../NextGen/ngen-data/my_data
4. Select option 1 ("Run NextGen Model using local docker image");
5. Select option 1 ("Run NextGen model framework in serial mode");
6. Select the files:
/ngen/ngen/data/config/catchments.geojson;
/ngen/ngen/data/config/nexus.geojson;
/ngen/ngen/data/config/awi_simplified_realization.json;
(the run also results in error when trying to run in parallel mode - step 5)
My opinion
I was able to successfully run an up-to-date version of NGIAB using a set of input files that I downloaded on 2023-Dec-13.
So maybe the input files indicated in the instructions are outdated? Or is there some inconsistency in the indicated input files?