CIROH-UA / NGIAB-CloudInfra

NextGen In A Box: NextGen Generation Water Modeling Framework for Community Release (Docker version)
https://docs.ciroh.org/docs/products/Community%20Hydrologic%20Modeling%20Framework/nextgeninaboxDocker/

Following steps in the README.md file does not result in successful run #95

Closed adlzanchetta closed 9 months ago

adlzanchetta commented 9 months ago

Short description explaining the high-level reason for the new issue.

Current behavior

The NGEN run following the instructions in the main README.md of the main branch (commit 5e1afa1) results in failure:

NGen Framework 0.1.0
Building Nexus collection
Building Catchment collection
Config file details - Line Count: 27 | Max Line Length 46
Config Value - Param: 'forcing_file' | Value: 'BMI' | Units: '(null)'
Config Value - Param: 'surface_partitioning_scheme' | Value: 'Schaake' | Units: '(null)'
Config Value - Param: 'soil_params.depth' | Value: '2.0' | Units: 'm'
Config Value - Param: 'soil_params.b' | Value: '8.93396282196045' | Units: ''
Config Value - Param: 'soil_params.satdk' | Value: '3.19069084890877e-05' | Units: 'm s-1'
Config Value - Param: 'soil_params.satpsi' | Value: '3.98730560956446' | Units: 'm'
Config Value - Param: 'soil_params.slop' | Value: '0.057029859113015' | Units: 'm/m'
Config Value - Param: 'soil_params.smcmax' | Value: '0.401686143900526' | Units: 'm/m'
Config Value - Param: 'soil_params.wltsmc' | Value: '0.048334490431746' | Units: 'm/m'
Config Value - Param: 'soil_params.expon' | Value: '1.0' | Units: ''
Config Value - Param: 'soil_params.expon_secondary' | Value: '1.0' | Units: ''
Config Value - Param: 'refkdt' | Value: '3.72730851635058' | Units: '(null)'
Config Value - Param: 'max_gw_storage' | Value: '0.016' | Units: 'm'
Config Value - Param: 'Cgw' | Value: '0.0018' | Units: 'm h-1'
Config Value - Param: 'expon' | Value: '6.0' | Units: ''
Config Value - Param: 'gw_storage' | Value: '0.05' | Units: 'm/m'
Config Value - Param: 'alpha_fc' | Value: '0.33' | Units: '(null)'
Config Value - Param: 'soil_storage' | Value: '0.05' | Units: 'm/m'
Config Value - Param: 'K_nash' | Value: '0.03' | Units: ''
Config Value - Param: 'K_lf' | Value: '0.01' | Units: ''
Config Value - Param: 'nash_storage' | Value: '0.0,0.0' | Units: '(null)'
Config Value - Param: 'num_timesteps' | Value: '1' | Units: '(null)'
Config Value - Param: 'verbosity' | Value: '1' | Units: '(null)'
Config Value - Param: 'DEBUG' | Value: '0' | Units: '(null)'
Config Value - Param: 'giuh_ordinates' | Value: '1.00,0.00' | Units: '(null)'
Found configured GIUH ordinate values ('1.00,0.00')
Config Value - Param: '' | Value: '(null)' | Units: '(null)'
Config Value - Param: '' | Value: '(null)' | Units: '(null)'
Schaake Magic Constant calculated
All CFE config params present
GIUH ordinates string value found in config ('1.00,0.00')
Counted number of GIUH ordinates (2)
Finished function parsing CFE config
At declaration of smc_profile size, soil_reservoir.n_soil_layers = 0
terminate called after throwing an instance of 'realization::ConfigurationException'
  what():  Multi BMI formulation cannot be created from config: cannot find available data provider to satisfy set of deferred provisions for nested module at index 1: {ice_fraction_xinan}
./HelloNGEN.sh: line 76:    13 Aborted                 (core dumped) /dmod/bin/ngen-serial $n1 all $n2 all $n3

Expected behavior

An output that indicates a successful run.

Steps to replicate behavior (include URLs)

In an Ubuntu 22.04 environment on a regular laptop (16 GB RAM, AMD quad-core processor):

  1. Run the set-up commands as described in the README.md:

    $ mkdir -p NextGen/ngen-data
    $ cd NextGen/ngen-data
    $ wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-003/AWI_03W_113060_003.tar.gz
    $ tar -xf AWI_03W_113060_003.tar.gz
    $ mv AWI_03W_113060_003 my_data
    $ cd ../..
    $ git clone https://github.com/CIROH-UA/NGIAB-CloudInfra.git
    $ cd NGIAB-CloudInfra/
    $ git checkout main
  2. Run ./guide.sh
  3. When asked "Enter your input data directory path (use absolute path):", provide: /.../NextGen/ngen-data/my_data
  4. When asked "Select an option (type a number):", provide: 1 ("Run NextGen Model using local docker image").
  5. When asked again "Select an option (type a number):", provide: 1 ("Run NextGen model framework in serial mode").
  6. When asked to enter the input files (catchment, nexus, realization), provide, respectively: /ngen/ngen/data/config/catchments.geojson; /ngen/ngen/data/config/nexus.geojson; /ngen/ngen/data/config/awi_simplified_realization.json.
  7. The output described in "Current behavior" is given.

(The run also fails when parallel mode is selected in step 5.)

My opinion

I was able to successfully run an up-to-date version of NGIAB using a set of input files that I downloaded on 2023-Dec-13.

So maybe the input files indicated in the instructions are outdated? Or there is some inconsistency in the indicated input files?

benlee0423 commented 9 months ago

@adlzanchetta Thank you for trying it out and opening an issue. There are a couple of things to verify.

  1. First of all, clear any local docker image you already have. You can do that with the following commands.

    $ docker image ls
    REPOSITORY                  TAG                 IMAGE ID       CREATED         SIZE
    awiciroh/ciroh-ngen-image   latest              6ed0989db281   35 hours ago    2.02GB

    Then remove any image with the latest tag. Replace <<IMAGE ID>> with the ID on your machine.

    $ docker image rm <<IMAGE ID>>
  2. Run the following command to check that your realization file is correct. You should get the result shown below. If yours differs, please share your output.

    cat /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json | grep sloth_ice_fraction_xinanjiang
                                    "sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
                                    "ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
  3. In step 4, when asked "Select an option (type a number):", you can instead choose option 2: "2) Run Nextgen using remote docker image".

  4. Finally, I would like to see your uname output. Please paste it in a comment.

    $ uname -a
    Darwin UA-QG4YJKY 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020 arm64

    If you follow all the steps here, and still have the issue, please let us know. Thank you.
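
The realization-file check in step 2 can also be scripted. The following is a minimal sketch, not part of the NGIAB tooling: the function name check_realization is hypothetical, and it only confirms that the two lines shown above are present, not that the rest of the file is valid.

```shell
# Hypothetical helper: report whether a realization file both defines the
# sloth_ice_fraction_xinanjiang variable and maps it to ice_fraction_xinanjiang.
check_realization() (
  file="$1"
  # -F matches fixed strings, so the quotes and "(" are taken literally.
  if grep -qF '"ice_fraction_xinanjiang"' "$file" &&
     grep -qF 'sloth_ice_fraction_xinanjiang(' "$file"; then
    echo "ok"
  else
    echo "missing"
  fi
)
```

It would be called with the path to the realization file, for example check_realization /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json. A nonexistent file is also reported as "missing".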

benlee0423 commented 9 months ago

I just followed the steps, and it runs successfully on my machine (M2 MacBook).

adlzanchetta commented 9 months ago

@benlee0423 Thank you for the instructions. I've followed them.

  1. My output for $ docker image ls was an empty list:

    REPOSITORY   TAG       IMAGE ID   CREATED   SIZE

    so there was nothing to rm.

  2. My output for cat /.../NextGen/ngen-data/my_data/config/awi_simplified_realization.json | grep sloth_ice_fraction_xinanjiang was the same as yours:

    "sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
    "ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
  3. When I selected option 2) Run Nextgen using remote docker image, I got, after a few minutes of downloading:

    
    2) Run Nextgen using remote docker image
    #? 2
    pulling container and running the model
    latest-x86: Pulling from awiciroh/ciroh-ngen-image
    9ac1c2cb2b14: Pull complete 
    51521a94c9a7: Pull complete 
    b21871739e13: Pull complete 
    12641844c716: Pull complete 
    240591867877: Pull complete 
    11e0cf462e0a: Pull complete 
    faa537ece78c: Pull complete 
    ed7b8da30c7c: Pull complete 
    4f4fb700ef54: Pull complete 
    Digest: sha256:32e89a4382356de47041cba4aaa0899c9e6b2054e9c4d903742a146e4fd83912
    Status: Downloaded newer image for awiciroh/ciroh-ngen-image:latest-x86
    docker.io/awiciroh/ciroh-ngen-image:latest-x86

    What's Next?
      1. Sign in to your Docker account → docker login
      2. View a summary of image vulnerabilities and recommendations → docker scout quickview awiciroh/ciroh-ngen-image:latest-x86

    Running NextGen docker container...
    Mounting local host directory /.../NextGen/ngen-data/my_data to /ngen/ngen/data within the container.
    Working directory is : /ngen/ngen/data

    Found these Catchment files in /ngen/ngen/data/:
    /ngen/ngen/data/config/catchments.geojson

    (...)

    The output then indicated the same failure.

  4. The output for uname -a:

    Linux adlzanchetta-sd 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux


  5. One extra piece of information: in step 4 I also tried opening the Bash shell and searching for where NGen was getting 'ice_fraction_xinan' from, using good old grep:

    1) Run NextGen model framework in serial mode     3) Run Bash shell
    2) Run NextGen model framework in parallel mode   4) Exit
    Select an option (type a number): 3
    Starting a shell, simply exit to stop the process.
    bash-5.1# grep -Rn '/ngen/ngen/data/config/' -e 'ice_fraction_xinan'

    and got

    /ngen/ngen/data/config/awi_simplified_realization.json:27: "sloth_ice_fraction_xinanjiang(1,double,1,node)": "0.0",
    /ngen/ngen/data/config/awi_simplified_realization.json:49: "ice_fraction_xinanjiang" : "sloth_ice_fraction_xinanjiang",
    bash-5.1#

So I don't know where the 'ice_fraction_xinan' in the error message is coming from.

arpita0911patel commented 9 months ago

This is because the latest-x86 image is 2 months old. Would it be possible for you to try on an ARM machine? That works right now. We are working on updating the x86 image to the latest version.

adlzanchetta commented 9 months ago

@arpita0911patel Oh, this makes sense. After running the commands, I see that the image I got is almost 2 months old:

$ docker image ls
REPOSITORY                  TAG          IMAGE ID       CREATED       SIZE
awiciroh/ciroh-ngen-image   latest-x86   37f60ff8da24   7 weeks ago   2.21GB

Unfortunately, I don't have access to an ARM machine, so I should probably go down the path of compiling NextGen directly.

Thanks for the clarification!

Also, would you prefer that I close this issue now, or leave it open to be closed later, when the x86 image is available and I can give better feedback?
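
Since the root cause here was a stale image, it may help to check an image's age before running. The following is a rough sketch, assuming GNU date (as on Ubuntu); the function name age_days is hypothetical and not part of guide.sh.

```shell
# Hypothetical helper: whole days between an ISO 8601 creation timestamp ($1)
# and a reference time ($2, defaults to now). Assumes GNU date.
age_days() (
  created="${1%%.*}"   # drop fractional seconds (and the trailing Z), if any
  # -u makes date treat a timestamp without a zone suffix as UTC.
  created_s=$(date -u -d "$created" +%s) || exit 1
  now_s=$(date -u -d "${2:-now}" +%s) || exit 1
  echo $(( (now_s - created_s) / 86400 ))
)
```

With Docker available, the creation timestamp can be obtained from docker image inspect, for example: age_days "$(docker image inspect --format '{{.Created}}' awiciroh/ciroh-ngen-image:latest-x86)".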

benlee0423 commented 9 months ago

@adlzanchetta Thank you for the feedback. We are working on getting the x86 image ready, so you can leave this open and close it when the image becomes available.

arpita0911patel commented 9 months ago

@adlzanchetta, if you have access to an HPC system on your side, we would like you to try the Singularity image, which is already available. Please let us know if you would like to try Singularity on HPC.

adlzanchetta commented 9 months ago

@arpita0911patel Sorry for the delay in getting back to you. Before answering, I was trying to make sure that I would be able to create my own Docker image with everything I needed. Thank you for the offer, but for now I think I can go ahead with what I have, so the Singularity image may not be needed. However, I have two questions about it (please let me know if this is not the right place for this discussion):

  1. Is the Singularity image already publicly available somewhere, or would you need to make it available to me?

  2. Our group created a BMI interface for another existing hydrological model, and we are testing how it works with NextGen. The model is implemented in C, and the shared library with the BMI interface is provided as a .so file. I am not an expert in Docker or Singularity, so my first guess is that we need to compile our .so shared library inside the Docker/Singularity image for it to be consumed by NextGen. That is, I would face incompatibility issues if I compiled my .so file on my local Ubuntu 22.04 desktop and then brought it into a Rocky-based Docker container by mounting a volume containing the .so (or the equivalent for a Singularity image), right?
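
On question 2, one thing that can be checked before mounting a host-built .so into the container is which glibc symbol versions the library requires, since a library that references symbols from a newer glibc than the container provides may fail to load. This is a sketch only, assuming GNU binutils and coreutils; the function name max_glibc_version and the file name libmymodel.so are hypothetical, and glibc is just one possible source of incompatibility.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the
# highest GLIBC symbol version it mentions (nothing if none are found).
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' |   # extract tokens such as GLIBC_2.34
    sed 's/^GLIBC_//' |            # keep only the version number
    sort -uV |                     # version-aware sort (GNU sort)
    tail -n 1
}
```

On the host this could be run as objdump -T libmymodel.so | max_glibc_version, and the result compared with the glibc version reported by ldd --version inside the container.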

benlee0423 commented 9 months ago

@adlzanchetta The latest-x86 image is available now. You can follow the NGEN instructions on your end. If it works for you, please close this ticket.

arpita0911patel commented 9 months ago

@adlzanchetta Regarding your first question, the Singularity image is publicly available. Please refer to this repo for more details: https://github.com/CIROH-UA/Ngen-Singularity

Regarding your question 2, @hellkite500 could answer that for you.

adlzanchetta commented 9 months ago

@benlee0423 I was able to run it successfully here, both in serial and parallel. Thank you!

@arpita0911patel Thank you for the indication!