NOAA-OWP / ngen-cal

Tools for calibrating and configuring NextGen
https://github.com/NOAA-OWP/ngen-cal/wiki
Other

Initial run attempts to work out of a temporary worker directory instead of the workdir provided in the calibration config file, causing failed execution #85

Open Ben-Choat opened 10 months ago

Ben-Choat commented 10 months ago


Current behavior

On executing ngen-cal, I received the traceback below, ending in an error indicating that the pandas merge function was applied to an object of NoneType. The error was raised in cal/search.py, line 25, in _objective_func, when the following was executed: pd.merge(simulated_hydrograph, observed_hydrograph, left_index=True, right_index=True).

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/ngen/cal/__main__.py", line 87, in <module>
    main(general, conf['model'])
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/ngen/cal/__main__.py", line 63, in main
    func(start_iteration, general.iterations, agent)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/ngen/cal/search.py", line 190, in dds_set
    _evaluate(0, calibration_set, info=True)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/ngen/cal/search.py", line 56, in _evaluate
    score = _objective_func(calibration_object.output, calibration_object.observed, calibration_object.objective, calibration_object.evaluation_range)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/ngen/cal/search.py", line 25, in _objective_func
    df = pd.merge(simulated_hydrograph, observed_hydrograph, left_index=True, right_index=True)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 74, in merge
    op = _MergeOperation(
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 593, in __init__
    _left = _validate_operand(left)
  File "/home/west/git_repositories/ngen_20231127_calib/ngen/venv/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 2066, in _validate_operand
    raise TypeError(
TypeError: Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed
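The TypeError only says that the left merge operand was None, i.e. calibration_object.output (the simulated hydrograph) was None when _objective_func ran, presumably because the simulated output could not be read from the directory the job actually ran in. A minimal sketch of a guard that would fail earlier with a more descriptive message follows; require_hydrographs is a hypothetical helper, not part of ngen-cal, and it assumes the simulated and observed hydrographs are the first two arguments, as in the call shown in the traceback.

```python
from typing import Any, Optional


def require_hydrographs(simulated: Optional[Any], observed: Optional[Any]) -> None:
    """Hypothetical pre-merge check for _objective_func (not ngen-cal API).

    Raises a descriptive ValueError instead of letting pd.merge fail with
    "Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed".
    """
    if simulated is None:
        # The case hit in this issue: no simulated output was available.
        raise ValueError(
            "Simulated hydrograph is None; check that the model output was "
            "written to (and read from) the expected working directory"
        )
    if observed is None:
        raise ValueError("Observed hydrograph is None; check the observation source")
```

Called as require_hydrographs(simulated_hydrograph, observed_hydrograph) just before the pd.merge line, it would point at the missing model output rather than at pandas internals.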

Expected behavior

ngen-cal runs and performs automated calibration. On the initial run, the working directory is the workdir defined in the calibration configuration file.

Steps to replicate behavior (include URLs)

  1. Use Ubuntu 22.04.

  2. Use build_ngen_calib.sh in the attached zip folder to build ngen and set up ngen-cal. You may wish to edit the script, for example to specify where ngen is built.

  3. After ngen is built, create a symlink to the ngen folder inside the attached folder.

  4. Run ngen-cal with python -m ngen.cal calib_config_CAMELS_CFE_Calib_Sep_2.yaml

This should run, but let me know if it doesn't.

Proposed solution

After scouring through the code, I found that JobMeta() in cal/meta.py takes arguments for both parent_workdir and workdir: def __init__(self, name: str, parent_workdir: Path, workdir: Path=None, log=False):

But when JobMeta was called from the Agent() class, self._job was None, which triggered the following call to JobMeta with a value provided only for parent_workdir (https://github.com/NOAA-OWP/ngen-cal/blob/master/python/ngen_cal/src/ngen/cal/agent.py#L80):
self._job = JobMeta(model_conf['type'], workdir, log=log)

So the configured workdir was passed to JobMeta() as parent_workdir, while JobMeta's own workdir argument was left as None (the default value), which caused the xxx_worker directory to become the main working directory.

By passing workdir twice to JobMeta, the calibration seems to run as expected:
self._job = JobMeta(model_conf['type'], workdir, workdir, log=log)
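To make the argument semantics concrete, here is a simplified stand-in for JobMeta. It is not the ngen-cal implementation: JobMetaSketch is hypothetical, and it only assumes, as described above, that when workdir is None a temporary *_worker directory is created under parent_workdir and used instead. The tempfile.mkdtemp naming is an illustrative assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional


class JobMetaSketch:
    """Hypothetical stand-in for ngen.cal.meta.JobMeta (workdir handling only)."""

    def __init__(self, name: str, parent_workdir: Path,
                 workdir: Optional[Path] = None, log: bool = False):
        self.name = name
        self.log = log
        if workdir is None:
            # Assumed behavior: fall back to a fresh temporary worker
            # directory created inside parent_workdir.
            self.workdir = Path(tempfile.mkdtemp(prefix=f"{name}_", suffix="_worker",
                                                 dir=parent_workdir))
        else:
            # Use the caller-supplied directory as-is.
            self.workdir = Path(workdir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        configured = Path(tmp)
        # Call shape from agent.py: workdir lands in the parent_workdir slot.
        buggy = JobMetaSketch("cfe", configured, log=True)
        # Proposed fix: pass the configured workdir for both arguments.
        fixed = JobMetaSketch("cfe", configured, configured, log=True)
        print(buggy.workdir == configured)  # False: a *_worker subdirectory
        print(fixed.workdir == configured)  # True
```

Passing the second path by keyword, e.g. JobMeta(model_conf['type'], workdir, workdir=workdir, log=log), is equivalent given the signature above and makes it harder to confuse the two positional Path parameters.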

Screenshots

RecreateIssue.zip

hellkite500 commented 8 months ago

Forgot this issue was referenced with #88 when I wrote my comment there. I need to dive a little deeper into these intended semantics and make sure there isn't a deep bug somewhere based on your issue.