eWaterCycle / technicalPaperExampleNotebooks

a collection of Jupyter notebooks used in the technical eWaterCycle paper
Apache License 2.0

Bugs in technical Paper Example Notebooks #39

Open vhoogelander opened 1 year ago

vhoogelander commented 1 year ago

I checked the Example Notebooks and found some bugs:

And this in NB3:

      observations_df, metadata = ewatercycle.observation.grdc.get_grdc_data(
          station_id,
          start_time=experiment_start_date,
          end_time=experiment_end_date,
      )

I get an [Errno 5] Input/output error (is this related to disk space?).

I get the following error: NoSectionError: No section: 'globalOptions'.

Peter9192 commented 1 year ago

Hi @vhoogelander thanks for opening this issue. Can you include the full error messages and provide details about the machine on which you're running these notebooks?

vhoogelander commented 1 year ago

Hi Peter, I am running the notebooks on vhoogeland2@host-192-168-0-55 (this is the machine name, right?). For the first problem, I don't get any error in the notebook itself; it just keeps running. These are the full error messages for the second problem:

NB2:

  Error                                     Traceback (most recent call last)
  Cell In[8], line 1
  ----> 1 cfg_file, cfg_dir = model.setup(end_time=experiment_end_date)
        2 print(cfg_file)
        3 print(cfg_dir)

  File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/wflow.py:113, in Wflow.setup(self, cfg_dir, **kwargs)
      102 def setup(self, cfg_dir: Optional[str] = None, **kwargs) -> Tuple[str, str]:  # type: ignore
      103     """Start the model inside a container and return a valid config file.
      104 
      105     Args:
     (...)
      111         Path to config file and working directory
      112     """
  --> 113     self._setup_working_directory(cfg_dir)
      114     cfg = self.config
      116     if "start_time" in kwargs:

  File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/wflow.py:160, in Wflow._setup_working_directory(self, cfg_dir)
      157 self.work_dir.parent.mkdir(parents=True, exist_ok=True)
      159 assert self.parameter_set
  --> 160 shutil.copytree(src=self.parameter_set.directory, dst=self.work_dir)
      161 if self.forcing:
      162     forcing_path = to_absolute_path(
      163         self.forcing.netcdfinput, parent=self.forcing.directory
      164     )

  File /opt/conda/envs/ewatercycle/lib/python3.10/shutil.py:556, in copytree(src, dst, symlinks, ignore, copy_function, ignore_dangling_symlinks, dirs_exist_ok)
      554 with os.scandir(src) as itr:
      555     entries = list(itr)
  --> 556 return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
      557                  ignore=ignore, copy_function=copy_function,
      558                  ignore_dangling_symlinks=ignore_dangling_symlinks,
      559                  dirs_exist_ok=dirs_exist_ok)

  File /opt/conda/envs/ewatercycle/lib/python3.10/shutil.py:512, in _copytree(entries, src, dst, symlinks, ignore, copy_function, ignore_dangling_symlinks, dirs_exist_ok)
      510         errors.append((src, dst, str(why)))
      511 if errors:
  --> 512     raise Error(errors)
      513 return dst
  Error: [('/mnt/data/parameter-sets/wflow_merrimack_techpaper/inmaps/wflow_ERA5_Merrimack_2001_2016.nc', '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/inmaps/wflow_ERA5_Merrimack_2001_2016.nc', "[Errno 5] Input/output error: '/mnt/data/parameter-sets/wflow_merrimack_techpaper/inmaps/wflow_ERA5_Merrimack_2001_2016.nc' -> '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/inmaps/wflow_ERA5_Merrimack_2001_2016.nc'"), 
  ........ VERY LONG MESSAGE ........,
  '/home/vhoogeland2/technicalPaperExampleNotebooks/ewatercycle_output/wflow_20230809_130831/staticmaps/wflow_uparea.map', '[Errno 5] Input/output error')]

NB3:

OSError                                   Traceback (most recent call last)
Cell In[6], line 1
----> 1 observations_df, metadata = ewatercycle.observation.grdc.get_grdc_data(
      2     station_id,
      3     start_time=experiment_start_date,
      4     end_time=experiment_end_date,
      5 )
      6 grdc_obs = observations_df.rename(columns={"streamflow": "Observations from GRDC"})
      7 grdc_lon = metadata["grdc_longitude_in_arc_degree"]

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/observation/grdc.py:107, in get_grdc_data(station_id, start_time, end_time, parameter, data_home, column)
    104     raise ValueError(f"The grdc file {raw_file} does not exist!")
    106 # Convert the raw data to an xarray
--> 107 metadata, df = _grdc_read(
    108     raw_file,
    109     start=get_time(start_time).date(),
    110     end=get_time(end_time).date(),
    111     column=column,
    112 )
    114 # Add start/end_time to metadata
    115 metadata["UserStartTime"] = start_time

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/observation/grdc.py:129, in _grdc_read(grdc_station_path, start, end, column)
    127 def _grdc_read(grdc_station_path, start, end, column):
    128     with grdc_station_path.open("r", encoding="cp1252", errors="ignore") as file:
--> 129         data = file.read()
    131     metadata = _grdc_metadata_reader(grdc_station_path, data)
    133     all_lines = data.split("\n")

OSError: [Errno 5] Input/output error

And the 3rd problem:

NoSectionError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 reference = ewatercycle.models.PCRGlobWB(version="setters", parameter_set=experiment_parameterset)
      3 reference_config, reference_dir = reference.setup(
      4     start_date = experiment_start_date, 
      5     end_date = experiment_end_date)
      7 print(reference_config, reference_dir)

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/pcrglobwb.py:47, in PCRGlobWB.__init__(self, version, parameter_set, forcing)
     45 super().__init__(version, parameter_set, forcing)
     46 self._set_docker_image()
---> 47 self._setup_default_config()

File /opt/conda/envs/ewatercycle/lib/python3.10/site-packages/ewatercycle/models/pcrglobwb.py:81, in PCRGlobWB._setup_default_config(self)
     79 cfg = CaseConfigParser()
     80 cfg.read(config_file)
---> 81 cfg.set("globalOptions", "inputDir", str(input_dir))
     82 if self.forcing:
     83     cfg.set(
     84         "globalOptions",
     85         "startTime",
     86         get_time(self.forcing.start_time).strftime("%Y-%m-%d"),
     87     )

File /opt/conda/envs/ewatercycle/lib/python3.10/configparser.py:1205, in ConfigParser.set(self, section, option, value)
   1202 """Set an option.  Extends RawConfigParser.set by validating type and
   1203 interpolation syntax on the value."""
   1204 self._validate_value_types(option=option, value=value)
-> 1205 super().set(section, option, value)

File /opt/conda/envs/ewatercycle/lib/python3.10/configparser.py:903, in RawConfigParser.set(self, section, option, value)
    901         sectdict = self._sections[section]
    902     except KeyError:
--> 903         raise NoSectionError(section) from None
    904 sectdict[self.optionxform(option)] = value

NoSectionError: No section: 'globalOptions'

Peter9192 commented 1 year ago

Hi @vhoogelander, actually I meant whether it's a research cloud machine. I'm guessing it's this one, right? https://ewatercyclestud.ewatercycle-tud.src.surf-hosted.nl

Yes, on that machine the /home volume appears to be full. That might explain the problem with NB2. NB3 looks different, though: it's only reading, not copying.
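A quick way to check whether a volume is full before calling `model.setup()` (which copies the whole parameter set into the working directory) is the standard library's `shutil.disk_usage`. This is a minimal sketch; `report_free_space` is a hypothetical helper, and the two paths are simply the mount points that appear in the tracebacks above, so adjust them for your own machine.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per GiB


def report_free_space(paths):
    """Map each existing path to (total, used, free) in GiB; missing paths are skipped."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
        report[path] = (usage.total / GIB, usage.used / GIB, usage.free / GIB)
    return report


# Mount points taken from the tracebacks in this thread (assumption: same layout on your machine).
for path, (total, used, free) in report_free_space(["/home", "/mnt/data"]).items():
    print(f"{path}: {free:.1f} GiB free of {total:.1f} GiB ({used:.1f} GiB used)")
```

Note that on a volume shared by all users, the free space reported here is what is left for everyone, not a per-user quota.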

Also, it would help if you could refer to each notebook by name (and say where you got them from); right now I can't really figure out which ones you have been running. It would be even better if you could reduce the problem to a minimal example and copy/paste the code here so we can reproduce it easily.

For now, a quick response. I ran the following:

    import ewatercycle.observation.grdc
    grdc_station_id = "6335020"

    observations, metadata = ewatercycle.observation.grdc.get_grdc_data(
        station_id=grdc_station_id,
        start_time="1990-01-01T00:00:00Z",  # or: model_instance.start_time_as_isostr
        end_time="1990-12-15T00:00:00Z",
        column="GRDC",
    )

    observations.head()

That worked without problems. Can you be more specific about which notebook, station ID, etc. you were using?

As you can see, it would be helpful if you could be more specific about the issues you encountered.

On a side note: I noticed that the link to the example notebooks on the terria landing page is outdated. It currently points to a page that no longer exists. We might need to pin it to a release or bring back the example notebooks in some other way. I'll open a new issue about that.

vhoogelander commented 1 year ago

Hi @Peter9192, thank you for your comment. I am referring to the technical paper notebooks (Case1_Marrmot_Merrimack..., Case2_wflow_LISFlood..., Case3_CoupleMarrmotAndPCRGlobWB and Case4_ForcePCRGlob), which I got via the terria landing page more than a year ago.

1) If I restart my server, the problem remains. Or is this not what you mean by starting a new machine? If not, how can I do this? (Maybe a stupid question.)

2) I tried to clean up my own folder a bit, but the error of problem 2 remains. What is the maximum disk space of my home directory?

3) I re-ran the NB3 cell and, for some reason, I no longer get an error there. I was using the same station ID (6335020), so I'm not really sure what the problem was, but it seems to be fixed now ;).

4) I am loading this parameter set:

       name=pcrglobwb_merrimack_05min
       directory=/mnt/data/parameter-sets/pcrglobwb_global
       config=/mnt/data/parameter-sets/pcrglobwb_global/merrimack_05min_era5.ini

   I think this was the original dataset used in the example notebook, but I am not 100% sure.
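For question 2, one rough way to see how much of the shared /home volume your own folder takes up, and which model runs are the largest, is to total the file sizes per subdirectory. This is a standard-library sketch; `dir_size` and `largest_subdirs` are hypothetical helpers, and the `ewatercycle_output` path is taken from the NB2 traceback, so it is an assumption about where your runs are stored.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files below `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not descend into symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # skip symlinked files so their targets are not counted
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; leave it out of the total
    return total


def largest_subdirs(path, n=10):
    """Return the n largest immediate subdirectories of `path` as (size_bytes, name) pairs."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:n]


# Output location seen in the NB2 traceback (assumption: adjust to your own setup).
output_dir = os.path.expanduser("~/technicalPaperExampleNotebooks/ewatercycle_output")
if os.path.isdir(output_dir):
    for size, name in largest_subdirs(output_dir):
        print(f"{size / 1024**3:8.2f} GiB  {name}")
```

Old `wflow_*` working directories are a likely candidate for cleanup, since each `setup()` call copies the full parameter set into a new one.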

Peter9192 commented 1 year ago

It's not the Jupyter server but the SURF Research Cloud machine (https://portal.live.surfresearchcloud.nl/) that should be updated (or replaced by a new one). @RolfHut knows how to do this.

The parameter set does have a globalOptions section, but I got a similar input/output error when I first tried to open it. The disks seem to be even fuller today than yesterday. The /home disk is 250 GB in total, shared by all users on that machine. I won't go into details here, but it looks like a few heavy users are taking up most of the available disk space.
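This may also connect the I/O error to problem 3. In Python's standard `configparser`, `read()` silently skips any file it cannot open or read (it catches `OSError`) and returns the list of files it did parse. If the `.ini` file hits an I/O error, the parser stays empty and the first `cfg.set("globalOptions", ...)` raises `NoSectionError`, even though the file does contain that section. The sketch below assumes ewatercycle's `CaseConfigParser` keeps the standard `read()` behaviour; `read_config_strict` is a hypothetical diagnostic helper, not part of ewatercycle.

```python
import configparser


def read_config_strict(config_file, required_section="globalOptions"):
    """Read an .ini file and fail loudly instead of returning an empty parser.

    ConfigParser.read() returns the list of files it parsed successfully and
    silently skips files it cannot open or read (for example on an I/O error).
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (assumed to mirror CaseConfigParser)
    parsed = cfg.read(config_file)
    if not parsed:
        raise OSError(f"Could not read {config_file}; check the disk or the data mount")
    if not cfg.has_section(required_section):
        raise configparser.NoSectionError(required_section)
    return cfg


# Config of the pcrglobwb_merrimack_05min parameter set mentioned earlier in this thread:
# cfg = read_config_strict("/mnt/data/parameter-sets/pcrglobwb_global/merrimack_05min_era5.ini")
```

If this raises `OSError` while the file is known to contain `globalOptions`, the problem is the file system rather than the notebook.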

sverhoeven commented 1 year ago

I noticed that the dCache server that provides the files in /mnt/data was having hiccups and timeouts. This could cause erratic file-reading behavior.
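If the mount only fails intermittently, retrying a read a few times with a short pause helps tell a transient hiccup apart from a persistent problem. A minimal standard-library sketch; `read_with_retry`, the number of attempts and the delay are illustrative choices rather than ewatercycle settings, and this is a diagnostic workaround, not a fix for the mount itself.

```python
import errno
import time
from pathlib import Path


def read_with_retry(path, attempts=3, delay=2.0):
    """Read a file's bytes, retrying only on EIO (errno 5) with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return Path(path).read_bytes()
        except OSError as err:
            if err.errno != errno.EIO or attempt == attempts:
                raise  # not an I/O error, or out of retries: surface the original error
            time.sleep(delay * attempt)  # linear back-off: delay, 2*delay, ...
```

For example, calling it on the raw GRDC station file before `get_grdc_data` shows whether the file can be read at all. It reads the whole file into memory, so it is better suited to small files than to the multi-gigabyte forcing NetCDFs.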