MetOffice / CSET

Toolkit for evaluation and investigation of numerical models for weather and climate applications.
https://metoffice.github.io/CSET/
Apache License 2.0

Unable to read in data in py3.12 environment with CLI #657

Closed daflack closed 1 month ago

daflack commented 1 month ago

Describe the bug

Unable to plot data in UM fields file format using the command line. It gives the following error:

ValueError: No format specification could be found for the given buffer. Perhaps a plugin is missing or has not been loaded. File element cache:
 {'UriProtocol()': 'file', 'LeadingLine()': "b'2024-05-28 16:38:10,028 INFO operator: read.read...", 'MagicNumber(4, None)': '842019380', 'MagicNumber(8, None)': '3616445700456330541', 'DataSourceObjectProtocol()': 'CSET.log', 'FileExtension()': '.log', 'MagicNumber(100, None)': "b'2024-05-28 16:38:10,028 INFO operator: read.read..."}

How to reproduce

Steps to reproduce the behaviour:

  1. Run cset bake on the command line, pointing it to a directory that contains UM fields files.

Expected behaviour

It reads in the UM fields files.

Environment

daflack commented 1 month ago

Update: it appears to have problems with PP files and NetCDF as well.

daflack commented 1 month ago

This happens in the 3.11 CLI environment too, though I am not sure where. I will edit the recipe to find out whether the problem is in the parallel step or the collate step.

daflack commented 1 month ago

Further investigation shows that the error is in the collate step, that is, when reading in the files created by the write step at the end of the parallel step.

daflack commented 1 month ago

Reproducible by multiple people and across versions.

jfrost-mo commented 1 month ago

Traceback:

❯ cset bake -i . -o . -r ~/r.yml --collate-only
Traceback (most recent call last):
  File "/tmp/persistent/conda/envs/cset/bin/cset", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/__init__.py", line 160, in main
    args.func(args, unparsed_args)
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/__init__.py", line 209, in _bake_command
    execute_recipe_collate(
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/operators/__init__.py", line 270, in execute_recipe_collate
    _run_steps(recipe, steps, output_directory, output_directory, style_file)
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/operators/__init__.py", line 164, in _run_steps
    step_input = _step_parser(step, step_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/operators/__init__.py", line 139, in _step_parser
    return operator(step_input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/home/h02/jfrost/Projects/CSET/src/CSET/operators/read.py", line 138, in read_cubes
    cubes = iris.load(input_files, constraint, callback=callback)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/__init__.py", line 326, in load
    return _load_collection(uris, constraints, callback).merged().cubes()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/__init__.py", line 294, in _load_collection
    result = _CubeFilterCollection.from_cubes(cubes, constraints)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/cube.py", line 97, in from_cubes
    for cube in cubes:
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/__init__.py", line 275, in _generate_cubes
    for cube in iris.io.load_files(part_names, callback, constraints):
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/io/__init__.py", line 212, in load_files
    handling_format_spec = FORMAT_AGENT.get_spec(os.path.basename(fn), fh)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/persistent/conda/envs/cset/lib/python3.12/site-packages/iris/io/format_picker.py", line 150, in get_spec
    raise ValueError(msg)
ValueError: No format specification could be found for the given buffer. Perhaps a plugin is missing or has not been loaded. File element cache:
 {'UriProtocol()': 'file', 'LeadingLine()': "b'2024-05-30 09:59:15,475 INFO operator: read.read...", 'MagicNumber(4, None)': '842019380', 'MagicNumber(8, None)': '3616445700456330541', 'DataSourceObjectProtocol()': 'CSET.log', 'FileExtension()': '.log', 'MagicNumber(100, None)': "b'2024-05-30 09:59:15,475 INFO operator: read.read..."}
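
The two MagicNumber values in the error can be decoded to check what kind of file Iris was handed. The sketch below assumes they are the file's first 4 and 8 bytes read as big-endian unsigned integers, which is consistent with the values in the message. It uses only the Python standard library and does not call Iris.

```python
# Values copied from the "File element cache" in the error above.
magic_4 = 842019380
magic_8 = 3616445700456330541

# Assumption: each value is the file's leading bytes read as a
# big-endian unsigned integer, so converting back recovers those bytes.
head_4 = magic_4.to_bytes(4, "big")
head_8 = magic_8.to_bytes(8, "big")

print(head_4)  # b'2024'
print(head_8)  # b'2024-05-'
```

Both values decode to the start of a timestamp, matching the LeadingLine entry, so the file being read is the text log CSET.log rather than a data file.
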
jfrost-mo commented 1 month ago

The following command works fine, even though it's running the same code:

python3 -c 'from CSET.operators.read import read_cubes; print(read_cubes("intermediate/"))'

jfrost-mo commented 1 month ago

This does not:

cset bake -i . -o . -r r.yml --collate-only

r.yml:

title: Test Recipe

parallel:
  - operator: misc.noop

collate:
  - operator: read.read_cubes
    filename: intermediate/*

jfrost-mo commented 1 month ago

The argument to read.read_cubes should actually be filename_pattern, not filename. Using the correct name avoids the issue. With the wrong name, we effectively run

python3 -c 'from CSET.operators.read import read_cubes; print(read_cubes(".", filename="intermediate/"))'

which also tries to read the non-data files in the folder (such as CSET.log), and these cannot be parsed.
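
The effect of the missing pattern can be illustrated without CSET or Iris. The sketch below is only an analogy using the standard library: candidate_files is a hypothetical helper, not part of CSET, and the file names are made up to mirror the layout in this thread. It assumes that when no pattern is supplied, every file in the input directory is a candidate for loading.

```python
import tempfile
from pathlib import Path


def candidate_files(load_path, filename_pattern="*"):
    """Hypothetical helper: list the files a glob pattern selects under load_path."""
    root = Path(load_path)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(filename_pattern)
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Mimic the working directory: a log, a recipe, and intermediate data.
    (root / "intermediate").mkdir()
    (root / "intermediate" / "0.nc").write_bytes(b"")
    (root / "CSET.log").write_text("log line\n")
    (root / "r.yml").write_text("title: Test Recipe\n")

    # No usable pattern: the non-data files in the directory are picked up.
    everything = candidate_files(root)
    # Explicit pattern: only the intermediate data files are selected.
    data_only = candidate_files(root, "intermediate/*")

print(everything)  # ['CSET.log', 'r.yml']
print(data_only)   # ['intermediate/0.nc']
```

In the recipe, the equivalent change is to rename the filename key under read.read_cubes to filename_pattern, as described above.
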