ecmwf / anemoi-training

Apache License 2.0

During training, obtaining a resolution of None against an anemoi-datasets-processed ERA5 zarr #68

Open CSyl opened 4 weeks ago

CSyl commented 4 weeks ago

What happened?

I have a zarr that is a subset of the ERA5 zarr in GCP storage, https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr. When running anemoi-training train and anemoi-training train --config-name=debug.yaml, I encountered the following error:

AttributeError: 'NoneType' object has no attribute 'lower', raised from anemoi/training/data/datamodule.py, line 101, in _check_resolution.

If the resolution must be set in the configuration file, is there a way to verify the resolution of https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr in terms of the "o" resolution, or in whatever prefix term(s) the anemoi-training module accepts (as shown in anemoi/training/config/data/zarr.yaml)? Then I could set the resolution in the training configuration file, which might remove the "NoneType" error I am receiving.
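For what it's worth, one way to see whether a resolution was recorded at all is to read the zarr group's attributes directly. In a zarr v2 directory store, group-level attributes live in a `.zattrs` JSON file at the store root, and anemoi-datasets records its metadata as attributes. The sketch below assumes a local directory store and an attribute key named `resolution`; the helper name and path are hypothetical:

```python
import json
from pathlib import Path

def read_zarr_resolution(store_path):
    """Return the 'resolution' group attribute of a local zarr v2
    directory store, or None if the attribute (or store) is missing.

    Assumes a zarr v2 directory store, where group attributes are
    stored as JSON in a top-level .zattrs file.
    """
    zattrs = Path(store_path) / ".zattrs"
    if not zattrs.exists():
        return None
    attrs = json.loads(zattrs.read_text())
    return attrs.get("resolution")

# Hypothetical usage against a local store:
# print(read_zarr_resolution("my-anemoi-subset.zarr"))
```

If this returns None (attribute absent or null), that would match a None reaching the resolution check during training.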

What are the steps to reproduce the bug?

Data & graph used: an anemoi-formatted ERA5 zarr subset extracted from https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr and preprocessed with the anemoi-datasets module, plus a graph built against that subset with the anemoi-graphs module.

1) Configuration file (config.yaml) I am using for the training module:

defaults:
- data: zarr
- dataloader: native_grid
- diagnostics: eval_rollout
- hardware: example
- graph: multi_scale
- model: gnntransformer
- training: default
- _self_

2) Data configuration file (anemoi/training/config/data/zarr.yaml) I am using for the training module:

format: zarr
resolution: o384 #o96
# Time frequency requested from dataset
frequency: 1h #6h
# Time step of model (must be multiple of frequency)
timestep: 1h #6h

# features that are not part of the forecast state
# but are used as forcing to generate the forecast state
forcing:
- "cos_latitude"
- "cos_longitude"
- "sin_latitude"
- "sin_longitude"
- "cos_julian_day"
- "cos_local_time"
- "sin_julian_day"
- "sin_local_time"
- "insolation"
- "lsm"
- "sdor"
- "slor"
- "z"
# features that are only part of the forecast state
# but are not used as the input to the model
diagnostic:
- tp
- cp
remapped:

normalizer:
  default: "mean-std"
  min-max:
  max:
  - "sdor"
  - "slor"
  - "z"
  none:
  - "cos_latitude"
  - "cos_longitude"
  - "sin_latitude"
  - "sin_longitude"
  - "cos_julian_day"
  - "cos_local_time"
  - "sin_julian_day"
  - "sin_local_time"
  - "insolation"
  - "lsm"

imputer:
  default: "none"
remapper:
  default: "none"

# processors including imputers and normalizers are applied in order of definition
processors:
  # example_imputer:
  #   _target_: anemoi.models.preprocessing.imputer.InputImputer
  #   _convert_: all
  #   config: ${data.imputer}
  normalizer:
    _target_: anemoi.models.preprocessing.normalizer.InputNormalizer
    _convert_: all
    config: ${data.normalizer}
  # remapper:
  #   _target_: anemoi.models.preprocessing.remapper.Remapper
  #   _convert_: all
  #   config: ${data.remapper}

# Values set in the code
num_features: null # number of features in the forecast state
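As an aside on the `${...}` entries in this config (e.g. `config: ${data.normalizer}`): these are Hydra/OmegaConf interpolations, replaced at compose time by the value found at the dotted path in the merged config. The stdlib sketch below illustrates that lookup rule only; it is not OmegaConf itself:

```python
import re

def resolve(cfg, value):
    """Resolve simple ${dotted.path} interpolations against cfg (sketch only)."""
    def lookup(path):
        node = cfg
        for key in path.split("."):
            node = node[key]
        return node
    # A whole-string interpolation returns the referenced node as-is
    # (so ${data.normalizer} can resolve to a dict, not a string).
    match = re.fullmatch(r"\$\{([^}]+)\}", value)
    if match:
        return lookup(match.group(1))
    # Partial interpolations are substituted as strings.
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(lookup(m.group(1))), value)
```

So `${data.normalizer}` hands the whole `normalizer:` mapping above to the preprocessing target, and `${hardware.paths.data}/${hardware.files.dataset}` concatenates two string values.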

3) Dataloader configuration file (anemoi/training/config/dataloader/native_grid.yaml) I am using for the training module:

prefetch_factor: 2

num_workers:
  training: 8
  validation: 8
  test: 8
  predict: 8
batch_size:
  training: 2
  validation: 4
  test: 4
  predict: 4

# ============
# Default effective batch_size for training is 16
# For the o96 resolution, default per-gpu batch_size is 2 (8 gpus required)
# The global lr is calculated as:
# global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
# Assuming a constant effective batch_size, any change in the per_gpu batch_size
# should come with a rescaling of the local_lr to keep a constant global_lr
# ============

# runs only N training batches [N = integer | null]
# if null then we run through all the batches
limit_batches:
  training: null
  validation: null
  test: 20
  predict: 20

# ============
# Dataloader definitions
# These follow the anemoi-datasets patterns
# You can make these as complicated for merging as you like
# See https://anemoi-datasets.readthedocs.io
# ============

dataset: ${hardware.paths.data}/${hardware.files.dataset}

training:
  dataset: ${dataloader.dataset}
  start: 2020-12-31 00:00:00 #null
  end: 2021-01-20 23:00:00 #2021
  frequency: ${data.frequency}
  drop:  []

validation:
  dataset: ${dataloader.dataset}
  start: 2021-01-21 00:00:00 #2021
  end: 2021-01-24 23:00:00 #2021
  frequency: ${data.frequency}
  drop:  []

test:
  dataset: ${dataloader.dataset}
  start: 2021-01-25 00:00:00 #2021
  end: 2021-02-01 23:00:00 #null
  frequency: ${data.frequency}
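The learning-rate scaling comment in this config can be sanity-checked with quick arithmetic. The numbers below are illustrative (a hypothetical single-node, 8-GPU run with one GPU per model replica), not values taken from this issue:

```python
def global_lr(local_lr, num_gpus_per_node, num_nodes, gpus_per_model):
    # global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
    return local_lr * num_gpus_per_node * num_nodes / gpus_per_model

# Hypothetical: 8 GPUs on one node, one GPU per model replica.
print(global_lr(6.25e-05, 8, 1, 1))  # 0.0005
```

Per the comment, any change in per-GPU batch size (at constant effective batch size) should come with a matching rescaling of local_lr so that global_lr stays constant.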

4) Hardware paths configuration I am using:

data: # location where the anemoi-formatted zarr is saved was added here
grids: ???
output: # location where to save the training log was added here
logs:
  base: ${hardware.paths.output}logs/
  wandb: ${hardware.paths.logs.base}
  mlflow: ${hardware.paths.logs.base}mlflow/
  tensorboard: ${hardware.paths.logs.base}tensorboard/
checkpoints: ${hardware.paths.output}checkpoint/
plots: ${hardware.paths.output}plots/
profiler: ${hardware.paths.output}profiler/
graph: ${hardware.paths.output}graphs/

5) Configuration file for the anemoi-graphs module I am using:

# Encoder-Processor-Decoder graph
# Note: Resulting graph will only work with a Transformer processor because there are no connections between the hidden nodes.
nodes:
  data:
    node_builder: # how to generate data node
      _target_: anemoi.graphs.nodes.ZarrDatasetNodes
      dataset: anemoi-local-gcp-sample-zarr.zarr
  hidden:
    node_builder: # how to generate hidden node
      _target_: anemoi.graphs.nodes.ZarrDatasetNodes
      dataset: anemoi-local-gcp-sample-zarr.zarr
edges:
  # A) Encoder connections/edges: encodes input data into latent space via connecting data nodes w/ hidden nodes.
  - source_name: data
    target_name: hidden
    edge_builder:
      _target_: anemoi.graphs.edges.CutOffEdges # method to build edges 
      cutoff_factor: 0.7
  # B) Processor connections/edges: connects hidden nodes w/ hidden nodes
  - source_name: hidden
    target_name: hidden
    edge_builder:
      _target_: anemoi.graphs.edges.KNNEdges # method to build edges via KNN
      num_nearest_neighbours: 3
  # C) Decoder connections/edges: decodes latent space into the output data via connecting hidden nodes w/ data nodes
  - source_name: hidden
    target_name: data
    edge_builder:
      _target_: anemoi.graphs.edges.KNNEdges  # method to build edges via KNN
      num_nearest_neighbours: 3

6) Executed anemoi-training train --config-name=config.yaml and obtained the error:

AttributeError: 'NoneType' object has no attribute 'lower', raised from anemoi/training/data/datamodule.py, line 101, in _check_resolution.
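Since resolution is set to o384 in the data config above, the None is most likely on the dataset side of the comparison, i.e. the dataset carries no resolution attribute. The sketch below is not the actual anemoi-training implementation; it only illustrates a check of this shape and a defensive variant (function names and message text are hypothetical):

```python
def check_resolution(config_resolution, dataset_resolution):
    # Sketch: if either side is None, this raises
    # AttributeError: 'NoneType' object has no attribute 'lower'
    return config_resolution.lower() == dataset_resolution.lower()

def check_resolution_defensive(config_resolution, dataset_resolution):
    # Variant that reports the missing value instead of crashing.
    if config_resolution is None or dataset_resolution is None:
        raise ValueError(
            "resolution missing: set data.resolution in the config and make "
            "sure the dataset records a resolution attribute"
        )
    return config_resolution.lower() == dataset_resolution.lower()
```

With such a variant, a dataset built without a resolution attribute would fail with an explicit message instead of the opaque AttributeError above.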

Version

0.1.0

Platform (OS and architecture)

Linux

Relevant log output

No response

Accompanying data

No response

Organisation

No response

(cc'ing @mchantry)

mchantry commented 18 hours ago

Hi @CSyl, sorry for the slow reply. Could you provide access to a small anemoi-datasets-style zarr so we can understand how the resolution has been described in the dataset? Or share the anemoi-datasets config you used to build the zarr. Thanks