Training machine learning (ML) models requires access to the grid information of the domain, the wind rotation matrix parameters, and land_sea_mask data. These data are stored in .zarr format, and the dataset locations are specified in the catalog.yaml file within the vcm directory. The paths are given as "gs" strings, the path style used on the cloud platform. This pull request introduces the code modifications needed to perform ML training on supercomputing systems.
Significant internal changes:
Added catalog_path attribute to the BatchesFromMapperConfig dataclass (_batch.py script).
Refactored load_batches method of BatchesFromMapperConfig dataclass (_batch.py script).
Refactored batches_from_mapper function (_batch.py script).
Added catalog_path parameter to add_grid_info and add_rotation_info functions (_utils.py script).
Refactored _load_grid and _load_wind_rotation_matrix functions (_utils.py script).
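As a rough illustration of the idea behind the catalog_path changes, the sketch below shows one way an optional catalog path could be carried on a config dataclass and resolved before loading. The names ExampleBatchesConfig, resolve_catalog_path and DEFAULT_CATALOG_PATH are hypothetical and are not taken from _batch.py or _utils.py; the fallback behaviour is an assumption, not a description of the actual implementation.

```python
import dataclasses
import os
from typing import Optional

# Hypothetical default; the real default catalog location is not shown here.
DEFAULT_CATALOG_PATH = "catalog.yaml"


@dataclasses.dataclass
class ExampleBatchesConfig:
    """Hypothetical stand-in for a config that carries an optional catalog path."""

    # None means "use the default catalog" (assumed behaviour).
    catalog_path: Optional[str] = None


def resolve_catalog_path(
    catalog_path: Optional[str], default: str = DEFAULT_CATALOG_PATH
) -> str:
    """Return the catalog file to read.

    Falls back to the default when no path is given; otherwise expands
    "~" and converts the user-supplied path to an absolute local path.
    """
    if catalog_path is None:
        return default
    return os.path.abspath(os.path.expanduser(catalog_path))


# Usage: a config pointing at a local catalog, e.g. on a cluster file system.
config = ExampleBatchesConfig(catalog_path="local/catalog.yaml")
resolved = resolve_catalog_path(config.catalog_path)
```

With such a pattern, existing configurations that omit catalog_path would keep using the default catalog, while runs on systems without access to the cloud paths could point to a local copy.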
Resolves #2204 (partially)