Create xy samples dynamically from Data loaded into memory
Sorry, this is a huge PR where we have basically re-written the Engineer/DataLoaders/Models to work with data loaded into memory. This is better for hard-disk-constrained modelling problems where the seq_length is large (e.g. 365 daily timesteps as input to the LSTM models).
Use the Pipeline for working with runoff data.
data is 2D instead of 3D (station_id, time)
data is on smaller timesteps than monthly (daily)
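As a sketch of that data shape (station IDs and the variable name here are hypothetical), the runoff data is 2D (station_id, time) on a daily timestep, rather than the 3D (time, lat, lon) monthly data used elsewhere in the pipeline:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical runoff dataset: 2D (station_id, time) at a daily resolution,
# in contrast to 3D (time, lat, lon) monthly data.
times = pd.date_range("2000-01-01", periods=365, freq="D")
station_ids = [1001, 1002, 1003]

runoff = xr.Dataset(
    {"discharge": (("station_id", "time"),
                   np.random.rand(len(station_ids), len(times)))},
    coords={"station_id": station_ids, "time": times},
)
```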
create dynamic engineer
create dynamic dataloader
update the EALSTM / Neural Networks to work with DynamicDataLoaders
new arguments to models = 'seq_length', 'target_var', 'forecast_horizon'

We have created an experiment file for running the OneTimestepForecast Runoff modelling: scripts/experiments/18_runoff_init.py

Analysis updates
We have added some updates to the analysis code:
overview: update all rmse/r2 functions to calculate spatial scores (a score for each spatial unit) and temporal scores (a time series of scores for each station)
add more catching of the inversion problem (it turns out this occurs when the order of lat, lon is reversed -> lon, lat)
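A minimal sketch of what the spatial/temporal split of the scores could look like (the function names are hypothetical; assumes xarray DataArrays with a time dimension):

```python
import numpy as np
import xarray as xr

def spatial_rmse(pred: xr.DataArray, true: xr.DataArray) -> xr.DataArray:
    # One score per spatial unit: collapse the time dimension only.
    return np.sqrt(((pred - true) ** 2).mean(dim="time"))

def temporal_rmse(pred: xr.DataArray, true: xr.DataArray) -> xr.DataArray:
    # A time series of scores: collapse every dimension except time.
    spatial_dims = [d for d in pred.dims if d != "time"]
    return np.sqrt(((pred - true) ** 2).mean(dim=spatial_dims))

# 3 stations x 10 timesteps of dummy data
pred = xr.DataArray(np.zeros((3, 10)), dims=("station_id", "time"))
true = xr.DataArray(np.ones((3, 10)), dims=("station_id", "time"))
per_station = spatial_rmse(pred, true)  # one value per station
per_time = temporal_rmse(pred, true)    # one value per timestep
```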
Engineer updates
Create new engineer OneTimestepForecast - src/engineer/one_timestep_forecast.py
Created a new DynamicEngineer for use with the DynamicDataLoader
NOTE do we want this or do we ideally want to generalise the one_month_forecast?
Major difference is collapsing things not by lat, lon but by dimension_name = [c for c in static_ds.coords][0]
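A small illustration of that coordinate-inference trick (the static dataset here is hypothetical):

```python
import numpy as np
import xarray as xr

# Hypothetical static dataset keyed by station_id rather than lat/lon.
static_ds = xr.Dataset(
    {"elevation": ("station_id", np.array([120.0, 45.0, 300.0]))},
    coords={"station_id": [1001, 1002, 1003]},
)

# Instead of hard-coding lat/lon, collapse over whatever the
# first coordinate of the static dataset happens to be:
dimension_name = [c for c in static_ds.coords][0]
```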
DataLoader Updates
self.get_reducing_dims to get the spatial dimensions (either latlon or area or station_id or whatever is not time!)
aggregations collapse over these reducing dimensions, e.g. global_mean = x.mean(dim=reducing_dims)
build_loc_to_idx_mapping building a dictionary to ensure we can track what id relates to what spatial unit
Various examples of if len(static_np.shape) == 3: having to account for 2D spatial information (time, lat, lon) or 1D spatial information (time, station_id)
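A sketch of how these two helpers could work (the implementations here are illustrative, not the actual src code), covering both the 2D (time, lat, lon) and 1D (time, station_id) spatial cases:

```python
import numpy as np
import xarray as xr

def get_reducing_dims(ds: xr.Dataset) -> list:
    # The spatial dims are whatever is not time (latlon, area, station_id, ...).
    return [d for d in ds.dims if d != "time"]

def build_loc_to_idx_mapping(ds: xr.Dataset, spatial_dim: str) -> dict:
    # Track which integer index corresponds to which spatial unit.
    return {loc: idx for idx, loc in enumerate(ds[spatial_dim].values)}

# 2D spatial case: (time, lat, lon)
pixel_ds = xr.Dataset({"precip": (("time", "lat", "lon"), np.ones((4, 2, 3)))})
reducing_dims = get_reducing_dims(pixel_ds)
global_mean = pixel_ds.mean(dim=reducing_dims)  # one value per timestep

# 1D spatial case: (station_id,)
station_ds = xr.Dataset(coords={"station_id": [1001, 1002, 1003]})
loc_to_idx = build_loc_to_idx_mapping(station_ds, "station_id")
```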
TODO:
# TODO: why so many static nones?
This is because the standard deviation of some of the values stored in the normalizing_dict becomes 0, so dividing by 0 gives np.nan
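A minimal reproduction of the problem, with one possible guard (the guard is a suggestion, not what the code currently does):

```python
import numpy as np

# Hypothetical normalizing_dict entry where the second feature is constant,
# so its standard deviation is 0.
mean = np.array([10.0, 5.0])
std = np.array([2.0, 0.0])

x = np.array([12.0, 5.0])
with np.errstate(divide="ignore", invalid="ignore"):
    normed = (x - mean) / std  # second entry is 0/0 -> np.nan

# Possible guard: replace zero stds before dividing,
# so constant features normalize to 0 instead of nan.
safe_std = np.where(std == 0, 1.0, std)
normed_safe = (x - mean) / safe_std
```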
Model updates
new model arguments: seq_length / include_timestep_aggs
use a dataloader to load in timesteps: for x, y in tqdm.tqdm(train_dataloader):
include_monthly_aggs -> include_timestep_aggs = spatial aggregation (map of mean values for that pixel)
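A sketch of the timestep-batched training loop (the generator here is a stand-in for the real DynamicDataLoader, and the loss is a dummy value):

```python
import numpy as np

try:
    from tqdm import tqdm
except ImportError:  # fall back to a no-op progress bar
    def tqdm(iterable, **kwargs):
        return iterable

# Stand-in for the DynamicDataLoader: yields (x, y) batches.
def train_dataloader():
    for _ in range(3):
        x = np.random.rand(32, 365, 5)  # (batch, seq_length, n_features)
        y = np.random.rand(32, 1)
        yield x, y

losses = []
for x, y in tqdm(train_dataloader()):
    # the model forward/backward pass would go here;
    # we just record a dummy per-batch loss
    losses.append(float(((y - y.mean()) ** 2).mean()))
```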
NOTE: EarlyStopping is currently not working because I haven't created a train/validation split.