Example like "Hurricane Teleconnection Locator: HURTLoc" (Bowen et al)

Some ideas on the config for a hurricane track classifier like "Hurricane Teleconnection Locator: HURTLoc" by Michael Bowen.

Data source (configure for the MERRA-2 training X data):

"reader": the default NetCDF reader
sample_args_generator: returns a series of (lon, lat, time) subgrid args, each labeled with a hurricane class: 0/1, or a storm scale from 1 to 5.
sample_from_args_func: takes the lon, lat boxes and time windows, opens the NetCDF files for each window and box, and returns the average of each band over each X-hour period, such as the 6-hour chunks used by the Y data.
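A minimal sketch of the two data-source hooks, assuming plain NumPy arrays in place of the NetCDF reader. The function names match the config keys above, but their signatures, the `storms` record layout, the use of index ranges rather than coordinates, and the in-memory `cube` are all hypothetical stand-ins, not the elm API.

```python
import numpy as np

def sample_args_generator(storms):
    """Yield one dict of subgrid args per labeled storm box (hypothetical layout)."""
    for s in storms:
        yield {
            "lon": s["lon"],      # (start, stop) longitude index range
            "lat": s["lat"],      # (start, stop) latitude index range
            "time": s["time"],    # (start, stop) hour index range
            "label": s["label"],  # 0/1 class or 1-5 storm scale
        }

def sample_from_args_func(cube, args, chunk_hours=6):
    """Average each band over consecutive chunk_hours blocks inside the box.

    cube: dict of band name -> array shaped (time_hours, lat, lon).
    It stands in for the NetCDF files that would be opened per window / box.
    """
    t0, t1 = args["time"]
    y0, y1 = args["lat"]
    x0, x1 = args["lon"]
    out = {}
    for band, arr in cube.items():
        box = arr[t0:t1, y0:y1, x0:x1]
        n_chunks = box.shape[0] // chunk_hours  # drop a trailing partial chunk
        box = box[: n_chunks * chunk_hours]
        # Split the time axis into (chunk, hour-in-chunk) and average the hours.
        out[band] = box.reshape(n_chunks, chunk_hours, *box.shape[1:]).mean(axis=1)
    return out
```

With 24 hourly steps and `chunk_hours=6`, each band comes back with 4 time steps, matching the 6-hour spacing of the track data.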

sample_pipeline pseudo-code:

get_y: get the Y data (track data) for the lon, lat boxes and time windows and shape it into a single column with as many rows as X. Values are classes 0 / 1 or a storm-scale integer for each cell in a grid of (lon, lat, time).
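As a rough illustration of get_y, the sketch below rasterizes track points onto a (time, lat, lon) grid and ravels it into one column. The `track_points` tuple layout, the `fill` default, and the assumption that X's rows follow the same C-order raveling are hypothetical, not taken from elm or the paper.

```python
import numpy as np

def get_y(n_time, n_lat, n_lon, track_points, fill=0):
    """Build a label column aligned with the rows of X.

    track_points: iterable of (time_idx, lat_idx, lon_idx, label), where label
    is 1 for a binary class or a storm-scale integer. Cells with no track
    point keep the value of `fill`.
    """
    grid = np.full((n_time, n_lat, n_lon), fill, dtype=int)
    for t, y, x, label in track_points:
        grid[t, y, x] = label
    # One row per (time, lat, lon) cell, raveled in C order.
    return grid.reshape(-1, 1)
```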
{modify_coords: some_module:normalize} in sample_pipeline: Determine Gaussian breakpoints of the hourly data for each variable (band) within each storm's lon, lat box and time window, that is, breakpoints among the roughly 150 to 200 hourly data points at a given lon, lat point. The normalization sets the mean to 0 over that ca. 150-200 hour period for each storm y, x point and defines 5 histogram classes (alphabet size 5). Return a matrix that gives, for each 6-hour chunk of time at each spatial point, the anomaly class among 0, 1, 2, 3, 4. The result for each band (weather variable) is a grid (lon, lat, time [6-hour chunk]) of integers 0 through 4.
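One way to read this normalize step is sketched below. It assumes the hourly series at each point is z-normalized over the storm window, averaged into 6-hour chunks, and then binned with the equal-probability cut points of a standard normal. That ordering, the function names, and the handling of constant series are assumptions for illustration, not something the paper or elm confirms.

```python
from statistics import NormalDist
import numpy as np

def gaussian_breakpoints(alphabet_size=5):
    """Cut points that split a standard normal into equal-probability bins."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)])

def anomaly_classes(hourly, chunk_hours=6, alphabet_size=5):
    """hourly: array (time_hours, lat, lon) for one band in one storm window.

    Returns integers 0..alphabet_size-1 shaped (time_hours // chunk_hours, lat, lon).
    """
    # Z-normalize each spatial point over the whole window (mean 0, std 1).
    mean = hourly.mean(axis=0, keepdims=True)
    std = hourly.std(axis=0, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # constant series end up in the middle class
    z = (hourly - mean) / std
    # Average the normalized values within each chunk of chunk_hours.
    n_chunks = z.shape[0] // chunk_hours
    z = z[: n_chunks * chunk_hours]
    chunk_means = z.reshape(n_chunks, chunk_hours, *z.shape[1:]).mean(axis=1)
    # Map each chunk mean to the index of the Gaussian bin it falls in.
    return np.digitize(chunk_means, gaussian_breakpoints(alphabet_size))
```

For a window of 180 hourly values this yields 30 class labels per spatial point, one per 6-hour chunk.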
{modify_coords: some_module:reshape_3d_2d} sample_pipeline action: At this point the ElmStore has many bands (weather variables) and a (lon, lat, time) structure for each band. Fold the time dimension into the band names so that each (band, time) pair becomes a separate data array. Example: If the bands were:

temp
precip
humid

And the time points were:

Then make an ElmStore with "bands" (data_vars) as:

temp-0
temp-1
temp-2
temp-3
precip-0
...
humid-2
humid-3

Each of these bands then has a (y, x) grid.
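The renaming in this example can be sketched with a plain dict of NumPy arrays standing in for the ElmStore. The function below is a hypothetical stand-in for some_module:reshape_3d_2d and assumes each band is shaped (time, y, x).

```python
import numpy as np

def reshape_3d_2d(bands):
    """bands: dict of name -> array (time, y, x).

    Returns a dict of 'name-t' -> array (y, x), one entry per band and time index.
    """
    out = {}
    for name, arr in bands.items():
        for t in range(arr.shape[0]):
            out[f"{name}-{t}"] = arr[t]
    return out
```

Three bands with four time steps each give twelve 2-D arrays, ordered "temp-0" through "humid-3" when the input dict lists temp, precip, humid in that order.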

{"flatten": "C"} in the sample_pipeline: flatten takes the "temp-0", "temp-1", ..., "humid-3" data_vars and turns them into the columns of a single matrix. It returns an ElmStore with one DataArray called "flat". The "flat" DataArray has the column headers "temp-0", etc. as its "band" attribute, and its rows are all the y, x points flattened into a single dimension called "space" (ravel order "C").
Other steps can go here, such as scikit-learn preprocessing, feature selection, or PCA (not mentioned in the paper).
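A small sketch of what the flatten step produces, again using a dict of NumPy arrays in place of the ElmStore. The function name and its return of a `(flat, names)` pair are assumptions made for illustration. In elm the result would be a DataArray called "flat" with a "band" attribute.

```python
import numpy as np

def flatten(bands, order="C"):
    """bands: dict of name -> array (y, x).

    Returns (flat, names): flat is shaped (space, band), with one row per
    (y, x) point in the given ravel order and one column per band.
    """
    names = list(bands)
    flat = np.column_stack([bands[n].ravel(order=order) for n in names])
    return flat, names
```

Each row of `flat` is one spatial point and each column is one of the time-expanded bands, which is the 2-D layout scikit-learn estimators expect for X.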

train:

ContinuumIO / elm