ContinuumIO / elm

Phase I & part of Phase II of NASA SBIR - Parallel Machine Learning on Satellite Data
http://ensemble-learning-models.readthedocs.io

Provide more control over sample flattening #44

Closed PeterDSteinberg closed 8 years ago

PeterDSteinberg commented 8 years ago

Currently all samples are flattened from a shape of (band, y, x) to (space, band). We should make {flatten: True} an explicit step in a sample_pipeline rather than calling it automatically. Some methods may instead want to flatten to (lat points, lon points * bands), and breaking flatten out as an explicit step will allow more control.
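To make the two layouts concrete, here is a minimal NumPy sketch of both flattening patterns. The function names `flatten_space_by_band` and `flatten_rows_by_x_band` are hypothetical and are not part of elm; the sketch only assumes the input is an array ordered as (band, y, x).

```python
import numpy as np


def flatten_space_by_band(arr):
    """Hypothetical helper: (band, y, x) -> (y * x, band), one row per pixel."""
    band, y, x = arr.shape
    # Collapse y and x into one spatial axis, then put band last.
    return arr.reshape(band, y * x).T


def flatten_rows_by_x_band(arr):
    """Hypothetical helper: (band, y, x) -> (y, x * band), one row per y."""
    band, y, x = arr.shape
    # Move band to the end so each x position keeps its bands adjacent.
    return np.moveaxis(arr, 0, -1).reshape(y, x * band)


# Small illustrative sample: 2 bands on a 3 x 4 grid.
sample = np.arange(2 * 3 * 4).reshape(2, 3, 4)
space_band = flatten_space_by_band(sample)   # shape (12, 2)
rows_wide = flatten_rows_by_x_band(sample)   # shape (3, 8)
```

In the first layout every pixel is a sample and bands are features; in the second every row of the grid is a sample, with the bands of each column laid out side by side.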

PeterDSteinberg commented 8 years ago

This was partially addressed in PR #53, which added a bands_as_columns decorator. Further consideration may be required for data_sources that have more than 3 dimensions, e.g. (band, y, x, z) or (band, y, x, z, time) instead of (band, y, x).
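For the higher-dimensional case, one possible approach is to collapse every non-band dimension into a single sample axis. The sketch below is an assumption about how that could look, not the behavior of the bands_as_columns decorator; `flatten_keep_band` and its `dims` argument are hypothetical names.

```python
import numpy as np


def flatten_keep_band(arr, dims, band_dim="band"):
    """Hypothetical helper: collapse all dims except band -> (samples, band).

    `dims` names the axes of `arr` in order, e.g. ("band", "y", "x", "z").
    """
    band_axis = dims.index(band_dim)
    # Put band last, then merge every remaining axis into one sample axis.
    moved = np.moveaxis(arr, band_axis, -1)
    return moved.reshape(-1, arr.shape[band_axis])


# (band, y, x, z) with 2 bands -> (3 * 4 * 5, 2)
cube = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
flat4 = flatten_keep_band(cube, ("band", "y", "x", "z"))

# (band, y, x, z, time) with 2 bands -> (3 * 4 * 5 * 6, 2)
hyper = np.zeros((2, 3, 4, 5, 6))
flat5 = flatten_keep_band(hyper, ("band", "y", "x", "z", "time"))
```

Treating each (y, x, z, time) point as an independent sample is only one choice; other methods may prefer to keep z or time as extra columns instead.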

PeterDSteinberg commented 8 years ago

The comment above also relates to #54, as NetCDF files often have higher dimensionality.

PeterDSteinberg commented 8 years ago

Just discussed this with @brendancol. In the data_sources section of the config, we are going to provide a place to specify the dimensions and canvas of the input data to be derived from files, allowing for dimensions of band, y, x, z, time and various common patterns of flattening or reducing that data.