MAAP-Project / gedi-subsetter

MAAP DPS algorithm for subsetting GEDI data
Apache License 2.0

Automatically "spread" user-selected 2D datasets into multiple columns #48

Open chuckwondo opened 1 year ago

chuckwondo commented 1 year ago

Currently, when a user wants columns from a 2D dataset, each column must be explicitly indexed in the columns input value. Given a 2D dataset named xvar with n columns, if the user wishes to have all n columns appear in the output file, the columns input value must include xvarX for every X in the range 0 through n - 1. For example, if xvar has 4 columns and the user wants all of them in the output, then xvar0,xvar1,xvar2,xvar3 must be included in the columns input. For a small number of columns this is acceptable, but when the 2D dataset contains more than a handful of columns, it is tedious, error-prone, and inconvenient.

To make it easy for users to automatically get all columns of a 2D dataset in the output, we should support a shorthand notation within the columns input. I propose that we support the syntax *VAR as a column name, where VAR is the name of a 2D dataset. For example, continuing from above, if *xvar is part of the columns input, we should automatically "spread" this into xvar0,xvar1,xvar2,xvar3, just as if the user had included such an expanded form in the columns input value to begin with. This syntax is consistent with the Python syntax for iterable unpacking.
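To illustrate the proposed expansion, here is a minimal sketch, not the project's implementation. The function name `spread_columns` and the `ncols_by_name` mapping are hypothetical; the mapping stands in for whatever mechanism #47 ends up providing to look up the number of columns in a 2D dataset.

```python
def spread_columns(columns, ncols_by_name):
    """Expand each ``*VAR`` entry into ``VAR0, VAR1, ..., VAR(n-1)``.

    ``ncols_by_name`` is a hypothetical mapping from a 2D dataset name to
    its number of columns (assumed to come from the work in #47).
    """
    expanded = []
    for col in columns:
        if col.startswith("*"):
            name = col[1:]
            # Raises KeyError when the width of the named dataset is unknown.
            n = ncols_by_name[name]
            expanded.extend(f"{name}{i}" for i in range(n))
        else:
            # Ordinary column names pass through unchanged.
            expanded.append(col)
    return expanded


print(spread_columns(["lat", "*xvar"], {"xvar": 4}))
```

With these inputs, `*xvar` is replaced in place by `xvar0` through `xvar3`, and `lat` is left as is.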

However, we don't know in advance how many columns a 2D dataset contains. This can be resolved by first implementing #47 with either of its proposed approaches, because both specify a means to readily determine the number of columns in any supported 2D dataset.

chuckwondo commented 1 week ago

After discussion with Abigail Barenblitt, we landed on using the Python slice syntax: xvar[a:b]

This would give users more flexibility: a user can specify a start index (a, 0-based) and a stop index (b, exclusive) when they do not want all columns of xvar.

Further, to select all columns, the syntax would be xvar[:], again, just like Python slice syntax.
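One way the slice syntax could be expanded is sketched below. This is a rough sketch under stated assumptions, not the agreed implementation: the name `expand_slice`, the regular expression, and the `ncols` argument (the dataset's column count, assumed to be known via #47) are all hypothetical. Negative indices and a step value are deliberately not handled.

```python
import re

# Matches NAME[a:b], where a and b are optional non-negative integers.
_SLICE_RE = re.compile(r"^(?P<name>\w+)\[(?P<start>\d*):(?P<stop>\d*)\]$")


def expand_slice(column, ncols):
    """Expand ``NAME[a:b]`` into ``NAMEa, ..., NAME(b-1)``.

    ``ncols`` is the number of columns in the 2D dataset (hypothetical
    input; how it is obtained is outside the scope of this sketch).
    Entries that do not use slice syntax are returned unchanged.
    """
    match = _SLICE_RE.match(column)
    if match is None:
        return [column]
    start = int(match["start"]) if match["start"] else None
    stop = int(match["stop"]) if match["stop"] else None
    # slice.indices() fills in omitted bounds and clamps to ncols,
    # mirroring how Python slices a sequence of that length.
    indices = range(*slice(start, stop).indices(ncols))
    return [f"{match['name']}{i}" for i in indices]


print(expand_slice("xvar[1:3]", 4))
print(expand_slice("xvar[:]", 4))
```

Delegating to the built-in `slice` object keeps the bounds handling consistent with Python itself, so a stop index past the last column is clamped rather than treated as an error. Whether clamping or an explicit error is preferable for users is a design choice this sketch leaves open.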