deephaven / deephaven-core

Deephaven Community Core

Columns are difficult to work with from Python #2658

Open rbasralian opened 2 years ago

rbasralian commented 2 years ago

It is tough to work with column data/column types from Python:

  1. There's no way to get a list or set of column names (e.g. `t.getDefinition().getColumnNames()` in Java).
  2. There's no way to get a map of column names to column definitions, or even to data types (e.g. `t.getDefinition().getColumnNameMap()` in Java).
  3. The existing ways of extracting data from a column are too roundabout. We should have a method for pulling data out of a column, or simply support indexing the table like `myTable['column_123']` (like `t.getColumn("column_123").getDirect()` in Java).

We should have Python analogues for each of these, since all of these cases come up frequently.

Examples for each of the above:

  1. Iterating over column names to do something, or doing something if a specific column is missing/present
  2. Converting a column if its data type is not what you want (e.g. after reading from files)
  3. Iterating over values in a column
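To make the three requests concrete, here is a sketch built on a hypothetical `ToyTable` class. It is plain Python and not part of Deephaven. It exposes the accessors the issue asks for: a `column_names` list, a `column_types` name-to-type mapping, and `t["col"]` indexing. These names are assumptions chosen for illustration. The code walks through the three example use cases in order.

```python
from typing import Any, Dict, List, Type


class ToyTable:
    """Hypothetical in-memory stand-in for a table; NOT Deephaven's API.

    It only illustrates the three accessors requested in the issue:
    column names, a name -> type mapping, and t["col"] value access.
    """

    def __init__(self, data: Dict[str, List[Any]], types: Dict[str, Type]):
        self._data = data
        self._types = types

    @property
    def column_names(self) -> List[str]:
        # (1) Ordered list of column names.
        return list(self._data)

    @property
    def column_types(self) -> Dict[str, Type]:
        # (2) Mapping of column name -> data type.
        return dict(self._types)

    def __getitem__(self, name: str) -> List[Any]:
        # (3) Copy of one column's values, like t["column_123"].
        return list(self._data[name])


t = ToyTable({"Sym": ["A", "B"], "Price": ["1.5", "2.25"]},
             {"Sym": str, "Price": str})

# Use case 1: branch on whether a specific column is present.
has_volume = "Volume" in t.column_names

# Use case 2: convert a column whose type is not the one you want,
# e.g. numbers that were read from a file as strings.
prices = t["Price"]
if t.column_types["Price"] is not float:
    prices = [float(v) for v in prices]

# Use case 3: iterate over the values in a column.
total = sum(prices)
```

Returning a copy from `__getitem__` is a deliberate choice in this sketch, so callers cannot mutate the table's storage through the returned list.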
chipkent commented 2 years ago

I agree with (1) and (2).

I do not agree with (3). Great effort has gone into providing access that is properly locked and consistent. This can already be done with `to_numpy(t, ["X"])`, which is direct and straightforward.
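One way to read the locking point is this. If a table can be updated while it is being read, a reader that walks columns value by value may see a partially applied update. A reader that copies the columns it needs under a lock gets a consistent view instead. The hypothetical `LockedColumns` class below is a single-process illustration of that idea using `threading.Lock`. It is an assumption-laden sketch, not how Deephaven or `to_numpy` is actually implemented.

```python
import threading
from typing import Dict, List, Sequence


class LockedColumns:
    """Hypothetical illustration only: a writer updates several columns
    together, and readers copy columns under the same lock, so a reader
    never sees a half-applied row. Not Deephaven's implementation."""

    def __init__(self, data: Dict[str, List[float]]):
        self._lock = threading.Lock()
        self._data = data

    def append_row(self, row: Dict[str, float]) -> None:
        # Every column changes while the lock is held.
        with self._lock:
            for name, value in row.items():
                self._data[name].append(value)

    def snapshot(self, cols: Sequence[str]) -> List[List[float]]:
        # Copy the requested columns atomically with respect to append_row.
        with self._lock:
            return [list(self._data[c]) for c in cols]


cols = LockedColumns({"X": [1.0, 2.0], "Y": [10.0, 20.0]})
x, y = cols.snapshot(["X", "Y"])
cols.append_row({"X": 3.0, "Y": 30.0})
# The earlier snapshot is unaffected by the later update.
```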

chipkent commented 2 years ago

Also, there is a plan for a more efficient `to_numpy()`, but it apparently does not have a ticket yet. It is item 54 in the Python master plan.