deephaven / deephaven-core

Deephaven Community Core
Other
257 stars 80 forks source link

Calling Head_By After Format_Column Fails with Exception #4501

Closed stanbrub closed 1 year ago

stanbrub commented 1 year ago

Chaining operations after _formatcolumns works for some operations but not _headby. We need to decide if we are going to support chaining on _formatcolumns or only allow it on a leaf table. If not allowed, errors should be explicit. If allowed, column formatting should be carried along through subsequent tables. (Easier said than done.)

Run the following code snippet (Tested in Deephaven 0.28.0+)

from deephaven import new_table
from deephaven.column import string_col, double_col

result = new_table([
    double_col("Doubles", [3.1, 5.45, -1.0]),
    string_col("Strings", ["Creating", "New", "Tables"])
])

result = result.format_columns(["Doubles=Decimal(`##0.00%`)"]).head_by(1)

You will get an exception like the following:

Type: <class 'deephaven.dherror.DHError'>
Value: table head_by operation failed. : RuntimeError: io.deephaven.engine.table.impl.NoSuchColumnException: Unknown column names [Doubles__TABLE_NUMBER_FORMAT], available column names are [Doubles,Strings]
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 1487, in head_by
    return Table(j_table=self.j_table.headBy(num_rows, *by))
RuntimeError: io.deephaven.engine.table.impl.NoSuchColumnException: Unknown column names [Doubles__TABLE_NUMBER_FORMAT], available column names are [Doubles,Strings]

If you use head_by(1, ['Strings']) or head_by(1, ['Doubles','Strings']), you get essentially the same error

rcaudy commented 1 year ago

The reason for this issue is that aggAllBy-style aggregations drop formatting columns, but headBy and tailBy are actually implemented (sub-optimally) as the composition of multiple operations.

See io.deephaven.engine.table.impl.QueryTable#headOrTailBy for the implementation, but the gist of it is that it's writing down a set of update expressions for all columns naively (before the formatting columns are dropped), and then doing:

        return groupBy(groupByColumns).updateView(updates).ungroup().updateView(casting);

The appropriate solution is to drop the formatting columns (see Table#dropColumnFormats) before computing the update or cast expressions, or to filter formatting columns (see io.deephaven.engine.util.ColumnFormatting#isFormattingColumn) during that computation.

rcaudy commented 1 year ago

@malhotrashivam pointed out that the reporter is actually hitting a separate bug in aggAllBy, which is preventing hitting the bug I explained in my previous comment. We're going to:

  1. Fix aggAllBy to actually do what it intends.
  2. Change headOrTailBy to explicitly building group aggregations so that it can group the formatting columns and preserve them.