deephaven / deephaven-core

Deephaven Community Core
Other
246 stars 79 forks source link

deephaven.learn's `computer.compute` function uses the variable `k`, which is unsafe for tables that are not append-only #4351

Open jjbrosnan opened 1 year ago

jjbrosnan commented 1 year ago

Description

Try using deephaven.learn on data and it will return an error:

r-Scheduler-Serial-1 | .c.ConsoleServiceGrpcImpl | Error running script: java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to complete the learn function. : RuntimeError: java.lang.IllegalArgumentException: Formula '__computer.compute(k)' uses i, ii, k, or column array variables, and is not safe to refresh. Note that some usages, such as on an append-only table are safe. To allow unsafe refreshing formulas, set the system property io.deephaven.engine.table.impl.select.AbstractFormulaColumn.allowUnsafeRefreshingFormulas.
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/table.py", line 824, in update
    return Table(j_table=self.j_table.update(*formulas))
RuntimeError: java.lang.IllegalArgumentException: Formula '__computer.compute(k)' uses i, ii, k, or column array variables, and is not safe to refresh. Note that some usages, such as on an append-only table are safe. To allow unsafe refreshing formulas, set the system property io.deephaven.engine.table.impl.select.AbstractFormulaColumn.allowUnsafeRefreshingFormulas.

This easily fixable if tables are append-only, otherwise it's no good.

Steps to reproduce

Use deephaven.learn. Ping me if you want a script.

Expected results

A new table with additional columns containing ML results.

Actual results

The error above.

Additional details and attachments

If applicable, add any additional screenshots, logs, or other attachments to help explain your problem.

Versions

devinrsmith commented 1 year ago

https://deephaven.io/core/docs/how-to-guides/use-deephaven-learn/ suggests that

For real-time tables, only the rows of the table that changed during the most recent update cycle are separated into batches and processed.

In the python library though, we create:

.update(formulas=[f"{future_offset} = __computer.compute(k)", ])

but this does not work with general updating tables.

ilyanoskov commented 4 months ago

I am experiencing this issue too. It's quite problematic, as I am not able to use learn.learn on live aggregated data

niloc132 commented 4 months ago

I believe the issue here is not simply "k isn't usable because my table isn't append-only", but "dh.learn cannot operate on tables which are not append-only" - can you modify your table setup so that only new rows are created?

chipkent commented 3 months ago

This problem has been assessed. It has been determined that the best course of action is to wait for Deephaven's multidimensional array support to be delivered. Until then, it is recommended that users formulate their AI input tables in an append-only way.

It is possible to set io.deephaven.engine.table.impl.select.AbstractFormulaColumn.allowUnsafeRefreshingFormulas=true. This would likely allow a query to run, but it may produce inaccurate results, so it is discouraged.

ilyanoskov commented 3 months ago

@chipkent thank you very much for looking into this! When are the multidimensional arrays expected to be introduced in Deephaven Core?

chipkent commented 3 months ago

The design work has been done for the feature, but no developer resources have been allocated yet.