enso-org / enso

Hybrid visual and textual functional programming.
https://ensoanalytics.com
Apache License 2.0

Remove overhead of calls from Table Java code into Enso code by refactoring the functionality to Enso #6292

Open Akirathan opened 1 year ago

Akirathan commented 1 year ago

There is a "Table.order_by object" benchmark that creates a table consisting solely of My atoms with a custom My_Comparator. Most of its time is spent in ObjectComparator.ensoCompare, which calls back into Enso from Java across the language boundary.
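To illustrate why the callback dominates, here is a minimal Java sketch, not the actual Enso runtime code. ForeignComparator is a hypothetical stand-in for a comparator defined on the Enso side, and the counter only records how often the sort crosses into it. The point is that the sort invokes the callback once per comparison, so any per-call boundary cost is paid on every one of them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: all names here are hypothetical, not part of Enso.
final class CallbackOverheadSketch {
    /** Stand-in for a comparator implemented in another language (e.g. Enso). */
    interface ForeignComparator {
        int compare(Object a, Object b);
    }

    /**
     * Sorts the list in place, routing every comparison through the foreign
     * callback, and returns how many times the callback was invoked.
     */
    static long sortAndCountCallbacks(List<Integer> values, ForeignComparator foreign) {
        AtomicLong crossings = new AtomicLong();
        Comparator<Integer> viaCallback = (a, b) -> {
            crossings.incrementAndGet(); // one boundary crossing per comparison
            return foreign.compare(a, b);
        };
        values.sort(viaCallback);
        return crossings.get();
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1000; i > 0; i--) {
            data.add(i % 97); // arbitrary unsorted input
        }
        long calls = sortAndCountCallbacks(
            data, (a, b) -> Integer.compare((Integer) a, (Integer) b));
        System.out.println("callbacks for " + data.size() + " rows: " + calls);
    }
}
```

If the sort itself ran on the same side as the comparator, these calls would stay within one language and would not cross the boundary at all.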

The simplest and quickest way to improve performance is to move some of the functionality currently implemented in the org.enso.base.table Java package into Enso, so that these callbacks are no longer necessary.

After moving the functionality to Enso, there may no longer be a need for shared code between libs and runtime (#5259).

radeusgd commented 1 year ago

This should include moving the callback part of the MultiValueIndex and other MultiValueKey methods to Enso too, so that we avoid all Java-to-Enso callbacks in the Table library.

radeusgd commented 1 year ago

Once we move the MultiValueIndex to Enso, we should implement table.is_unique columns, which can be used for a more efficient check of the primary_key uniqueness condition in Upload_Table.
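A rough sketch of what such a uniqueness check could look like, written in Java purely for illustration (the proposal is to implement it in Enso). The name isUnique and the column-major input layout are assumptions, and the sketch relies on Java equals/hashCode rather than Enso comparators. It stops at the first duplicate key instead of building a full index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical helper, not the Enso Table API.
final class IsUniqueSketch {
    /**
     * Returns true when no two rows share the same combination of values
     * across the given key columns. Columns are passed column-major:
     * keyColumns.get(c).get(row) is the value of column c in that row.
     */
    static boolean isUnique(List<List<Object>> keyColumns, int rowCount) {
        Set<List<Object>> seen = new HashSet<>();
        for (int row = 0; row < rowCount; row++) {
            Object[] key = new Object[keyColumns.size()];
            for (int c = 0; c < key.length; c++) {
                key[c] = keyColumns.get(c).get(row);
            }
            // Arrays.asList gives element-wise equals/hashCode for the key.
            if (!seen.add(Arrays.asList(key))) {
                return false; // duplicate key found, stop early
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Object>> columns = List.of(
            List.<Object>of(1, 1, 2),
            List.<Object>of("a", "b", "a"));
        System.out.println(isUnique(columns, 3));
    }
}
```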

radeusgd commented 1 year ago

The first steps towards this were done in #6890.

jdunkerley commented 1 year ago

PR #7270

jdunkerley commented 1 year ago

This is on hold and should be tackled as we work with the storage refactor. If we move to Apache Arrow this could be essential.