deephaven / deephaven-docs-community

Source code for Community docs on the deephaven.io website.
Apache License 2.0
0 stars 5 forks source link

Performance > Tick amplification #162

Closed margaretkennedy closed 3 months ago

margaretkennedy commented 5 months ago

Here is an example of tick amplification. The input time table ticks once. When this is passed into a group/ungroup, the query engine does not know which cell changed, so the entire array is marked as modified, and large sections of the output table change. If your real-time queries are not careful, you could potentially have scenarios where one small change causes huge amounts of things to be recomputed -- one tick gets amplified. If you look at partition_by/merge, it doesn't suffer from the same problem. It is why you may want to use it in some cases. Either way, tick amplification is something to keep in mind for performance analysis.

from deephaven import time_table
from deephaven.table_listener import listen

def print_changes(label, update, is_replay):
    added = update.added()
    modified = update.modified()
    n_added = len(added["X"]) if "X" in added else 0
    n_modified = len(modified["X"]) if "X" in modified else 0
    changes = n_added + n_modified
    print(f"TICK PROPAGATION: {label} {changes} changes")

t1 = time_table("PT5s").update(["A=ii%2", "X=ii"])

# Group/ungroup
t2 = t1.group_by("A")
t3 = t2.ungroup()

# Partition/merge
t4 = t1.partition_by("A")
t5 = t4.merge()

h1 = listen(t1, lambda update, is_replay: print_changes("T1", update, is_replay))
h3 = listen(t3, lambda update, is_replay: print_changes("T3", update, is_replay))
h5 = listen(t5, lambda update, is_replay: print_changes("T5", update, is_replay))

I let this run for a while, and I see:

TICK PROPAGATION: T1 1 changes
TICK PROPAGATION: T3 26 changes
TICK PROPAGATION: T5 1 changes

See how the one tick has already been amplified 26x.

from deephaven import time_table
from deephaven.table_listener import listen

def print_changes(label, update, is_replay):
    added = update.added()
    modified = update.modified()
    n_added = len(added["X"]) if "X" in added else 0
    n_modified = len(modified["X"]) if "X" in modified else 0
    changes = n_added + n_modified
    print(f"TICK PROPAGATION: {label} {changes} changes")

t1 = time_table("PT5s").update(["A=ii%2", "X=ii"])

# Group/ungroup
t2 = t1 \
    .group_by("A") \
    .update("Y=X+1") \
    .ungroup()

# Partition/merge
t3 = t1.partition_by("A") \
    .proxy() \
    .update("Y=X+1") \
    .target \
    .merge()

h1 = listen(t1, lambda update, is_replay: print_changes("RAW            ", update, is_replay))
h2 = listen(t2, lambda update, is_replay: print_changes("GROUP/UNGROUP  ", update, is_replay))
h3 = listen(t3, lambda update, is_replay: print_changes("PARTITION/MERGE", update, is_replay))