Closed jasonm closed 12 years ago
Jason -- thanks for the report!
This change in behavior seems undesirable. From a quick look, I'd think it arises because 0.9+ changed the bootstrap/first tick behavior: we no longer tick() collections during the first timestamp. @sriram-srinivasan, I can't recall what the motivation for changing this behavior was. Comments?
IIRC, it was to (a) make collection.tick be biased towards incremental processing (assuming the tables have been put into the correct state at bootstrap), and (b) to retain backward compatibility with the tests.
For example, tc_collections.BabyBud loads scratch collections at bootstrap, and if we call tick() right away, it would clear the scratch in preparation for the fixpoint iteration ahead.
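The scratch-vs-table distinction above can be sketched with a toy model (these are not the real Bud classes; `ToyCollection` and its methods are illustrative only): a scratch starts each tick empty, so ticking immediately after bootstrap wipes whatever bootstrap loaded into it, while a table's contents survive.

```ruby
# A toy model (NOT the real Bud collection classes) of scratch
# vs. table semantics, illustrating why ticking right after
# bootstrap would wipe bootstrap-loaded scratch state.
class ToyCollection
  attr_reader :storage

  def initialize(persistent)
    @persistent = persistent
    @storage = []
  end

  def insert(tuple)
    @storage << tuple
  end

  # Called at the start of every tick: scratches start empty,
  # tables carry their contents forward.
  def tick
    @storage = [] unless @persistent
  end
end

scratch = ToyCollection.new(false)
table   = ToyCollection.new(true)

# "Bootstrap" loads both collections.
scratch.insert([:a, 1])
table.insert([:a, 1])

# Ticking the collections immediately afterwards clears the
# scratch's bootstrap data, while the table's survives.
[scratch, table].each(&:tick)
scratch.storage  # => []
table.storage    # => [[:a, 1]]
```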
It seems to me that this issue has more to do with rebl: if a rule is deleted midstream and the new instance is expected to bootstrap and start from a clean slate, then why migrate the state from the old instance into the new one?
Continuing my earlier comment, it seems ok to tick the collections (after migration) inside rebl. Do you see any problem with that, Neil?
> tc_collections.BabyBud loads scratch collections at bootstrap, and if we call tick() right away, it would clear the scratch in preparation for the fixpoint iteration ahead.
IIRC the previous behavior was to call bootstrap early in the tick() -- i.e., on the first tick, we first clear scratches and promote pending into storage, then call bootstrap, then take the fixpoint.
Doing an extra tick after migration seems a little unfortunate -- that might have other unintended consequences (e.g., increment `budtime`, install other deferred-insert tuples, emit outgoing messages on channels, etc.).
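The earlier single-tick ordering can be sketched as follows. This is a hypothetical model, not Bud's actual internals: on the first tick, scratches are cleared and pending tuples promoted *before* bootstrap runs, so bootstrap's insertions survive into the fixpoint.

```ruby
# Hypothetical model of the pre-0.9 first-tick ordering:
# 1. clear scratches and promote pending into storage,
# 2. run bootstrap,
# 3. take the fixpoint.
# Names here are illustrative, not real Bud internals.
Collection = Struct.new(:storage, :pending, :scratch) do
  def tick_prep
    self.storage = [] if scratch  # scratches start each tick empty
    self.storage |= pending       # deferred inserts become visible
    self.pending = []
  end
end

def first_tick(collections, bootstrap)
  collections.each(&:tick_prep)   # 1. clear scratches, promote pending
  bootstrap.call                  # 2. bootstrap runs after the clear
  # 3. ... fixpoint evaluation of the rules would happen here ...
end

s = Collection.new([], [], true)
first_tick([s], -> { s.storage << [:loaded] })
s.storage  # => [[:loaded]] -- bootstrap data visible during the fixpoint
```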
> IIRC the previous behavior was to call bootstrap early in the tick() -- i.e., on the first tick, we first clear scratches and promote pending into storage, then call bootstrap, then take the fixpoint.
I'll briefly describe the changes between 0.8 and 0.9 that were brought about for performance reasons. Earlier, we did semi-naive evaluation within a single tick, but we weren't being incremental across ticks. For example, `a <= b * c` would recalculate the join of `b.storage` and `c.storage` every single time. Clearly, this is unnecessary; if `a` is a scratch, it can cache the result once, and as long as `b` or `c` isn't negated, we need only process deltas.
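The incremental evaluation described above can be sketched as follows. This is a simplified illustration, not Bud's actual join implementation: the cached result for `a` is kept across ticks, and only the deltas of `b` and `c` are joined against the other side.

```ruby
# Simplified sketch of incremental join evaluation across ticks
# (NOT Bud's real implementation): keep the cached result and join
# only the deltas of each input against the other side's storage.
def incremental_join(cache, b_storage, c_storage, b_delta, c_delta)
  new_tuples = []
  # (delta of b) x (all of c), matching on the first column
  b_delta.each do |bt|
    c_storage.each { |ct| new_tuples << [bt, ct] if bt[0] == ct[0] }
  end
  # (all of b) x (delta of c)
  b_storage.each do |bt|
    c_delta.each { |ct| new_tuples << [bt, ct] if bt[0] == ct[0] }
  end
  cache | new_tuples  # set union deduplicates
end

b = [[1, :x]]
c = [[1, :y]]
# First tick: everything counts as a delta.
cache = incremental_join([], b, c, b, c)
# => [[[1, :x], [1, :y]]]

# Later tick: only the new tuples are processed.
b_delta = [[2, :x2]]
b += b_delta
cache = incremental_join(cache, b, c, b_delta, [])
# cache unchanged: the new b tuple has no matching key in c
```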
> Doing an extra tick after migration seems a little unfortunate -- that might have other unintended consequences (e.g., increment `budtime`, install other deferred-insert tuples, emit outgoing messages on channels, etc.).
I agree about the 'install other deferred-insert tuples' bit. The others aren't really an issue; incrementing budtime seems on par with changing versions, and after deleting a rule, the system has certainly changed. The effect of that transition doesn't have to be unnoticeable. Outgoing messages on channels aren't an issue because there shouldn't be anything pending to be sent between ticks.
I think what we need is migration semantics to enable hot-swapping, for rule and schema evolution. In an incremental system, a tick expects a degree of continuity from the earlier tick, but migration is a different level of continuity. At the least, I expect budtime to continue monotonically, bootstrap to be re-executed, schemas to be monotonically upgradeable (you can only add columns, never remove), and all scratches must be cleared before handover. This shouldn't be dependent on the way tick is written.
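A minimal sketch of that migration contract might look like the following. All names here are hypothetical, not real Bud APIs: `budtime` carries over monotonically, scratch contents are dropped before handover, and a schema upgrade is accepted only if it adds columns.

```ruby
# Hypothetical sketch of the proposed migration contract
# (none of these names are real Bud APIs):
# - budtime continues monotonically into the new instance,
# - scratch contents are cleared before handover,
# - schemas may only gain columns, never lose them.
def migrate(old_state, new_schemas)
  old_state[:schemas].each do |name, old_cols|
    new_cols = new_schemas[name]
    unless new_cols && (old_cols - new_cols).empty?
      raise "schema for #{name} must only add columns"
    end
  end
  tables = old_state[:tables].reject do |name, _|
    old_state[:scratches].include?(name)  # drop scratch contents
  end
  { budtime:   old_state[:budtime],  # continues monotonically
    schemas:   new_schemas,
    tables:    tables,               # persistent state migrates
    scratches: old_state[:scratches] }
end

old = { budtime:   7,
        schemas:   { link: [:from, :to] },
        tables:    { link: [[:a, :b]], tmp: [[:x]] },
        scratches: [:tmp] }

# Adding a column is allowed; the scratch's contents are dropped.
new_state = migrate(old, { link: [:from, :to, :cost] })
new_state[:budtime]      # => 7
new_state[:tables].keys  # => [:link]
```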
There is a simple enough fix for the current implementation, without getting into the details of migration: just invalidate cached state. Only scratches need to do anything in response.
In rebl.rb::reinstantiate, after merging tables:

```ruby
@rebl_class_inst.tables.each do |k, v|
  v.invalidate_cache
end
```
Ah, cool -- seems like a reasonable fix to me.
The "getting started guide" (https://github.com/bloom-lang/bud/blob/master/docs/getstarted.md) illustrates scratch tables with an example, but I see unexpected behavior: the scratch table appears to last for one extra tick and needs an additional `/tick` to produce blank output.