Bulk experiments are runnable

Two things to note:

The experiments do not run to completion because they hit a runtime error with AMTA: the experiments check that the window size is what we expect it to be, and that check fails.
The bulk-evict-insert benchmarks take surprisingly long and appear to be slower than the bulk-evict benchmarks. We'll need to investigate.
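
For readers unfamiliar with the window-size check mentioned in the first note, here is a rough sketch of that kind of sanity check. It is not the repository's code: `SumWindow`, `run_bulk_experiment`, and their parameters are hypothetical names made up for illustration, the toy window is a plain deque with a running sum rather than AMTA, and Python is used only for brevity. The idea is that after each bulk evict followed by a bulk insert of the same batch size, the window should be back at its expected size, and the experiment aborts if it is not.

```python
from collections import deque


class SumWindow:
    """Toy FIFO window with a running sum (hypothetical stand-in, not AMTA)."""

    def __init__(self):
        self._items = deque()
        self._total = 0

    def bulk_insert(self, values):
        # Append every value at the back of the window.
        for v in values:
            self._items.append(v)
            self._total += v

    def bulk_evict(self, count):
        # Remove up to `count` of the oldest values from the front.
        for _ in range(min(count, len(self._items))):
            self._total -= self._items.popleft()

    def size(self):
        return len(self._items)

    def query(self):
        return self._total


def run_bulk_experiment(window, window_size, batch_size, iterations):
    """Fill the window, then alternate bulk evict and bulk insert.

    After each round the window must hold exactly `window_size` items;
    a mismatch raises RuntimeError, mirroring the failing check described above.
    """
    window.bulk_insert(range(window_size))
    for i in range(iterations):
        window.bulk_evict(batch_size)
        window.bulk_insert(range(batch_size))
        if window.size() != window_size:
            raise RuntimeError(
                f"iteration {i}: window size {window.size()}, expected {window_size}"
            )
    return window.query()
```

A check like this fails as soon as an aggregator's bulk evict removes more or fewer items than requested, or its size bookkeeping drifts, which is one way the symptom described above could arise.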
