IBM / sliding-window-aggregators

Reference implementations of sliding window aggregation algorithms
Apache License 2.0
43 stars 16 forks source link

Question about FiBA implementation: updating hitLeft information in rebalanceAfterEvict function #77

Closed takoyaki65 closed 1 year ago

takoyaki65 commented 1 year ago

Hello,

I was reading the paper "Optimal and General Out-of-Order Sliding-Window Aggregation" and took a look at the FiBA implementation code available on GitHub. While inspecting the code, I came across the function Node* rebalanceAfterEvict(Node* node, bool* hitLeft, bool* hitRight, Node* toRepair=NULL) on line 832 of FiBA.hpp.

https://github.com/IBM/sliding-window-aggregators/blob/f62d3cf88fbed83957f27d690060a7547936db4f/cpp/src/FiBA.hpp#L841

I noticed that on that line, only the hitRight information is being updated, while the hitLeft information is not being updated. This seems to be a problem, because when rebalancing a node that contains a tuple like (st, 20, t) (which is stored in the right child of the root node), the sibling node would become the left spine, and therefore its aggregated value would also need to be updated.

image

Could you please explain why only hitRight is being updated, and not hitLeft? Am I missing something here?

Thank you in advance for your help!

hirzel commented 1 year ago

Dear takoyaki65, thank you for reporting this issue! I think you are correct, and this code should also update hitLeft. We will try to address this in the next few weeks (it requires rerunning the regression tests). -Martin

takoyaki65 commented 1 year ago

Dear @hirzel, thank you for your response and for looking into this issue. I appreciate your help! However, to make sure that my understanding is correct, I would like to try testing and verifying the fix myself. Thank you again for your support! -Mizokami

takoyaki65 commented 1 year ago

I added the *hitLeft |= sibling->leftSpine(); line and tested the code again, and all the tests passed (although this is expected, as the only change was to repair the aggregation value more often). I will continue to investigate if I can find any test cases where the previous code would produce incorrect aggregation results.

tk65:~/documents/sliding-window-aggregators/cpp$ ./build-and-test.sh g++ --version > bin/log.txt 2>&1 g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/benchmark_driver.cc -o bin/benchmark_driver g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static -DCOLLECT_STATS src/benchmark_driver.cc -o bin/benchmark_driver_stats g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/ooo_benchmark_driver.cc -o bin/ooo_benchmark_driver g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/data_benchmark.cc -o bin/data_benchmark g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -ggdb src/test.cc -o bin/test g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/dynamic_benchmark_driver.cc -o bin/dynamic_benchmark_driver g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/bulk_evict_benchmark_driver.cc -o bin/bulk_evict_benchmark_driver g++ -std=gnu++17 -DHAS_UNCAUGHT_EXCEPTIONS=1 -O3 -DNDEBUG -static src/bulk_evict_insert_benchmark_driver.cc -o bin/bulk_evict_insert_benchmark_driver cd bin; zip -D benchmark_bin.zip * updating: benchmark_driver (deflated 67%) updating: benchmark_driver_stats (deflated 67%) updating: bulk_evict_benchmark_driver (deflated 65%) updating: bulk_evict_insert_benchmark_driver (deflated 64%) updating: data_benchmark (deflated 65%) updating: dynamic_benchmark_driver (deflated 65%) updating: log.txt (deflated 20%) updating: ooo_benchmark_driver (deflated 65%) updating: test (deflated 78%) sum passed mean passed stddev passed collect passed sawtooth:mincount passed sawtooth:collect passed thirds:mincount passed thirds:collect passed mincount passed argmax passed max passed timestamped-FIFO mincount passed timestamped-FIFO sum passed range query sum passed non-FIFO collect wrt. naive walk passed non-FIFO bloom wrt. naive walk passed non-FIFO with set passed non-FIFO sum with map passed

hirzel commented 1 year ago

Thanks for testing this! Would you like to do a PR for it?

takoyaki65 commented 1 year ago

I apologize for your inconvenience. I am currently preoccupied with personal matters, such as preparing for my graduate school entrance interviews, seminar presentations, and final exams, so it seems unlikely that I can immediately fulfill your request. However, I am planning to create a counterexample case and include it in a pull request that I aim to send by mid-July.

hirzel commented 1 year ago

Sounds good! No rush.

takoyaki65 commented 1 year ago

Hello, I have some free time now and I'll start working on the pull request.

takoyaki65 commented 1 year ago

I tested btree::Aggregate with Sum<int> aggregation function(from AggregationFunctions.hpp) and min_arity=2(minumum child node of non-leaf node is 2).

typedef uint64_t timestamp;
const int min_arity = 2;

int main() {
  auto fiba_btree =
      btree::MakeAggregate<Sum<int>, timestamp, min_arity, btree::finger>()(0);

https://github.com/takoyaki65/sliding-window-aggregators/blob/aa75b37eb04ef9f381dc58e965d23e44539fc8d5/cpp/src/temporary.cpp#L10C1-L16C1

I tested it by simple example. First, I insert (timestamp, value) = (1, 1), (2, 2), (3, 3), (4, 4) records into btree::Aggregate instance.

  const int end = 4;

  // insert (timestamp, value) = (1, 1), (2, 2), ..., (end, end) into fiba_btree
  for (int i = 1; i <= end; ++i) {
    std::cout << "insert (" << i << ", " << i << ")...." << std::endl;
    fiba_btree.insert(i, i);
    std::cout << fiba_btree << std::endl;
  }

https://github.com/takoyaki65/sliding-window-aggregators/blob/aa75b37eb04ef9f381dc58e965d23e44539fc8d5/cpp/src/temporary.cpp#L17C1-L25C1

After that, I evict most recent record in btree::Aggregate one by one, and check whether aggregation value is calculated correctly.

  // aggregation_result: 1 + 2 + ... + end
  int aggregation_result = (end + 1) * end / 2;
  for (int i = end; i >= 1; --i) {
    // check if the aggregation result is correct
    int result = fiba_btree.query();
    if (result != aggregation_result) {
      std::cout << "aggregation result is wrong: " << aggregation_result
                << "(expected) " << result << "(actual)." << std::endl;
    } else {
      std::cout << "[";
      for (int j = 1; j <= i; ++j)
        std::cout << j << (j == i ? "" : ", ");
      std::cout << "] is aggregated correctly" << std::endl;
    }

    std::cout << "evict " << i << "/" << i << "... " << std::endl;
    fiba_btree.evict(i);
    std::cout << fiba_btree << std::endl;
    aggregation_result -= i;
  }

https://github.com/takoyaki65/sliding-window-aggregators/blob/aa75b37eb04ef9f381dc58e965d23e44539fc8d5/cpp/src/temporary.cpp#L26C1-L45C4

I executed this test actually. and The result is bellow:

insert (1, 1)....
(1/1)/ 1 root leaf

insert (2, 2)....
(1/1 2/2)/ 3 root leaf

insert (3, 3)....
(1/1 2/2 3/3)/ 6 root leaf

insert (4, 4)....
(
  (1/1 2/2)/ 3 leaf left-spine
 3/3
  (4/4)/ 4 leaf right-spine
)/ 3 root

[1, 2, 3, 4] is aggregated correctly
evict 4/4... 
(
  (1/1)/ 3 leaf left-spine
 2/2
  (3/3)/ 3 leaf right-spine
)/ 2 root
FiBA.hpp:1693: FAILED(partial aggregation value is not correct)
(1/1)/ 3 leaf left-spine
terminate called after throwing an instance of 'int'
Aborted

After each modification operation, tree structure is printed. (...)/3 mean one node with aggregation value 3, 1/2 mean a record with timestamp 1 and value 2,

We can see that after evicting a (4, 4) record, the rightmost leaf node becomes underflow and rebalance is occurred. But partial aggregation result of leftmost leaf node (that is leftspine) is not fixed.

IMG_8666

This error is caused because btree::aggregate does not repair borrowed siblings aggregation value. So I add 1 line inNode rebalanceAfterEvict(Node node, bool hitLeft, bool hitRight, Node* toRepair=NULL)` (on line 832 of FiBA.hpp)

https://github.com/takoyaki65/sliding-window-aggregators/blob/aa75b37eb04ef9f381dc58e965d23e44539fc8d5/cpp/src/FiBA.hpp#L842

After that, I tried this test one more and confirmed that `btree::aggregate`` passed this test correctly.

tk65@DESKTOP-A32HI1Q:~/documents/forked/sliding-window-aggregators/cpp/src$ ./a.out
insert (1, 1)....
(1/1)/ 1 root leaf

insert (2, 2)....
(1/1 2/2)/ 3 root leaf

insert (3, 3)....
(1/1 2/2 3/3)/ 6 root leaf

insert (4, 4)....
(
  (1/1 2/2)/ 3 leaf left-spine
 3/3
  (4/4)/ 4 leaf right-spine
)/ 3 root

[1, 2, 3, 4] is aggregated correctly
evict 4/4... 
(
  (1/1)/ 1 leaf left-spine
 2/2
  (3/3)/ 3 leaf right-spine
)/ 2 root
(
  (1/1)/ 1 leaf left-spine
 2/2
  (3/3)/ 3 leaf right-spine
)/ 2 root

[1, 2, 3] is aggregated correctly
evict 3/3... 
(1/1 2/2)/ 3 root leaf
(1/1 2/2)/ 3 root leaf

[1, 2] is aggregated correctly
evict 2/2... 
(1/1)/ 1 root leaf
(1/1)/ 1 root leaf

[1] is aggregated correctly
evict 1/1... 
()/ 0 root leaf
()/ 0 root leaf

# of combine calls: 40
# of nodes repaired: 13
# of repair calls involving a spine: 3
# of repair calls: 8
avg root degree when repairs were made: 2.375
takoyaki65 commented 1 year ago

I don't know what I should add a test in test.cc, because out-of-order insertion/eviction test is implemented and checked by void test_non_fifo_set(uint64_t iterations, uint64_t window_size) in cpp/src/test.cc.

https://github.com/IBM/sliding-window-aggregators/blob/f62d3cf88fbed83957f27d690060a7547936db4f/cpp/src/test.cc#L442

So, I will only add *hitLeft |= sibling->leftSpine(); in Node* rebalanceAfterEvict() (in line 842 on cpp/src/FIBA.hpp) to my pull request.

takoyaki65 commented 1 year ago

I had initially committed directly to the master branch and attempted to create a pull request from there. However, I have now cancelled that pull request and instead made the commits on a separate branch derived from master, from which I will make the new pull request.