Rust version of FlatFIT sometimes fails test

IBM / sliding-window-aggregators

Reference implementations of sliding window aggregation algorithms

Apache License 2.0

42 stars 16 forks source link

Rust version of FlatFIT sometimes fails test #40

Open scotts opened 3 years ago

scotts commented 3 years ago

Sometimes when running the tests, FlatFIT fails with:

thread 'test2::flatfit' panicked at 'attempt to add with overflow', /Users/scottschneider/Documents/gh/sliding-window-aggregators/rust/src/flatfit/mod.rs:134:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I suspect this only happens sometimes because I think the tests are generating random numbers. We should make that deterministic and diagnose what's happening.

segeljakt commented 3 years ago

Hi, I apologise for not contributing in a while. It seems like the generated sums are sometimes too large to be stored in a i64 which makes the UDF of the window-aggregator panic. I think there are some options to solving this problem:

Switch to wrapped addition (i64::wrapping_add) which is guaranteed to not crash.
Switch to checked addition (i64::checked_add) and propagate the error upwards in the test instead of panicking.
Generate sums which are guaranteed to not exceed the std::i64::MAX.

What do you think would work best?

scotts commented 3 years ago

Hi, Klas! I'm sorry I didn't put full info in this issue, I made it as a place holder for myself more than anything. Since I've modified the code that this test is running, the line numbers don't add up with what's in the main repo on the master branch. This is the code on my repo on my working branch:

https://github.com/scotts/sliding-window-aggregators/blob/3b756f77baa340fa9f4b04a1ce486c4498f88292/rust/src/flatfit/mod.rs#L123-L146

Note that line 134, where the overflow panic happens, deals with calculating the offsets. I don't think the actual operator is involved. Your explanation - that the panic is caused by generating numbers too large to add together - makes a lot of sense, but I can't square it with the line that's reporting the panic.

Can you try reproducing locally on your own branch? I've noticed it happens on the order of 10% of the time I run the tests.

scotts commented 3 years ago

Also, I'm doing some work on the Rust side of the repo. I'm glad to see you're still interested! I'll make two new issues to discuss two things I'm working on.

segeljakt commented 3 years ago

Ok hmm, I'll take a look and see what might be wrong. I'm always interested in windows, as long as it't not Microsoft Windows 😂

segeljakt commented 3 years ago

I tried your branch but for some reason it seems to not crash. Even when I do:

while do; cargo test -- test2::flatfit; done

It might be something related to the unsafe code. If you try and replace:

let mut new_buffer: Vec<Item<Value>> = Vec::with_capacity(new_capacity);
unsafe {
    // Unsafe is used here so we don't need to initialize the buffer's contents
    new_buffer.set_len(new_capacity);
}

with

let mut new_buffer = vec![Item::new(Value::identity(), usize::default()); new_capacity];

does anything change? I should probably have made it safe from the start. Rust is better at optimising safe code as far as I understand, so the initialisation will likely be optimised away.

scotts commented 3 years ago

@segeljakt, fascinating. I will play around with that today or tomorrow and get back to you.