eclipse-iceoryx / iceoryx2

Eclipse iceoryx2™ - true zero-copy inter-process-communication in pure Rust
https://iceoryx.io
Apache License 2.0

Large FixedSizeVec overflows stack #220

Closed jasonashton closed 2 weeks ago

jasonashton commented 1 month ago

Required information

Operating system:

Rust version: rustc 1.78.0 (9b00956e5 2024-04-29)

Cargo version: cargo 1.78.0 (54d8815d0 2024-03-26)

iceoryx2 version: main

Observed result or behaviour: When creating a large (1 MB or greater) FixedSizeVec, the stack overflows.

Expected result or behaviour: This is more of a question: how can we send larger data over the IPC? I'm looking to transfer images between processes, which needs much larger data sizes, but that overflows the stack. The usual answer is to use a Vec on the heap, but that doesn't seem to be an option for IPC.

Conditions where it occurred / Performed steps: In iceoryx2-bb/container/tests/vec_tests.rs set

// const SUT_CAPACITY: usize = 128;
const SUT_CAPACITY: usize = 6220800;
type Sut = FixedSizeVec<usize, SUT_CAPACITY>;

and run cargo test --test vec_tests:

thread 'fixed_size_vec_capacity_is_correct' has overflowed its stack
fatal runtime error: stack overflow
error: test failed, to rerun pass `-p iceoryx2-bb-container --test vec_tests`

elfenpiff commented 1 month ago

@jasonashton Thanks a lot for pointing out this issue. I had it once on the radar but lost track of it.

What we need in the end is something like a real placement new, but I think the corresponding RFC was dismissed in 2018. So we have to define something that works with our shared-memory use case.

In the meantime, you can use the slice API, where you shouldn't encounter this problem; see the example here: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/rust/publish_subscribe_dynamic_data

elfenpiff commented 1 month ago

@elBoberido mentioned to me that this approach should work in release mode:

let service = zero_copy::Service::new(&service_name)
    .publish_subscribe()
    .open_or_create::<FixedSizeVec<u64, 100000000>>()?;
// A publisher has to be created from the service first.
let publisher = service.publisher().create()?;
let mut sample = publisher.loan_uninit()?;
let sample = unsafe {
    sample
        .payload_mut()
        .as_mut_ptr()
        .write(FixedSizeVec::default());

    sample.assume_init()
};

This is restricted to release mode, where the compiler optimizes the copy away and implicitly writes directly into the shared memory.

jasonashton commented 1 month ago

@elfenpiff thanks for the quick response! I tried the second example; it works for me as a bin but not as a test (even with --release). Any ideas why?

This works for me; it sends and receives:

use core::time::Duration;
use iceoryx2::prelude::*;
use iceoryx2_bb_container::vec::FixedSizeVec;

const CYCLE_TIME: Duration = Duration::from_secs(1);
const COMMON_IMAGE_SIZE: usize = 3 * 1080 * 1920;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let service_name = ServiceName::new("my/big/topic4")?;

    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe::<FixedSizeVec<u8, COMMON_IMAGE_SIZE>>()
        .open_or_create()?;

    let publisher = service.publisher().create()?;

    let mut counter: u64 = 0;

    while let Iox2Event::Tick = Iox2::wait(CYCLE_TIME) {
        counter += 1;
        let mut sample = publisher.loan_uninit()?;
        let mut img = FixedSizeVec::<u8, COMMON_IMAGE_SIZE>::default();
        for i in 0..img.capacity() {
            img.push((i % 255) as u8);
        }
        let sample = unsafe {
            sample
                .payload_mut()
                .as_mut_ptr()
                .write(img);
            sample.assume_init()
        };
        sample.send()?;
        println!("Sent sample {} ...", counter);
    }
    println!("exit");
    Ok(())
}

But this simple test fails with a stack overflow:

    #[test]
    fn test_big_fixed_size_vec() {
        const COMMON_IMAGE_SIZE: usize = 3 * 1080 * 1920;
        let mut img = FixedSizeVec::<u8, COMMON_IMAGE_SIZE>::default();
        for i in 0..img.capacity() {
            img.push((i % 255) as u8);
        }
        assert_eq!(img.len(), img.capacity());
    }

On the topic of the dynamic pub/sub example, I don't fully understand how it works; it seems like it essentially just breaks the larger data up into multiple publishes? Based on that understanding, I'd have to reassemble the data on the other end, is that correct?

Lastly, I believe we used the iceoryx zero-copy RMW for ROS 2 (RouDi) (though on Linux, not macOS) at a previous company I was at, and we were sending multiple large images in a single message through zero-copy, and I had never run into an issue like this. Is there some difference here? Is it an iceoryx2-, macOS-, or Rust-specific issue?

Thanks for the help, and for the great project!

jasonashton commented 1 month ago

Just another data point: I tried the dynamic publisher but set the max slice length to the size of the vec, and that seems to work for me:

use core::time::Duration;
use iceoryx2::prelude::*;

const CYCLE_TIME: Duration = Duration::from_secs(1);

const COMMON_IMAGE_SIZE : usize = 3 * 1920 * 1080;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let service_name = ServiceName::new("/big/dynamic/service")?;

    let service = zero_copy::Service::new(&service_name)
        .publish_subscribe::<[u8]>()
        .open_or_create()?;

    let publisher = service
        .publisher()
        .max_slice_len(COMMON_IMAGE_SIZE)
        .create()?;

    let mut counter = 1;
    let mut vec = Vec::<u8>::with_capacity(COMMON_IMAGE_SIZE);
    for i in 0..vec.capacity() {
        vec.push((i % 255) as u8);
    }

    while let Iox2Event::Tick = Iox2::wait(CYCLE_TIME) {
        counter += 1;

        let sample = publisher.loan_slice_uninit(COMMON_IMAGE_SIZE)?;
        let sample = sample.write_from_fn(|byte_idx| vec[byte_idx]);
        sample.send()?;
        println!("Send sample {} ...", counter);
    }

    println!("exit");
    Ok(())
}

The dynamic subscriber example prints received 6220800 bytes

elfenpiff commented 1 month ago

@jasonashton

I tried the second example, it works for me as a bin but not a test (even with --release, any ideas why?

I think that cargo test --release builds slightly differently from cargo run --release, but I cannot name any specific differences.

On the topic of the dynamic pub/sub example, I don't fully understand how it works, it seems like it is essentially just breaking up the larger data into multiple publishes?

No, the idea is to send a slice, and you can define the worst-case size on the publisher's side. The example is that you want to send camera data, and at runtime you detect a 720p camera. Therefore, you know your image size is 4 * 1280 * 720, so you create a publisher with this worst-case size. After some runtime, the camera is disconnected and replaced with a 4k camera, and your image size becomes 4 * 3840 * 2160. You remove the old publisher and create a new one with a larger worst-case size. If the old camera is connected again you can either stick to the current worst-case size or remove the new publisher again and create a new one with a smaller size.

I think I need to write a blog article to clarify this a bit more in detail.

In the future this shall all happen behind the scenes so that you do not have to recreate a publisher anymore. But for now it is already a big improvement compared to iceoryx1.
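
The recreate-on-growth pattern described above can be sketched in a dependency-free way. Everything below is hypothetical stand-in code: `FakePublisher` and `GrowablePublisher` are illustrative types, and with the real API, "recreation" would mean dropping the old publisher and calling `service.publisher().max_slice_len(n).create()` again:

```rust
// A dependency-free sketch of the "recreate the publisher when the
// worst-case slice length grows" pattern. `FakePublisher` is a stand-in
// for a real publisher created with a fixed worst-case slice length.
struct FakePublisher {
    max_slice_len: usize,
}

struct GrowablePublisher {
    inner: FakePublisher,
    recreations: usize,
}

impl GrowablePublisher {
    fn new(initial_len: usize) -> Self {
        Self {
            inner: FakePublisher { max_slice_len: initial_len },
            recreations: 0,
        }
    }

    // Before loaning a sample, make sure the current publisher's
    // worst-case size is large enough; otherwise replace it with a
    // bigger one (dropping the old publisher in the process).
    fn ensure_capacity(&mut self, needed: usize) {
        if needed > self.inner.max_slice_len {
            self.inner = FakePublisher { max_slice_len: needed };
            self.recreations += 1;
        }
    }
}

fn main() {
    // A 720p RGBA frame, then a 4k RGBA frame arrives.
    let mut publisher = GrowablePublisher::new(4 * 1280 * 720);
    publisher.ensure_capacity(4 * 1280 * 720); // fits, no recreation
    publisher.ensure_capacity(4 * 3840 * 2160); // grows, recreated once
    assert_eq!(publisher.recreations, 1);
    println!("recreated {} time(s)", publisher.recreations);
}
```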

Lastly, I believe we used iceoryx zero-copy RMW for ROS2 (RouDI) (though on linux, not mac) at a previous company I was at, and we were sending multiple large images in a single message through zero-copy, and I had never run into an issue like this. Is there some difference here? Is it a iceoryx2, or mac, or rust specific issue?

It is a Rust-specific issue. Take a look at this: https://github.com/rust-lang/rust/issues/53827. This is exactly the problem behind the scenes, since FixedSizeVec<u8, 100000000> has an array member of type [MaybeUninit<u8>; 100000000] that overflows the stack when you want to put it onto the heap like

let heap_array = Box::new([0; 100000000]);
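
As an aside, plain std Rust can heap-allocate a large buffer without the stack-sized temporary, because `vec!` writes straight into heap memory. This is a generic workaround, not an iceoryx2 API:

```rust
// Allocating a large zeroed buffer without materializing it on the stack:
// `vec![0u8; len]` allocates and fills heap memory directly, so no 100 MB
// temporary ever lives on the stack, unlike `Box::new([0u8; N])`.
fn heap_buffer(len: usize) -> Box<[u8]> {
    vec![0u8; len].into_boxed_slice()
}

fn main() {
    let buf = heap_buffer(100_000_000);
    assert_eq!(buf.len(), 100_000_000);
    println!("allocated {} bytes on the heap", buf.len());
}
```

This only sidesteps the problem for plain heap allocations; it does not help with loaning samples in shared memory, which is what the PlacementNew idea below addresses.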

But we already started solving this. The idea is to introduce a PlacementNew trait

pub trait PlacementNew: Sized {
    type NewParam;

    fn placement_new(this: &mut MaybeUninit<Self>, param: &Self::NewParam);
}

in combination with a placement_new derive proc-macro so that the user can use it like:

#[derive(placement_new)]
struct MyTransmissionType {
  data: FixedSizeVec<u64, 1234>,
  whatever: FixedSizeString<4567>,
}
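
To make the sketch concrete, here is what a hand-written implementation of the proposed trait could look like. Everything below is illustrative; only the trait shape comes from the proposal above, and `Header` is a hypothetical type standing in for a large container:

```rust
use core::mem::MaybeUninit;

// The proposed trait, repeated from the sketch above (not a shipped API).
pub trait PlacementNew: Sized {
    type NewParam;

    fn placement_new(this: &mut MaybeUninit<Self>, param: &Self::NewParam);
}

// A tiny stand-in type, to show field-by-field in-place initialization
// without ever constructing a `Self` value on the stack.
pub struct Header {
    len: u64,
    id: u32,
}

impl PlacementNew for Header {
    type NewParam = u32;

    fn placement_new(this: &mut MaybeUninit<Self>, param: &Self::NewParam) {
        let ptr = this.as_mut_ptr();
        // Write each field directly into the target memory, which could
        // just as well be a loaned sample in shared memory.
        unsafe {
            core::ptr::addr_of_mut!((*ptr).len).write(0);
            core::ptr::addr_of_mut!((*ptr).id).write(*param);
        }
    }
}

fn main() {
    let mut slot = MaybeUninit::<Header>::uninit();
    Header::placement_new(&mut slot, &42);
    let header = unsafe { slot.assume_init() };
    assert_eq!(header.id, 42);
    assert_eq!(header.len, 0);
    println!("initialized in place: id = {}", header.id);
}
```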

Just another data point, I tried the dynamic publisher but just set max slice length to the size of the vec and that seems to work for me:

Perfect :)

In the future we want to integrate also RelocatableVec which you could use like this

struct CameraData {
  image: RelocatableVec<u8>,
  camera_details: FixedSizeString<128>,
  whatever: u64
}

// ...

let sample = publisher.loan()?;
sample.image.reserve(SIZE_OF_CAMERA_IMAGE);
// ...
sample.image.push(123);

elBoberido commented 1 month ago

@jasonashton the example with iceoryx2 uses shared memory, and in release builds the compiler is able to optimize away the copy from the stack to the shared memory, writing the vector directly into the shared memory. In the test this is not possible, since the vector stays on the stack. You could circumvent this by using Box<FixedSizeVec<u8, COMMON_IMAGE_SIZE>> for the tests, but there would still be the issue of the stack overflow in debug builds. Using ulimit -s unlimited would be a workaround, but we need a proper solution. Ideally, Rust would provide something similar to the C++ placement new.
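
Another workaround for the failing test, not mentioned in this thread, is to run the stack-hungry code on a thread with an explicitly enlarged stack; std::thread::Builder::stack_size makes that possible without touching ulimit:

```rust
use std::thread;

// Run a closure on a thread whose stack is explicitly sized, sidestepping
// the default thread stack limit. A 64 MiB stack comfortably holds a
// ~6 MB array even in debug builds.
fn run_with_big_stack<F: FnOnce() + Send + 'static>(f: F) {
    thread::Builder::new()
        .stack_size(64 * 1024 * 1024) // 64 MiB
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked");
}

fn main() {
    run_with_big_stack(|| {
        // Roughly the size that overflowed the default test-thread stack.
        const COMMON_IMAGE_SIZE: usize = 3 * 1080 * 1920;
        let img = [0u8; COMMON_IMAGE_SIZE];
        assert_eq!(img.len(), COMMON_IMAGE_SIZE);
        println!("array of {} bytes fits on the enlarged stack", img.len());
    });
}
```

This only masks the symptom for tests in debug builds; the underlying initialization problem still needs the placement-new style solution discussed above.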

jasonashton commented 4 weeks ago

No, the idea is to send a slice, and you can define the worst-case size on the publisher's side. The example is that you want to send camera data, and at runtime you detect a 720p camera. Therefore, you know your image size is 4 * 1280 * 720, so you create a publisher with this worst-case size. After some runtime, the camera is disconnected and replaced with a 4k camera, and your image size becomes 4 * 3840 * 2160. You remove the old publisher and create a new one with a larger worst-case size.

This is what I ended up implementing; I abstracted the publisher to do this automatically. The benefit is actually that I can publish non-fixed sizes, like a serialized protobuf, which works slightly better for us, though I'm sure at the cost of some performance?

in combination with a proc derive placement_new macro so that the user can use it like:

This is close to what I was originally trying to implement (though with the FixedSizeVec), as I'm used to the requirement of compile-time fixed sizes from iceoryx / RouDi. The RelocatableVec solution seems good too.

Thanks for the help! I'm currently unblocked now, but looking forward to trying out the fixes when they land.

elBoberido commented 3 weeks ago

Oh, you already used iceoryx with RouDi? Can you share some details about your project?

We do not have numbers for the impact of creating new publishers. In theory it should not be too expensive, but maybe some optimizations are necessary.

jasonashton commented 3 weeks ago

It was at a previous company working on ROS 2-based robots. We had 8 uncompressed images we needed to get from one node to another, and zero-copy was necessary, as otherwise the software would quickly grind to a halt using DDS. I wasn't the engineer directly setting it up, but I was working around it. What I have working now with iceoryx2 after a week is further along and more flexible than what took months of work and troubleshooting previously.

elBoberido commented 3 weeks ago

Cool. Would be great to get something like this as a success/user story on our website ... and by our website I mean ekxide.io :)

elfenpiff commented 2 weeks ago

@jasonashton We implemented a PlacementDefault trait and derive proc-macro to address the issue. It is implemented for all iceoryx2 containers, and a working example is here: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/examples/rust/complex_data_types/complex_data_types.rs

I also added a FAQ entry: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/FAQ.md#my-transmission-type-is-too-large-encounter-stack-overflow-on-initialization

For now, I would define this as the iceoryx2 idiomatic approach to initialize large structs in shared memory.

But as @elBoberido already found out, in release mode the compiler optimizes this away, so the stack overflow should not pose a problem.

If you have time to play around with it, I would be happy to receive some feedback.

elfenpiff commented 2 weeks ago

@jasonashton With PR #230 this issue should be solved. If I missed something or another related issue turns up, please feel free to reopen the issue.