eclipse-iceoryx / iceoryx2

Eclipse iceoryx2™ - true zero-copy inter-process-communication in pure Rust
https://iceoryx.io
Apache License 2.0
497 stars 23 forks source link

`Publisher::loan_uninit` has unclear behavior #290

Closed appcypher closed 1 month ago

appcypher commented 1 month ago

(Code) Example Of Cumbersome API

Great project you have here. I'm new to iceoryx and shmem IPC in general but I'm trying to model a simple request-reply mechanism on top of the existing pub-sub communication. I understand it is part of the roadmap but just to have something working in the meantime, I made a simple solution that requires reusing the allocated memory for a sample.

So I tried reusing the same memory segment:

fn publisher() -> anyhow::Result<()> {
    println!("Publisher");

    let node = NodeBuilder::new().create::<zero_copy::Service>()?;
    let service = node
        .service_builder("test/path".try_into()?)
        .publish_subscribe::<u64>()
        .max_publishers(1)
        .max_subscribers(1)
        .open_or_create()?;

    let publisher = service.publisher_builder().max_loaned_samples(1).create()?;
    let sample = publisher.loan_uninit()?;
    let _ = sample.write_payload(1234);

    while let NodeEvent::Tick = node.wait(CYCLE_TIME) {
        let sample = publisher.loan_uninit()?;
        sample.send()?;

        println!("Sent existing");
    }

    Ok(())
}
fn subscriber() -> anyhow::Result<()> {
    println!("Subscriber");

    let node = NodeBuilder::new().create::<zero_copy::Service>()?;
    let service = node
        .service_builder("test/path".try_into()?)
        .publish_subscribe::<u64>()
        .max_publishers(1)
        .max_subscribers(1)
        .open_or_create()?;

    let subscriber = service.subscriber_builder().create()?;

    while let NodeEvent::Tick = node.wait(CYCLE_TIME) {
        while let Some(sample) = subscriber.receive()? {
            println!("Received: {:?}", sample.payload());
        }
    }

    Ok(())
}

I expected the subscriber process to output Recieved: 1234 repeatedly but instead got:

Received: 1234
Received: 0
Received: 1234
Received: 0
Received: 1234
Received: 0
Received: 1234
Received: 0
...

Full code: https://github.com/appcypher/iceoryx2-ipc-test/blob/main/bin/test_pub_loan.rs

Improvement Suggestion

Tbh, I'm not sure if this is the bug or an expected behavior as I haven't dived too deep into the code. But it would be nice to have a way to reuse the same loaned memory in a consistent way.

appcypher commented 1 month ago

I should add that I'm on commit 46f8b60. Running on macOS.

elfenpiff commented 1 month ago

This is an API bug on our side. We already discussed some solutions but did not find the time to implement them.

loan_uninit returns a SampleMut<MaybeUninit<T>> and this version should not provide a send method, only SampleMut<T> should, and this you can acquire with

let sample = publisher.loan_uninit()?;
let sample = sample.write_payload(1234); // SampleMut<MaybeUninit<T>> becomes SampleMut<T>

What happens behind the scenes here is that:

  1. You acquire memory with let sample = publisher.loan_uninit()?;
  2. The value 1234 is written into the memory position
  3. The SampleMut is going out of scope and the memory is released
  4. In the loop let sample = publisher.loan_uninit()?; acquires the same memory again where still the 1234 is written to
  5. sample.send()?; sends technically uninitialized memory to the subscriber (due to our API bug)
  6. In the next iteration you acquire another sample, this piece of memory was never used so you get the zeroed memory from the OS and send it
  7. In the meantime, the subscriber printed the sample content, the sample went out of scope on subscriber side and the memory with the content 1234 becomes available again.
  8. The publisher loans the memory with the 1234 content again and the game restarts.

But it would be nice to have a way to reuse the same loaned memory in a consistent way.

This is something we consider currently when designing request response, but this is not always possible. Just think of the use case where one client sends a request to two servers and receives two responses. But for a many client to one server scenario it could work.

But if you build request response on top of our current publish subscribe architecture you will not be able to reuse the received sample and write additional data into it. The reason is that every publisher comes with its own data segment and there are internal checks that should ensure that you do not send samples that do not belong to your data segment.

However, happy news, we are aiming to finish the request-response implementation in September/October!

@appcypher Could you tell us a bit more about your use case? Why do you want to reuse the sample/memory and do not want to acquire a new one?

elBoberido commented 1 month ago

@appcypher it is not possible to reuse the same memory. There is currently a bug on main which allows to send uninitialized samples. it should actually not be possible to have a loan_uninit followed by a send. The write_payload returns the actual sample to be published.

So your code releases the sample immediately after write_payload. The first iteration of the loop gets that release sample and since the memory is not touched, it holds the value you wrote before. The second iteration gets a new sample since the first one seems to be still in use. To be more precise, it will be in the subscriber queue. On the third iteration, the initial sample is again released and the loan_uninit gets that sample. The cycle repeats and that's how you get that output.

elBoberido commented 1 month ago

... oh, your code needs to be changed to this

    let publisher = service.publisher_builder().max_loaned_samples(1).create()?;

    while let NodeEvent::Tick = node.wait(CYCLE_TIME) {
        let sample = publisher.loan_uninit()?;
        let sample = sample.write_payload(1234);
        sample.send()?;
    }
appcypher commented 1 month ago

So your code releases the sample immediately after write_payload. The first iteration of the loop gets that release sample and since the memory is not touched, it holds the value you wrote before. The second iteration gets a new sample since the first one seems to be still in use. To be more precise, it will be in the subscriber queue. On the third iteration, the initial sample is again released and the loan_uninit gets that sample. The cycle repeats and that's how you get that output.

Thanks for the explanation.

@appcypher Could you tell us a bit more about your use case? Why do you want to reuse the sample/memory and do not want to acquire a new one?

I thought it would make certain patterns possible. In my case, in trying to implement a contrived request-response communication pattern, I landed on a queue-based solution (there's just one client and one server communicating). The client would write its chunked request to the same in-memory queue (pushing chunks continuously) and the receiver/server would pick it up from there -- sending back some ack. This requires that a publisher holds onto the same memory where it mutates the queue repeatedly. Maybe a kind of sample that can be resent without being owned.

I may be totally off here but I believe no real sending is taking place since the pub-sub processes have access to the same memory and it is zero-copy. So send is really just a signalling mechanism.

However, happy news, we are aiming to finish the request-response implementation in September/October!

Sweet. Can't wait since that is really what I need. I'm willing to build a sub-optimal hack that I can use now until then.

elBoberido commented 1 month ago

I thought it would make certain patterns possible. In my case, in trying to implement a contrived request-response communication pattern, I landed on a queue-based solution (there's just one client and one server communicating). The client would write its chunked request to the same in-memory queue (pushing chunks continuously) and the receiver/server would pick it up from there -- sending back some ack. This requires that a publisher holds onto the same memory where it mutates the queue repeatedly. Maybe a kind of sample that can be resent without being owned.

I may be totally off here but I believe no real sending is taking place since the pub-sub processes have access to the same memory and it is zero-copy. So send is really just a signalling mechanism.

Well, this is exactly like iceoryx works. The loan call takes memory shared between the processes and send adds a pointer to that memory to the queue for the consumer + some bookkeeping, memory synchronization, etc. So you have to loan memory, write data to that memory and then send it. There is also the event messaging pattern which can be combined with publish-subscribe to notify the consumer about new data in the queue, instead of having the consumer poll the queue.

appcypher commented 1 month ago

Thanks again @elBoberido!

How would you suggest one create a request-reponse model today using existing messaging patterns? I want to keep using iceoryx2 as is until the request-response feature lands.

appcypher commented 1 month ago

Thanks for helping the other day. After running some code today I have a better grasp of how iceoryx2 works. I think I know how I can leverage the internal queue system here. I will be closing this issue as the question was really about understanding how to make iceoryx2 work for my use case.

Although, as suggested, it probably makes sense to prevent send on SampleMut<Service, MaybeUninit<T>, ...>