AFLplusplus / LibAFL

Advanced Fuzzing Library - Slot your Fuzzer together in Rust! Scales across cores and machines. For Windows, Android, MacOS, Linux, no_std, ...

llmp memory order error? #2346

Closed AR-Wang closed 1 day ago

AR-Wang commented 3 days ago

I see that the atomic variable `current_msg_id` is used to signal that there is a new message. When the llmp client sends a message, it updates `current_msg_id` with Release memory ordering, but when the broker reads the data, it loads `current_msg_id` with Relaxed memory ordering, which does not seem to guarantee that the latest `current_msg_id` value is read.

The send function is `unsafe fn send(&mut self, msg: *mut LlmpMsg, overwrite_client_id: bool) -> Result<(), Error>`; the receive function is `unsafe fn recv(&mut self) -> Result<Option<*mut LlmpMsg>, Error>`.

I don't understand why we don't use Acquire memory ordering to load the latest `current_msg_id`.

Hope to get your answer. Thanks so much.
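For context, the pairing being asked about can be sketched with a hypothetical two-slot miniature (the names `PAYLOAD` and `CURRENT_MSG_ID` are illustrative, not the actual llmp page layout): a Release store on the writer side pairs with an Acquire load on the reader side, so once the reader observes the new id, the payload written before it is guaranteed visible.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Hypothetical miniature of an llmp page: one payload slot plus a message id.
static PAYLOAD: AtomicUsize = AtomicUsize::new(0);
static CURRENT_MSG_ID: AtomicU64 = AtomicU64::new(0);

/// Writer (client) side: publish the payload, then release-store the id.
fn send(value: usize, msg_id: u64) {
    PAYLOAD.store(value, Ordering::Relaxed);
    // Release: all writes above become visible to any thread that
    // acquire-loads CURRENT_MSG_ID and observes `msg_id`.
    CURRENT_MSG_ID.store(msg_id, Ordering::Release);
}

/// Reader (broker) side: acquire-load the id; if a new id is observed,
/// the payload written before the release store is guaranteed visible.
fn recv() -> Option<usize> {
    if CURRENT_MSG_ID.load(Ordering::Acquire) > 0 {
        Some(PAYLOAD.load(Ordering::Relaxed))
    } else {
        None
    }
}

fn main() {
    assert_eq!(recv(), None);
    send(42, 1);
    assert_eq!(recv(), Some(42));
}
```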

tokatoka commented 3 days ago

honestly i don't think the memory ordering is even effective or needed here. it was added in a failed attempt to fix a bug (it didn't help; the bug turned out to come from other code)

domenukk commented 3 days ago

We are using relaxed ordering here because we don't care if we get messages a bit later, at least for the fuzzing use case. At least that was my thinking. Worst case we get a value that is too low, and read one message less this round, but we trade it for better performance. Does that sound like a good idea?
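The trade-off described above can be sketched with a toy poll loop (not the llmp code): per-location atomic coherence guarantees that a Relaxed load eventually observes the store, so the worst case is reading the message one poll round later, never losing it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Toy broker poll: any single Relaxed load may return a stale value,
/// but atomic coherence guarantees that a later poll observes the store.
fn poll_until_seen(counter: &AtomicU64, expected: u64) -> u64 {
    loop {
        let seen = counter.load(Ordering::Relaxed);
        if seen >= expected {
            return seen;
        }
        std::hint::spin_loop();
    }
}

fn main() {
    let counter = Arc::new(AtomicU64::new(0));
    let writer = {
        let counter = Arc::clone(&counter);
        // Toy client: publish message id 2 with a Release store.
        thread::spawn(move || counter.store(2, Ordering::Release))
    };
    let seen = poll_until_seen(&counter, 2);
    writer.join().unwrap();
    assert_eq!(seen, 2);
}
```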

AR-Wang commented 3 days ago

I have analyzed the flow of LibAFL's parallel fuzzing: each fuzzer runs in a separate process, while the broker blocks in the main process and polls all the messages sent by the clients. Each fuzzer submits messages in the following way:

unsafe fn send(&mut self, msg: *mut LlmpMsg, overwrite_client_id: bool) -> Result<(), Error> {
        // log::info!("Sending msg {:?}", msg);

        assert!(self.last_msg_sent != msg, "Message sent twice!");
        assert!(
            (*msg).tag != LLMP_TAG_UNSET,
            "No tag set on message with id {:?}",
            (*msg).message_id
        );
        // A client gets the sender id assigned to by the broker during the initial handshake.
        if overwrite_client_id {
            (*msg).sender = self.id;
        }
        let page = self.out_shmems.last_mut().unwrap().page_mut();
        if msg.is_null() || !llmp_msg_in_page(page, msg) {
            return Err(Error::unknown(format!(
                "Llmp Message {msg:?} is null or not in current page"
            )));
        }

        (*msg).message_id.0 = (*page).current_msg_id.load(Ordering::Relaxed) + 1;

        // Make sure everything has been written to the page, then commit the message
        // TODO: current_msg_id is updated here (with a Release store) to make the message visible to the broker
        (*page)
            .current_msg_id
            .store((*msg).message_id.0, Ordering::Release);

        self.last_msg_sent = msg;
        self.has_unsent_message = false;
        Ok(())
    }

The broker then calls the following method to continuously query the messages sent by the client.

#[inline(never)]
    unsafe fn recv(&mut self) -> Result<Option<*mut LlmpMsg>, Error> {
        /* DBG("recv %p %p\n", page, last_msg); */
        let page = self.current_recv_shmem.page_mut();
        let last_msg = self.last_msg_recvd;

        let (current_msg_id, loaded) =
            if !last_msg.is_null() && self.highest_msg_id > (*last_msg).message_id {
                // read the msg_id from cache
                (self.highest_msg_id, false)
            } else {
                // read the msg_id from shared map
                // TODO: here the broker first determines whether a client has sent a new message by reading current_msg_id
                // The problem is that the sending llmp client and the broker are two different processes
                // Without Release-Acquire memory ordering, there is no guarantee that the broker process promptly reads the value updated by the client process
                let current_msg_id = (*page).current_msg_id.load(Ordering::Relaxed);
                self.highest_msg_id = MessageId(current_msg_id);
                (MessageId(current_msg_id), true)
            };
        // Read the message from the page
        // TODO: the client may already have updated current_msg_id while the broker
        // still sees 0 -- a visibility concern between one writer and one reader
        let ret = if current_msg_id.0 == 0 {
            /* No messages yet */
            None
        } else if last_msg.is_null() {
            /* We never read a message from this queue. Return first. */
            // TODO: as here, an Acquire fence must be used to ensure memory visibility
            fence(Ordering::Acquire);
            Some((*page).messages.as_mut_ptr())
        }
        // ... (remaining branches of recv elided)

So if an appropriate memory ordering is not chosen, there is no guarantee that messages sent by the llmp client will be received promptly by the broker. For example, the llmp client sends two messages and sets `current_msg_id` to 2, but because the broker loads with Relaxed memory ordering, it may still observe `current_msg_id` as 0. The most appropriate approach here is Release-Acquire ordering, which ensures that the llmp client process's writes to the shared page are visible to the broker process once the new `current_msg_id` is observed.
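For what it's worth, the `fence(Ordering::Acquire)` seen in the `recv` excerpt is a standard way to get the same guarantee as an Acquire load: under the C++/Rust fence rules, a Relaxed load that observes a release-stored value, followed by an acquire fence, synchronizes with that store. A hypothetical sketch (the statics are illustrative stand-ins, not the llmp page layout):

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicUsize, Ordering};

// Illustrative stand-ins for the page's message slot and current_msg_id.
static PAYLOAD: AtomicUsize = AtomicUsize::new(0);
static CURRENT_MSG_ID: AtomicU64 = AtomicU64::new(0);

/// Writer: publish the payload, then release-store the id.
fn send(value: usize, msg_id: u64) {
    PAYLOAD.store(value, Ordering::Relaxed);
    CURRENT_MSG_ID.store(msg_id, Ordering::Release);
}

/// Reader: a Relaxed load followed by an acquire fence gives the same
/// happens-before guarantee as an Acquire load of the same atomic,
/// once the load has observed the release-stored value.
fn recv_with_fence() -> Option<usize> {
    let id = CURRENT_MSG_ID.load(Ordering::Relaxed);
    if id == 0 {
        return None;
    }
    fence(Ordering::Acquire); // upgrades the preceding Relaxed load
    Some(PAYLOAD.load(Ordering::Relaxed))
}

fn main() {
    assert_eq!(recv_with_fence(), None);
    send(9, 1);
    assert_eq!(recv_with_fence(), Some(9));
}
```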

Of course, this may be guaranteed somewhere else and I simply did not find that code; otherwise, the problem I analyzed may really exist. I look forward to your answer.

AR-Wang commented 3 days ago

> We are using relaxed ordering here because we don't care if we get messages a bit later, at least for the fuzzing use case. At least that was my thinking. Worst case we get a value that is too low, and read one message less this round, but we trade it for better performance. Does that sound like a good idea?

Choosing that is reasonable, of course, but I still think it is better to use Release-Acquire ordering wherever possible, so that a message becomes visible immediately. After all, if a user prefers to process messages later, they can configure the broker's polling interval; that is the user's choice. Letting the CPU make this decision is bad for the code, because you cannot control how late a message is seen. Of course, I respect your decision, and thank you very much for your reply.

domenukk commented 1 day ago

We discussed this, and for this specific use-case we don't think slightly delayed messages are an issue. We will always lean towards higher performance. But thank you for the report, in general you are correct!