axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.85k stars 402 forks source link

__io_uring_buf_ring_cq_advance broken? #1039

Closed SUPERCILEX closed 8 months ago

SUPERCILEX commented 8 months ago

https://github.com/axboe/liburing/blob/f31f86851ba655b9ddce9c0cd200668a4944b2cc/src/include/liburing.h#L1462-L1468

I'm confused, isn't this totally broken? If you context switch between the tail update and the cq atomic store, what prevents the stores to io_uring_buf from being reordered? As in what prevents the store to the tail being seen before the stores to update io_uring_buf, in which case the ring will contain garbage.

axboe commented 8 months ago

A context switch isn't going to matter, as that implies a full barrier. But if we have eg SQPOLL or maybe someone using io-wq offload picking a buffer, yes this looks problematic if the br->tail write is re-ordered with buffer additions. The tail write should be ordered with previous writes.

SUPERCILEX commented 8 months ago

A context switch isn't going to matter, as that implies a full barrier.

TIL, thanks.

The tail write should be ordered with previous writes.

Yeah, not sure if it's too late, but __io_uring_buf_ring_cq_advance should probably be removed.

axboe commented 8 months ago

Yeah, not sure if it's too late, but __io_uring_buf_ring_cq_advance should probably be removed.

Most cases will never hit it. An optimized network use case of this would always be local to the ring, and not use io-wq nor SQPOLL. This means the same thread will always produce and consume, and even on a weakly ordered machine, it will not cause an issue. On x86/x86-64, the write ordering is also fine.

That said, it can obviously cause an issue for some use cases, depending on the architecture. Which is why it certainly needs fixing. But let's not make this a bigger issue than it is, which in practice is not really problematic. And it's completely fixable without needing to yank the interface.

SUPERCILEX commented 8 months ago

But let's not make this a bigger issue than it is

Fair, maybe this just needs very careful docs?

axboe commented 8 months ago

I don't think doc fixing is enough, I tossed in the fix (referenced this issue) to ensure it's ordered instead. To be fair, even initially this was a pretty tiny optimization, and it's just a bit tinier now.

axboe commented 8 months ago

The problem is that making it easily understandable when to use what would be difficult, and most applications would not want to make it dependent on eg the architecture in question. I think that's going a bit too far. Hence I'd rather just make it generically fine, regardless of how it's used and on what.

SUPERCILEX commented 8 months ago

I see the fix, sounds good!

axboe commented 8 months ago

I see the fix, sounds good!

Thanks for the report!