ashtonian opened this issue 10 months ago
Thanks for the PR. I have scanned through it and it looks impressive. However, reviewing this is quite a bit of work; it's so easy to introduce errors with atomics, and any bugs are very difficult to find (they tend to pop up randomly). Note that I'm not saying there are bugs in your code, just that it takes time to review because it is, necessarily, a bit convoluted.
My current thought is that we leave this until after the next release (let's get a version of the library which supports persistence out there and then look at improving performance). I guess the other option would be to add this as an experimental feature (keeping it separate from the other memory implementation - not that I'm saying my code is bug-free!). While performance improvements are always nice, it's difficult to know how much to invest in them (I suspect that this will make no discernible difference to 95% of users).
> Reduce the complexity while keeping the no-lock performance gains by only returning an open channel once (to signal initialization); then, after the first Enqueue() is called, Wait() would return a closed channel even if the queue is empty. I think if documented properly this would potentially be the best compromise.
If I'm reading this right it would result in the user effectively calling Dequeue() constantly (because <-Wait() would not actually wait). For very high throughput systems this may be an effective strategy, but that would need testing (for most of my use cases it would be a negative due to hammering the CPU and chewing through batteries!).
One immediate improvement would be to create a closed channel at startup and return that from Wait() (as opposed to creating/closing a new channel each time).
> Keep a mutex with a slice of channels to notify. In this scenario, when Wait() isn't called the notify overhead would be substantially less as there would be no channels to loop through. Still may need locking.
We could state that Wait() must not be called concurrently (concurrent calls should not be needed, as there should never be multiple goroutines reading from the queue because message order is important).
> Use atomic counters to keep track of queue depth. I think this would be a bit racy on empty or near-empty queues. Some of this could also be avoided if returning EmptyErr sometimes when the queue is near starving is acceptable.
Wouldn't head be nil when the queue is empty?
I'm happy for the Queue interface to change (it's all internal and not really fixed until v1); the design was really just something that works with autopaho and was relatively easy to follow (I'm sure it can be improved; this was a large change, so there is probably a lot of suboptimal code!).
Sorry - one further question. Is the initial implementation your work, or did it come from somewhere else (a quick Google led to this)? If it came from somewhere else then we need to acknowledge that (Eclipse is, rightfully, cautious when it comes to checking licenses etc.).
Thanks for the feedback!
> My current thought is that we leave this until after the next release (let's get a version of the library which supports persistence out there and then look at improving performance). I guess the other option would be to add this as an experimental feature (keeping it separate from the other memory implementation - not that I'm saying my code is bug-free!). While performance improvements are always nice, it's difficult to know how much to invest in them (I suspect that this will make no discernible difference to 95% of users).
I agree and would be fine with both.
> If I'm reading this right it would result in the user effectively calling Dequeue() constantly (because <-Wait() would not actually wait). For very high throughput systems this may be an effective strategy, but that would need testing (for most of my use cases it would be a negative due to hammering the CPU and chewing through batteries!). One immediate improvement would be to create a closed channel at startup and return that from Wait() (as opposed to creating/closing a new channel each time).
Correct. After thinking on this more, I think it would be ideal if Wait() just always returned a closed channel and this were simply a 'feature' of the lock-free queue that's documented for the user. If batteries are involved this is definitely not the intended use case. This is for something like a server-side ingest service that consumes all messages from an IoT fleet.
> Sorry - one further question. Is the initial implementation your work, or did it come from somewhere else (a quick Google led to this)? If it came from somewhere else then we need to acknowledge that (Eclipse is, rightfully, cautious when it comes to checking licenses etc.).
I did not invent the lock-free queue; I was introduced to the concept after some googling while benchmarking, via this paper. Using that as a reference point I looked for Go implementations, of which there are several that are nearly identical (especially without generics). I then used the code you found as a reference, as it is the cleanest implementation of those from the paper. The PR code is different, as you can see.
@ashtonian, just out of interest, did you close this because you found an issue, got sick of waiting for me, or something else? Taking a look had been on my list for some time, and I was considering making this a third option for those with performance issues (but I wanted to do some more testing and look into the licensing; at minimum we would need to credit the implementation you based this on).
I was poking around the repo debugging another issue and figured this wasn't going to get merged. I should have communicated before closing. I would love it if this gets merged. I'm not sure how the citation would work, but I think it would go to that paper. I'm also not necessarily convinced the modifications made to match the interface are adequate; it probably needs some more testing.
I don't mind waiting. I also appreciate the work you've been putting into the driver; it seems to be coming together. Let me know if I can help with this. Sorry again, I should have communicated before closing.
No worries; it's been idle a while (I'm busy on other projects currently).
> I'm not sure how the citation would work, but I think it would go to that paper.
We need to comply with the Eclipse Contributor Agreement which, in this case, requires you to agree that:

> The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
As much of this was based on MIT-licensed code, I think that including that license (or a link to it) in the code should be sufficient. The key thing is that we follow the license requirements and acknowledge authorship.
The Eclipse Foundation is pretty tough on this, with good reason: if our code becomes contaminated with anything that's not appropriately licensed, that opens up our users to legal action.
I'd suggest adding the license (and a link to the original repo) in the header of queue.go. With that done I think this could be accepted as a second option for the queue (for use where performance is an issue). It may be that, in the future, after significant testing, we make it the default implementation.
I attempted to add a lock-free queue for faster operations, with some success. I believe the Wait() function inherently introduces a lock. The intended use case here is a high-volume concurrent environment where there is high churn on/off the queue, the major assumption being that the queue won't be starved or empty (and that if it is empty, it won't be for long). In this scenario it is more desirable to spin-poll on peek/dequeue, as the goal is to move as fast as possible, even at the cost of some CPU cycles when the queue is empty.
However, in order to be more compliant with the interface as it stands, my initial PR has a bit of a lock mechanism which drastically reduces the benefits of this approach.
I think it can potentially be improved by isolating the wait channel a bit differently. A few ideas:

- Reduce the complexity while keeping the no-lock performance gains by only returning an open channel once (to signal initialization); then, after the first Enqueue() is called, Wait() would return a closed channel even if the queue is empty. I think if documented properly this would potentially be the best compromise.
- Keep a mutex with a slice of channels to notify. In this scenario, when Wait() isn't called the notify overhead would be substantially less as there would be no channels to loop through. Still may need locking.
- Use atomic counters to keep track of queue depth. I think this would be a bit racy on empty or near-empty queues. Some of this could also be avoided if returning EmptyErr sometimes when the queue is near starving is acceptable.

Here is the original simpler LockFreeQueue queue I used to replace an in-memory channel-based queue. In our use case (as described) the results were pretty noticeable, and we were not able to maintain low depth in our ingest service without it.