alex-petrenko / faster-fifo

Faster alternative to Python's multiprocessing.Queue (IPC FIFO queue)
MIT License
179 stars 29 forks source link

get latest item optimization #33

Open beasteers opened 2 years ago

beasteers commented 2 years ago

I just wanted to share this recipe for how you can setup the queue to only unpickle the latest item and to skip all other items. This is useful if you have processes running at different rates and you need one process to be able to efficiently drop messages that aren't the latest message, especially applicable for real-time applications.

This can be done quite simply with a wrapper class as shown, but maybe ppl might find it as a nice general feature? wrapping it into the core would get around having to do the slightly hacky thing with self.actual_loads but it's no big deal either way:

import faster_fifo
import faster_fifo_reduction
import queue

class Queue(faster_fifo.Queue):
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)
        # have the loads function be a noop
        self.actual_loads = self.loads
        self.loads = lambda x: x

    def get_many(self, *a, raw=False, **kw):
        xs = super().get_many(*a, **kw)
        return xs if raw else [self.actual_loads(x) for x in xs]

    def get_latest(self, block=True, **kw):
        '''Get the latest value, don't waste time unpickling values 
        you're not going to use.
        '''
        xs = self.get_many(block=block, raw=True, **kw)
        return self.actual_loads(xs[-1]) if xs else None

    def get_latest_nowait(self):
        return self.get_latest(block=False)

if __name__ == '__main__':
    # have the deserializer print out so we know what's actually being unpickled
    og_loads = Queue.loads
    def loads(x):
        x = og_loads(q, x)
        print('loaded', x)
        return x

    # setup the queue
    q = Queue(loads=loads)
    for i in range(6):
        print('put', i)
        q.put(i)

    # get the latest value
    print(q.get_latest())
    # no more values, this throws an error
    try:
        print(q.get_latest_nowait())
    except queue.Empty:
        print('no more messages, as expected.')

    # try it again for good measure
    for i in range(6):
        print('put', i)
        q.put(i)
    print(q.get_latest_nowait())

An addition that could also maybe(?) be useful would be to get the N latest values?

alex-petrenko commented 2 years ago

Wow, that's pretty cool! :D Awesome idea @beasteers I didn't expect that a simple overloading of loads would enable functionality like that!

I'm pretty sure you can save some more performance by adding this to library core, although probably not much. And to implement this elegantly we'd need some amount of refactoring.

How about a compromise for now, would you like to write it up as a README section in a PR? And if more people find this useful we can work on adding this as a core feature.

BTW, if I had a use case like this, I would consider using a single shared object protected with multiprocessing.Lock() Then you don't need a queue at all, and you'd never run into buffer overflow (Queue.Full) issues if you're lagging behind too many objects. If you use shared memory of some sort you can even save some more time on pickling this way (because you need neither pickle nor unpickle). This would not work for last_N of course.

Also, an obvious thought, if you need only the latest object but want to keep previous objects, sounds like you need a faster-lifo not faster-fifo :D That is, a multiprocessing Stack.