bluesky / event-model

data model for event-based data collection and analysis
https://blueskyproject.io/event-model
BSD 3-Clause "New" or "Revised" License
13 stars 30 forks source link

Make RunRouter pass 'start' and 'descriptor' to callbacks #149

Closed danielballan closed 4 years ago

danielballan commented 4 years ago

When constructing a callback in a factory function for the RunRouter, such as

from event_model import DocumentRouter, RunRouter

class SomeCallback(DocumentRouter):
    def start(name, doc):
        self._start_doc = doc

    def event(self, doc):
        # Do something with self._start_doc.

def factory(name, doc):
    cb(name, doc)
    cb = SomeCallback()
    return [cb], []

rr = RunRouter([factory])

it is easy to forget cb(name, doc), i.e.

def factory(name, doc):
    cb = SomeCallback()
    return [cb], []

which typically leads to a secondary error (AttributeError: 'SomeCallback' object has not attribute '_start_doc' in this example) that does not make it at all clear that the mistake lies in factory.

The same sort of problem can occur for descriptor in RunRouter sub-factories.

I have made this mistake myself many times. Should we try to provide better errors? We could flip a flag when the start doc goes through and keep a cache of descriptor uids we have seen to provide better errors when we are missing a start or descriptor doc. It's not obvious to me where would be the best place to do it. Options:

tacaswell commented 4 years ago

I remember advocating for this behavior (that it is the factories job to feed the start document in), but not exactly why. Having seen this used in anger a bit, I now think that was a mistake, we should move the responsibility of pushing the start document through the callback out to the RunRouter and require the factories to return un-primed callbacks. We should do this now as I think the pain of getting this wrong all the time forever is going outweigh the sort-time pain of breaking stuff. I am also not sure that we can get the nice error messages good enough at a reasonable level of complexity.

We should make a similar change to the descriptor sub-factories, they should return a callback ready to have a start document and a descriptor pushed through.

I think the mistake we made was not clearly separating the "pick what to do" logic from the "do the thing logic". We made the RunRouter responsible for the "do the thing" in all cases but the first document andmade the factories responsible for both picking what to do and doing it (for the first and maybe second document). The argument "the factories have the documents, it should be their problem" is nice from minimizing the number of places in the code we touch the object, but we have discovered that we are better off minimizing the number of things the humans have to do when writing each piece of code.

We should put pushing the documents through in a try...except block that warns it will actually blow up in the future to give people a chance to update their factories (which I hope we don't have too many of in the wild yet). And very aggressively pin the minimum version of event-model on things.

jklynch commented 4 years ago

I endorse un-primed callbacks 2020!

I often forget to send the start document and I would rather the framework bend to fit me.