dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Improve structure of event logging #8688

Open hendrikmakait opened 2 weeks ago

hendrikmakait commented 2 weeks ago

There are two main things that I dislike about the current way we handle event logging:

hendrikmakait commented 2 weeks ago

Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have https://github.com/dask/distributed/blob/9672121ce115df9268b12ad74e108d71ec8104c0/distributed/scheduler.py#L5916-L5924 which logs to a topic that depends on the actual address assigned to a worker at runtime. This makes writing consumer code for events and topics more difficult. Feel free to disagree.
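To illustrate the concern, here is a hedged, pure-Python sketch of what consumer code ends up doing when a topic name embeds a runtime worker address, compared with matching a single static topic. The topic names and event shapes below are hypothetical illustrations, not the actual topics distributed emits:

```python
# Hypothetical event log: topic name -> list of event payloads.
events = {
    "worker-restart-tcp://10.0.0.5:39871": [
        {"action": "restart", "worker": "tcp://10.0.0.5:39871"},
    ],
    "worker-restart-tcp://10.0.0.6:40112": [
        {"action": "restart", "worker": "tcp://10.0.0.6:40112"},
    ],
    "worker-restart": [
        {"action": "restart", "worker": "tcp://10.0.0.7:41000"},
    ],
}

# Dynamic topics: the consumer must pattern-match topic names at runtime,
# because the worker address is only known once the cluster is up.
dynamic_restarts = [
    msg
    for topic, msgs in events.items()
    for msg in msgs
    if topic.startswith("worker-restart-tcp://")
]

# Static topic: the consumer subscribes to one well-known name.
static_restarts = events.get("worker-restart", [])

print(len(dynamic_restarts), len(static_restarts))  # prints "2 1"
```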

fjetter commented 2 weeks ago

A couple of high-level thoughts

I feel most of your concerns would already be addressed by establishing a couple of sensible, internal best practices and moving our code towards this.


a couple of specifics

Having a catch-all "all" topic defeats the purpose of a topic-based

Yes. If all were supported, that feels like something that should be implemented on the caller/subscriber side; keeping the events twice is redundant.

Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have

I think this is a case where it would make sense to have two different event streams. I agree that every event should be piped to a static topic but I don't mind having a worker specific stream. For debugging this is useful and it doesn't cause any harm.

hendrikmakait commented 2 weeks ago

Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have

I think this is a case where it would make sense to have two different event streams. I agree that every event should be piped to a static topic but I don't mind having a worker specific stream. For debugging this is useful and it doesn't cause any harm.

Agreed. I should rephrase this to: A message should be available in a static topic, not only in a dynamic one.
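One way to read the "two streams" compromise above: the producer fans each event out to a static topic that consumers can rely on, plus a worker-specific topic kept around for debugging. A minimal sketch using plain dicts (the helper name and event shape are hypothetical):

```python
from collections import defaultdict

event_log = defaultdict(list)  # topic name -> list of events

def log_worker_event(worker_address, action, **payload):
    """Hypothetical helper: emit one event to both a static topic and a
    worker-specific debugging topic, so consumers never depend on the
    dynamic topic name alone."""
    msg = {"action": action, "worker": worker_address, **payload}
    event_log[action].append(msg)                        # static topic
    event_log[f"{action}-{worker_address}"].append(msg)  # per-worker, dynamic

log_worker_event("tcp://10.0.0.5:39871", "worker-restart", reason="nanny")

# Consumer code can rely on the static topic...
assert event_log["worker-restart"]
# ...while the per-worker stream remains available for debugging.
assert event_log["worker-restart-tcp://10.0.0.5:39871"]
```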

hendrikmakait commented 2 weeks ago

I feel most of your concerns would already be addressed by establishing a couple of sensible, internal best practices and moving our code towards this.

Yup, that's largely what this issue is for :)

hendrikmakait commented 2 weeks ago

Accepting all msgpack-serializable structures is something I would like to keep (not a hill I'll die on). This system can be used by end users, and I wouldn't want to force them to use a special container to emit a message

All I'd be asking you to emit is something like {"action": "some-identifier", "<some key>": <WHATEVER>} if you didn't feel like emitting a structured message. Combined with a deprecation cycle, that feels like a low burden compared to forcing every downstream consumer to perform checks like isinstance(msg, dict) and msg.get("action", None) == "my-key" just to access our average structured event, merely because someone felt like dumping an int into an existing topic.
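The defensive checks described above might look like this in consumer code. This is a sketch with hypothetical event shapes, contrasting a consumer that must guard against arbitrary payloads with one that can rely on a minimal required shape:

```python
# Without a guaranteed structure, every consumer repeats defensive checks:
def handle_unstructured(msg):
    if isinstance(msg, dict) and msg.get("action", None) == "my-key":
        return msg.get("payload")
    return None  # ints, strings, tuples, ... silently fall through

# With a minimal required shape, the consumer can dispatch directly.
# Contract (hypothetical): every event is a dict with an "action" key.
def handle_structured(msg):
    handlers = {"my-key": lambda m: m.get("payload")}
    handler = handlers.get(msg["action"])
    return handler(msg) if handler else None

print(handle_unstructured(42))                                # prints "None"
print(handle_structured({"action": "my-key", "payload": 7}))  # prints "7"
```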

fjetter commented 2 weeks ago

I don't think it makes sense for users to be forced to use an action keyword. That feels kind of arbitrary. I'm also not sure the deprecation cycle would be worth the effort just to save a consumer an isinstance check

hendrikmakait commented 2 weeks ago

I don't think it makes sense for users to be forced to use an action keyword.

FWIW, I'm also happy to require something like action and message keywords in the log_event signature that we expose similarly in consuming functions on plugins. The way I see it, there should be some identifier telling me what type of message I'm dealing with, and then some payload belonging to said message.