Open hendrikmakait opened 2 weeks ago
Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have https://github.com/dask/distributed/blob/9672121ce115df9268b12ad74e108d71ec8104c0/distributed/scheduler.py#L5916-L5924 which logs to a topic that depends on the actual address assigned to a worker at runtime. This makes writing consumer code for events and topics more difficult. Feel free to disagree.
A couple of high level thoughts
I feel most of your concerns would already be addressed by establishing a couple of sensible, internal best practices and moving our code towards this.
a couple of specifics
Having a catch-all "all" topic defeats the purpose of a topic-based
yes. If all was supported, this felt like a thing that should be implemented on the caller/subscriber but keeping the events twice is redundant.
Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have
I think this is a case where it would make sense to have two different event streams. I agree that every event should be piped to a static topic but I don't mind having a worker specific stream. For debugging this is useful and it doesn't cause any harm.
Another thought on topics: I think they should be "static", i.e., don't depend on the actual cluster. For example, we have
I think this is a case where it would make sense to have two different event streams. I agree that every event should be piped to a static topic but I don't mind having a worker specific stream. For debugging this is useful and it doesn't cause any harm.
Agreed. I should rephrase this to: A message should be available in a static topic, not only in a dynamic one.
I feel most of your concerns would already be addressed by establishing a couple of sensible, internal best practices and moving our code towards this.
Yup, that's largely what this issue is for :)
Accepting all msgpack serializable structures is something I would like to keep (not a hill I'll die upon). This system can be used by endusers and I wouldn't want to force them to use a special container to emit a message
All I'd be asking you to emit is something like {"action": "some-identifer", "<some key>": <WHATEVER>}
if you didn't feel like emitting a structured message. Combined with a deprecation cycle, that feels like a low burden compared to forcing any downstream consumer to perform checks like isinstance(msg, dict) and msg.get("action", None) == "my-key"
to access our average structured event just because you felt like dumping an int
into an existing topic.
I don't think it makes sense for users to be force to use an action
keyword. That feels kind of arbitrary. I'm also not sure if this deprecation cycle would be worth the effort just to safe a consumer from an isinstance check
I don't think it makes sense for users to be force to use an action keyword.
FWIW, I'm also happy to require something like an action
, message
keyword in the log_event
signature that we expose similarly in consuming functions on plugins. The way I see it, there should be some identifier telling me what type of message I'm dealing with and then some payload belonging to said message.
There are two main things that I dislike about the current way we handle event logging:
scheduler
or P2P events top2p
.action
key that acts as an identifier of the specific message type. We should make this explicit. Right now, this free-form format just allows producers to dump bad/unstructured messages and force consumers to deal with the mess. Personally, I'd also be in favor of adding better type-hinting either via typed dicts, dataclasses, or similar.