Closed spenczar closed 4 years ago
I agree, it would make sense to have separate classes for reading and writing. I would also argue that we should separate it further into Kafka-based `Producer` and `Consumer` classes which give us the added functionality, and wrap those in the `Publisher` and `Subscriber` classes, which are more general.
My reasoning for wanting the `Producer` and `Consumer` classes as well is that we're extending the current functionality of the stock confluent-kafka classes with a few things:
Consumer:
Producer:
Whether the serializer/deserializer functionality adds value at this stage is more debatable, but the rest could be folded in by extending the stock client.
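For illustration, the deserializer functionality could be layered on as a thin wrapper rather than baked into one class. A minimal sketch, assuming an inner consumer that exposes a `poll()` returning raw bytes (the class and method names here are hypothetical, not confluent-kafka's actual serializing API):

```python
import json


class DeserializingConsumer:
    """Hypothetical sketch: wrap any consumer exposing poll() so callers
    receive deserialized payloads instead of raw bytes."""

    def __init__(self, inner, deserializer=json.loads):
        self._inner = inner            # e.g. a stock Kafka consumer
        self._deserializer = deserializer

    def poll(self, timeout=None):
        # Delegate to the wrapped consumer, then decode the payload.
        raw = self._inner.poll(timeout)
        if raw is None:
            return None
        return self._deserializer(raw)
```

The point of the wrapper shape is that the stock client stays untouched, and the (de)serialization concern can be dropped later without changing either class.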
I kind of like the `open()` functionality though, and like the parallels to the built-in `open()`, even if in this case the particular mode just routes to the `Producer` and `Subscriber` classes. I don't know if it's an appropriate abstraction, though.
+1 on the proposed split, but I'd advocate for keeping the file-like object veneer on top and as the external interface.
My thinking is that the "stream is an infinite file of records" abstraction is easy to get from the library users' PoV, and makes it easy for newcomers to jump into development. I.e., something like:
```python
with open("kafka://ztf.uw.edu", "r") as stream:
    for msg in stream:
        ...do something...
```
comes naturally to one.
I'm happy with not supporting `open(..., mode="rw")` -- that was me being overly zealous when I hacked the write portion together and noticed it could be done :).
@myNameIsPatrick I really like that list of features, thank you! I think that wrapping into Kafka-specific vs general ones is probably more than we need to do right now. We won't know what's general until we have another backend to support. Once we do, then we can provide the generalizing superclass, but if we try to abstract prematurely, we'll probably just get it wrong.
@mjuric I do think that the file-like `open` is really nice as a convenient constructor, yes. I like that it uses the protocol field of the URL, too, so if we do have different backends, it can just provide the right sort of stream - an `amqp://` one, or a `vo://` one, or whatever.
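One hypothetical way to get that protocol-based dispatch is a small scheme-to-backend registry. A sketch, with made-up names, of how an opener could pick the backend from the URL:

```python
from urllib.parse import urlparse

# Maps URL scheme ("kafka", "amqp", ...) to a backend class.
BACKENDS = {}


def register(scheme):
    """Class decorator registering a stream backend for a URL scheme."""
    def wrap(cls):
        BACKENDS[scheme] = cls
        return cls
    return wrap


@register("kafka")
class KafkaStream:
    def __init__(self, url):
        self.url = url


def open_stream(url):
    """Pick the right backend from the URL's protocol field."""
    scheme = urlparse(url).scheme
    try:
        backend = BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no backend for scheme {scheme!r}") from None
    return backend(url)
```

Adding an `amqp://` or `vo://` backend would then be one decorated class, with no changes to the opener itself.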
Today, genesis.streaming pretty much provides one class: `AlertBroker`. To use one for writing, you pass a string which contains a `"w"` character to the `mode` argument when constructing it; an `"r"` sets it up for reading.

Using one class for both reading and writing makes the internal code more complex and confusing, inviting bugs. It doesn't provide value for users, who will virtually always want to either read or write to a stream. And it makes the API more confusing for users.
Examples of the complexity and confusion:
`AlertBroker` has 14 methods, but they have very little overlap in the two cases:

- `_validate_and_process`
- `_filtered_stream`
- `_raw_stream`
- `commit`
- `_commit_to_kafka`
- `_stream`
- `__call__`
- `__iter__`
- `write`
- `flush`
- `__init__` (but in truth the only shared logic is parsing the Kafka URL and loading a config file)
- `__enter__` (but this is just `return self`)
- `__exit__` (but this is just an alias for `close`)
- `close` (but this is essentially just `if consuming: self.commit(); else: self.flush()`)

`AlertBroker.commit` vs `AlertBroker.flush` - committing is for read state, while flushing is for write state. This may make sense if you know a lot about Kafka, but it doesn't generalize well to other pubsub systems.

I think that separate `Publisher` and `Subscriber` classes would make a lot more sense, here.
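To make the proposal concrete, here is a minimal sketch (hypothetical names, no real Kafka plumbing) of how the split might look, with each class carrying only the methods its direction of traffic needs:

```python
class Subscriber:
    """Read side: iteration over messages and committing read state."""

    def __init__(self, url):
        self.url = url
        self._messages = []  # stand-in for a real consumer's queue

    def __iter__(self):
        return iter(self._messages)

    def commit(self):
        pass  # acknowledge consumed messages with the broker

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.commit()


class Publisher:
    """Write side: buffered writes and flushing write state."""

    def __init__(self, url):
        self.url = url
        self._pending = []  # stand-in for a real producer's buffer

    def write(self, msg):
        self._pending.append(msg)

    def flush(self):
        self._pending.clear()  # a real producer would deliver here

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()
```

Note how the `close`-time ambiguity disappears: a `Subscriber` always commits on exit and a `Publisher` always flushes, with no `if consuming:` branch.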