Closed spenczar closed 4 years ago
I agree, it would make sense to have separate classes for reading and writing. I would also argue that we should separate it further into Kafka-based `Producer` and `Consumer` classes which give us the added functionality, and wrap those in the `Publisher` and `Subscriber` classes, which are more general.
My reasoning for wanting the `Producer` and `Consumer` classes as well is that we're extending the current functionality of the stock confluent-kafka classes with a few things:
Consumer:
Producer:
Whether the serializer/deserializer functionality adds value at this stage is more debatable, but the rest could be folded in by extending the stock client.
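For illustration, the deserializer functionality could be layered on as a thin wrapper rather than baked into one class. A minimal sketch, assuming an inner consumer that exposes a `poll()` returning raw bytes (the class and method names here are hypothetical, not confluent-kafka's actual serializing API):

```python
import json


class DeserializingConsumer:
    """Hypothetical sketch: wrap any consumer exposing poll() so callers
    receive deserialized payloads instead of raw bytes."""

    def __init__(self, inner, deserializer=json.loads):
        self._inner = inner            # e.g. a stock Kafka consumer
        self._deserializer = deserializer

    def poll(self, timeout=None):
        # Delegate to the wrapped consumer, then decode the payload.
        raw = self._inner.poll(timeout)
        if raw is None:
            return None
        return self._deserializer(raw)
```

The point of the wrapper shape is that the stock client stays untouched, and the (de)serialization concern can be dropped later without changing either class.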
I kind of like the `open()` functionality though, and like the parallels to the built-in `open()`, even if in this case the particular mode just routes to the `Producer` and `Subscriber` classes. I don't know if it's an appropriate abstraction, though.
+1 on the proposed split, but I'd advocate for keeping the file-like object veneer on top and as the external interface.
My thinking is that the "stream is an infinite file of records" abstraction is easy to get from the library users' PoV, and makes it easy for newcomers to jump into development. I.e., something like:
```python
with open("kafka://ztf.uw.edu", "r") as stream:
    for msg in stream:
        ...do something...
```
comes naturally to one.
I'm happy with not supporting `open(..., mode="rw")` -- that was me being overly zealous when I hacked the write portion together and noticed it could be done :).
@myNameIsPatrick I really like that list of features, thank you! I think that wrapping into Kafka-specific vs general ones is probably more than we need to do right now. We won't know what's general until we have another backend to support. Once we do, then we can provide the generalizing superclass, but if we try to abstract prematurely, we'll probably just get it wrong.
@mjuric I do think that the file-like `open` is really nice as a convenient constructor, yes. I like that it uses the protocol field of the URL, too, so if we do have different backends, it can just provide the right sort of stream - an `amqp://` one, or a `vo://` one, or whatever.
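One hypothetical way to get that protocol-based dispatch is a small scheme-to-backend registry. A sketch, with made-up names, of how an opener could pick the backend from the URL:

```python
from urllib.parse import urlparse

# Maps URL scheme ("kafka", "amqp", ...) to a backend class.
BACKENDS = {}


def register(scheme):
    """Class decorator registering a stream backend for a URL scheme."""
    def wrap(cls):
        BACKENDS[scheme] = cls
        return cls
    return wrap


@register("kafka")
class KafkaStream:
    def __init__(self, url):
        self.url = url


def open_stream(url):
    """Pick the right backend from the URL's protocol field."""
    scheme = urlparse(url).scheme
    try:
        backend = BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no backend for scheme {scheme!r}") from None
    return backend(url)
```

Adding an `amqp://` or `vo://` backend would then be one decorated class, with no changes to the opener itself.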
Today, genesis.streaming pretty much provides one class: `AlertBroker`. To use one for writing, you pass a string which contains a `"w"` character to the `mode` argument when constructing it; an `"r"` sets it up for reading.

Using one class for both reading and writing makes the internal code more complex and confusing, inviting bugs. It doesn't provide value for users, who will virtually always want to either read or write to a stream. And it makes the API more confusing for users.
Examples of the complexity and confusion:
`AlertBroker` has 14 methods, but they have very little overlap in the two cases:

- `_validate_and_process`
- `_filtered_stream`
- `_raw_stream`
- `commit`
- `_commit_to_kafka`
- `_stream`
- `__call__`
- `__iter__`
- `write`
- `flush`
- `__init__` (but in truth the only shared logic is parsing the Kafka URL and loading a config file)
- `__enter__` (but this is just `return self`)
- `__exit__` (but this is just an alias for `close`)
- `close` (but this is essentially just `if consuming: self.commit(); else: self.flush()`)

`AlertBroker.commit` vs `AlertBroker.flush` - committing is for read state, while flushing is for write state. This may make sense if you know a lot about Kafka, but it doesn't generalize well to other pubsub systems.

I think that separate `Publisher` and `Subscriber` classes would make a lot more sense, here.
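To make the proposal concrete, here is a minimal sketch (hypothetical names, no real Kafka plumbing) of how the split might look, with each class carrying only the methods its direction of traffic needs:

```python
class Subscriber:
    """Read side: iteration over messages and committing read state."""

    def __init__(self, url):
        self.url = url
        self._messages = []  # stand-in for a real consumer's queue

    def __iter__(self):
        return iter(self._messages)

    def commit(self):
        pass  # acknowledge consumed messages with the broker

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.commit()


class Publisher:
    """Write side: buffered writes and flushing write state."""

    def __init__(self, url):
        self.url = url
        self._pending = []  # stand-in for a real producer's buffer

    def write(self, msg):
        self._pending.append(msg)

    def flush(self):
        self._pending.clear()  # a real producer would deliver here

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()
```

Note how the `close`-time ambiguity disappears: a `Subscriber` always commits on exit and a `Publisher` always flushes, with no `if consuming:` branch.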