fd4s / fs2-kafka

Functional Kafka Streams for Scala
https://fd4s.github.io/fs2-kafka
Apache License 2.0
293 stars 101 forks source link

Rethink `KafkaProducer`/`KafkaProducerConnection` #572

Open bplommer opened 3 years ago

bplommer commented 3 years ago

I’ve been thinking about how we can simplify the most common use cases.

KafkaProducerConnection really represents an untyped Kafka producer - it easily could, and possibly should, have a method produce[P, K: Deserializer, V: Deserializer](producerRecords: ProducerRecords[P, K, V]). KafkaProducer is really nothing more than a partial application of that general produce method.

I suggest that in 3.0 we rename KafkaProducerConnection to KafkaProducer and KafkaProducer to SerializingKafkaProducer.

I also suggest we add KafkaTopicProducer, which has both serializers and a topic name provided on instantiation and takes just key-value pairs rather than ProducerRecords as arguments for its produce method. This would simplify calling code in, I think, the great majority of use cases.

Further, there could be a variant that automatically derives keys from values (typically the key would be the same as some kind of id field from the value), so that calling code only needs to provide the value and not the key.

The main drawback I see is that the resulting number of variants could cause confusion, but I think careful API design would mitigate that.

Any thoughts? @vlovgr @LMnet

LMnet commented 3 years ago

KafkaProducerConnection changes sound reasonable for me. In my practice in most cases I ended up with the KafkaProducer[F, Option[String], Option[String]], which is in the essence just an untyped producer.

Not sure about naming. SerializingKafkaProducer is a bit long. But I don't have a better alternative atm.

Further, there could be a variant that automatically derives keys from values (typically the key would be the same as some kind of id field from the value), so that calling code only needs to provide the value and not the key.

I'm not convinced that this would give an advantage to the users. Automatic derivation could silently derive something wrong for the key, and the compiler will not throw an error in case of an untyped producer. I think in that particular case we should prefer explicit passing of key and value.

I also suggest we add KafkaTopicProducer, which has both serializers and a topic name provided on instantiation and takes just key-value pairs rather than ProducerRecords as arguments for its produce method. This would simplify calling code in, I think, the great majority of use cases.

I think it could be useful for simple use cases. KafkaTopicProducer could be a simple wrapper over KafkaProducer. But don't forget that ProducerRecord is not just a key-value pair and topic. It could have partition, headers, timestamps. It should be a part of the API.

bplommer commented 3 years ago

Not sure about naming. SerializingKafkaProducer is a bit long. But I don't have a better alternative atm.

Agreed. What about leaving KafkaProducer as it is and renaming KafkaProducerConnection to GenericKafkaProducer?

I'm not convinced that this would give an advantage to the users. Automatic derivation could silently derive something wrong for the key, and the compiler will not throw an error in case of an untyped producer. I think in that particular case we should prefer explicit passing of key and value.

I think I didn't explain my meaning properly. I don't mean the library should guess what the key should be, but rather a function mkKey: V => K for deriving it would be provided when the consumer is instantiated.

I think it could be useful for simple use cases. KafkaTopicProducer could be a simple wrapper over KafkaProducer. But don't forget that ProducerRecord is not just a key-value pair and topic. It could have partition, headers, timestamps. It should be a part of the API.

Yes, good point.

LMnet commented 3 years ago

What about leaving KafkaProducer as it is and renaming KafkaProducerConnection to GenericKafkaProducer?

Sounds good for me.

a function mkKey: V => K for deriving it would be provided when the consumer is instantiated.

After the first reading, I thought you are talking about typeclass to derive a key from the value. Simple function sounds better in this context because it's explicit and simple.

bplommer commented 3 years ago

After the first reading, I thought you are talking about typeclass to derive a key from the value. Simple function sounds better in this context because it's explicit and simple.

It could also allow a function K => ProducerRecord[F, K, V] to allow setting headers too. What do you think about SimpleKafkaProducer as a name?

LMnet commented 3 years ago

I would avoid Simple prefix because it's too generic. Maybe ValueKafkaProducer?

bplommer commented 3 years ago

How about we rename KafkaProducerConnection to GenericKafkaProducer for 2.0, with a deprecated type alias as KafkaProducerConnection? Then the change should be mostly source-compatible, and it doesn't matter that it's binary breaking.

LMnet commented 3 years ago

How about we rename KafkaProducerConnection to GenericKafkaProducer for 2.0, with a deprecated type alias as KafkaProducerConnection? Then the change should be mostly source-compatible, and it doesn't matter that it's binary breaking.

Ok