confluentinc / cp-docker-images

[DEPRECATED] Docker images for Confluent Platform.
Apache License 2.0

We need a standalone kafka-connect image #790

Closed: ferozed closed this issue 4 years ago

ferozed commented 5 years ago

The cp-kafka-connect image ( https://hub.docker.com/r/confluentinc/cp-kafka-connect ) comes with Schema Registry and Control Center, per the documentation at ( https://docs.confluent.io/current/installation/docker/image-reference.html ).

However, we already have Schema Registry and Control Center running on a different machine, so we don't need the cp-kafka-connect image to start these up again.

Please tell me how to run a standalone kafka-connect service. Or... please publish a standalone kafka-connect Docker image.

OneCricketeer commented 5 years ago

The connect image just runs connect-distributed, not Schema Registry and/or Control Center.

That column you're reading is "Packages Included", meaning that the Control Center interceptors are installed for monitoring and schema-registry is included for the Avro converters.
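For example, a minimal sketch of running the image against an externally managed Schema Registry (the hostnames, topic names, and group id below are placeholders, not values from this issue; depending on the image version, other settings such as the storage topic replication factors may also be required):

```bash
# The container starts one distributed worker; Schema Registry is only
# referenced by URL via the converter settings, never started locally.
docker run -d --name kafka-connect -p 8083:8083 \
  -e CONNECT_BOOTSTRAP_SERVERS=kafka-broker:9092 \
  -e CONNECT_REST_ADVERTISED_HOST_NAME=kafka-connect \
  -e CONNECT_GROUP_ID=my-connect-cluster \
  -e CONNECT_CONFIG_STORAGE_TOPIC=_connect-configs \
  -e CONNECT_OFFSET_STORAGE_TOPIC=_connect-offsets \
  -e CONNECT_STATUS_STORAGE_TOPIC=_connect-status \
  -e CONNECT_KEY_CONVERTER=io.confluent.connect.avro.AvroConverter \
  -e CONNECT_VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter \
  -e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=http://schema-registry:8081 \
  -e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=http://schema-registry:8081 \
  confluentinc/cp-kafka-connect
```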

nickvgils commented 5 years ago

Need an image that runs connect-standalone to store offset data locally for a JDBC sink connector. I'm seeing some unofficial images, which don't have much documentation. Can there be an official one included in the confluentinc hub?

OneCricketeer commented 5 years ago

@nickvgils

Data isn't stored locally anyway, though.

The JDBC sink connector always reads from a remote Kafka topic and sends the data to a remote database. If by "locally", you mean it's running on the host machine, then you need to make the adjustments to the configurations to use the host address rather than localhost. And just because it's "distributed mode" doesn't mean that it needs to be distributed over multiple instances - even standalone mode could share the same consumer group.
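For example (a sketch; the broker address and connector name are placeholders), a sink connector's progress shows up like any other consumer group, conventionally named `connect-<connector name>`:

```bash
# describe the consumer group that a JDBC sink connector commits its offsets to
kafka-consumer-groups --bootstrap-server remote-broker.example.com:9092 \
  --describe --group connect-my-jdbc-sink
```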

The Confluent Hub isn't for Docker images, just connector plugins

nickvgils commented 5 years ago

@cricket007

Hmm, thanks for your response. Let me tell you my use case:

Problem: storing data from a Kafka topic into a database (sqlite) on Windows requires Confluent Connect, which is Unix only. The solution is probably using Docker. If I run the cp-kafka-connect image, it always starts in distributed mode. This mode requires the offset, config, and status data to be stored in topics on the remote Ubuntu server. I want this data to be local, to limit the number of topics created (which is possible when running Kafka Connect in standalone mode).

> The Confluent Hub isn't for Docker images, just connector plugins

You're right, I misstated this one. What I meant was that Confluent should add a confluent connect standalone image to their Docker images, to store this data locally.

OneCricketeer commented 5 years ago

@nickvgils

For clarification, it's "Kafka Connect", and not proprietary to Confluent.

For sink connectors, offsets are stored back in the broker, even when using standalone mode. This is because it's a regular consumer group under the hood.

I'm not sure I see a need to store config or status locally to a container, so that would just leave the sqlite database, which I hope you're volume mapping out of the container so that you can access it otherwise.
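For the sqlite part, a rough sketch of that volume mapping (the paths are placeholders, and the required worker settings are omitted for brevity):

```bash
# keep the sqlite file on the host so it survives the container;
# point the connector's connection.url at jdbc:sqlite:/data/<file>
docker run -d \
  -v "$PWD/connect-data:/data" \
  confluentinc/cp-kafka-connect   # worker environment variables omitted
```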

That all being said, no, Kafka Connect is not Unix-specific; both standalone and distributed mode have Windows scripts: https://github.com/apache/kafka/blob/trunk/bin/windows/connect-standalone.bat

nickvgils commented 5 years ago

@cricket007

Thank you for this comment, it totally cleared my head. I misunderstood the definition of "storing locally". After doing some more reading I finally got it. You're right about storing the offsets. I was thinking about running in Docker, but with the Kafka Connect .bat file for Windows I don't need Docker at all, just as you said. I just ran Kafka Connect on Windows with the JDBC connector plugin, the sqlite-jdbc driver, and the Avro converter jar files, and it works like a charm. All data from a topic gets stored inside sqlite.
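For reference, a sketch of what such a sink connector config might look like; the connector name, topic, sqlite path, and Schema Registry URL below are placeholders, not the actual values used:

```properties
name=sqlite-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=my-topic
connection.url=jdbc:sqlite:C:/data/my-topic.db
auto.create=true
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```

The worker and connector properties are then passed to the Windows script, e.g. `bin\windows\connect-standalone.bat connect-standalone.properties sqlite-sink.properties`.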

HunderlineK commented 4 years ago

Does the connector image deploy a JVM for Kafka itself as well? When I check the running processes there is one named Kafka.

OneCricketeer commented 4 years ago

cp-kafka-connect only runs the ConnectDistributed JVM process

OneCricketeer commented 4 years ago

👋 @ferozed

Have your concerns been addressed here?

MetaBarj0 commented 4 years ago

I had the same question as @ferozed. Thank you @OneCricketeer, that's crystal clear. 👏

OneCricketeer commented 4 years ago

You can find an image on my profile, btw

ferozed commented 4 years ago

@OneCricketeer Yeah my concerns are addressed.

This is indeed a standalone image. Whether it runs in standalone mode or distributed mode depends on how you configure it.
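For what it's worth, a sketch of forcing standalone mode with the same image by overriding the container command; this assumes `connect-standalone` is on the image's PATH and that the mounted worker and connector properties files exist, neither of which is confirmed in this thread:

```bash
docker run --rm \
  -v "$PWD/config:/config" \
  confluentinc/cp-kafka-connect \
  connect-standalone /config/worker.properties /config/my-sink.properties
```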

gayakwad commented 4 years ago

@OneCricketeer

Please let me know where I can ask questions related to this thread, in case this is not the appropriate place.

I'm using Kafka where I cannot create any additional topics

OneCricketeer commented 4 years ago

3) This is open source Kafka Connect. As answered, it runs in distributed mode, which 1-2) requires a broker to interact with, and requires 3 internal topics as well as extra topics to sink/source from.
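If topic creation is locked down, the three internal topics can be named in the worker config (`config.storage.topic`, `offset.storage.topic`, `status.storage.topic`) and created ahead of time by whoever has permission. A sketch, with placeholder names, counts, and broker address:

```bash
# all three must be compacted; the config topic must have exactly one partition
kafka-topics --bootstrap-server broker:9092 --create --topic connect-configs \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server broker:9092 --create --topic connect-offsets \
  --partitions 25 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server broker:9092 --create --topic connect-status \
  --partitions 5 --replication-factor 3 --config cleanup.policy=compact
```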