confluentinc / cp-docker-images

[DEPRECATED] Docker images for Confluent Platform.
Apache License 2.0

Adding Plugin Gives HTTP 500 Error #303

Open chenchik opened 7 years ago

chenchik commented 7 years ago

I am trying to add a connector plugin from GitHub:

https://github.com/jcustenborder/kafka-connect-spooldir

I'm using OpenShift and am running Kafka Connect with the Docker image below (it's basically confluentinc/cp-docker-images, except that it includes the jar files from the GitHub repo above and an upgraded HDFS connector jar):

https://hub.docker.com/r/chenchik/custom-connect-hdfs/

I run Kafka Connect with a huge command that sets a ton of environment variables. From what I understand, the plugins should be in /etc/kafka-connect/jars, so the most important part of the command is probably:

-e CONNECT_PLUGIN_PATH="/etc/kafka-connect/jars"

which is where I put the jars from the GitHub repo after running mvn clean package.

huge command:

oc new-app chenchik/custom-connect-hdfs:latest -e CONNECT_BOOTSTRAP_SERVERS=--------------:9092 -e CONNECT_GROUP_ID="connect_---" -e CONNECT_CONFIG_STORAGE_TOPIC="----" -e CONNECT_OFFSET_STORAGE_TOPIC="----" -e CONNECT_STATUS_STORAGE_TOPIC="----" -e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.storage.StringConverter" -e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.storage.StringConverter" -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" -e CONNECT_REST_ADVERTISED_HOST_NAME="connect" -e HADOOP_USER_NAME="hdfs" -e CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE="false" -e CONNECT_VALUES_CONVERTER_SCHEMAS_ENABLE="false" -e CONNECT_INTERNAL_KEY_CONVERTER_SCHEMAS_ENABLE="false" -e CONNECT_INTERNAL_VALUE_CONVERTER_SCHEMAS_ENABLE="false" -e CONNECT_AUTO_OFFSET_RESET="latest" -e AUTO_OFFSET_RESET="latest" -e TERM=xterm -e KAFKA_HEAP_OPTS="-Xms512m -Xmx1g" -e CONNECT_KAFKA_HEAP_OPTS="-Xms512m -Xmx1g" -e CONNECT_PLUGIN_PATH="/etc/kafka-connect/jars" -e CLASSPATH="/etc/kafka-connect/jars" --name=c

I've also tried setting CONNECT_PLUGIN_PATH to simply /plugins, and to /usr/local/share/kafka/plugins/.

In order to create the connector, I use the REST API and issue a POST request with this payload:

{
    "name": "csv-json-1",
    "config": {
        "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
        "tasks.max": "1",
        "finished.path":"/csv-json/results/",
        "input.file.pattern":".*csv",
        "error.path":"/csv-json/errors/",
        "topic":"csv-json",
        "input.path":"/csv-json/input/",
        "key.schema":"com.github.jcustenborder.kafka.connect.spooldir.CsvSchemaGenerator",
        "value.schema":"com.github.jcustenborder.kafka.connect.spooldir.CsvSchemaGenerator"
    }
}

The response I get every time is:

{
    "error_code": 500,
    "message": "Failed to find any class that implements Connector and which name matches com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector, available connectors are: org.apache.kafka.connect.source.SourceConnector, org.apache.kafka.connect.tools.MockSourceConnector, org.apache.kafka.connect.file.FileStreamSinkConnector, io.confluent.connect.hdfs.tools.SchemaSourceConnector, org.apache.kafka.connect.tools.VerifiableSourceConnector, io.confluent.connect.s3.S3SinkConnector, org.apache.kafka.connect.file.FileStreamSourceConnector, org.apache.kafka.connect.tools.VerifiableSinkConnector, io.confluent.connect.jdbc.JdbcSinkConnector, io.confluent.connect.jdbc.JdbcSourceConnector, io.confluent.connect.elasticsearch.ElasticsearchSinkConnector, org.apache.kafka.connect.sink.SinkConnector, io.confluent.connect.storage.tools.SchemaSourceConnector, org.apache.kafka.connect.tools.MockConnector, org.apache.kafka.connect.tools.MockSinkConnector, org.apache.kafka.connect.tools.SchemaSourceConnector, io.confluent.connect.hdfs.HdfsSinkConnector"
}

If I issue a GET request to /connector-plugins, it is not listed.

I also cannot seem to find any logs inside the container that explain what's going on. The only relevant log message I get is from the log the container provides to OpenShift. This is the entry that pops up:

[2017-08-01 22:22:35,635] INFO 172.17.0.1 - - [01/Aug/2017:22:22:15 +0000] "POST /connectors HTTP/1.1" 500 1081 20544 (org.apache.kafka.connect.runtime.rest.RestServer)

What can I do to resolve this issue?

ewencp commented 7 years ago

First, there are 2 ways to load plugins as of the newest release. Previously the only way was to put the jars on the classpath. That's what putting them under /etc/kafka-connect/jars does. You should not need to set any additional environment variables to make that work.

As of Kafka 0.11 and CP 3.3, you can also load via the new plugin.path setting, which is what the environment variable CONNECT_PLUGIN_PATH would be doing. In that case, the structure is a bit different -- inside that directory, we'd expect to find a directory for each plugin, and within that directory we'd expect to find the jars.
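A minimal sketch of that expected plugin.path layout (the directory under /tmp and the jar file names are illustrative stand-ins, used only so the commands run anywhere; in a real setup the jars come from the connector's mvn clean package output and its dependencies):

```shell
# plugin.path points at a directory of per-plugin subdirectories;
# each subdirectory holds that plugin's jar plus ALL of its dependency jars.
PLUGIN_PATH=/tmp/kafka-connect-plugins
mkdir -p "$PLUGIN_PATH/kafka-connect-spooldir"
# Stand-in jar files for the connector and one of its dependencies (opencsv).
touch "$PLUGIN_PATH/kafka-connect-spooldir/kafka-connect-spooldir-1.0.jar"
touch "$PLUGIN_PATH/kafka-connect-spooldir/opencsv-3.9.jar"
# Lists the plugin's jars.
ls "$PLUGIN_PATH/kafka-connect-spooldir"
```

With the classpath approach, by contrast, all of the jars go flat into /etc/kafka-connect/jars.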

Is there anything else in that log before that? The relevant parts will probably be earlier during the time when plugins are being loaded/the classpath is being scanned.

Also, you might want to increase the log4j log level to get more debug output.

chenchik commented 7 years ago

I would like to load the plugins using the /etc/kafka-connect/jars approach, as I cannot upgrade to the most recent Docker image: my current brokers are not running the most up-to-date version of Kafka and do not allow auto topic creation.

If I try to use the CP 3.3 CONNECT_PLUGIN_PATH approach, I can't even boot up Kafka Connect because it cannot find my brokers. The error message below keeps repeating, starting right after the log states "Kafka Connect Starting":

Request (type=MetadataRequest, topics=) failed on all bootstrap brokers [*******:9092 (id: -1 rack: null)].
Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
Request (type=MetadataRequest, topics=) failed against node *********:9092 (id: -1 rack: null).
org.apache.kafka.common.errors.UnsupportedVersionException: MetadataRequest versions older than 4 don't support the allowAutoTopicCreation field

I can't seem to find anything significant in the logs. Also, for some reason, when I turn up the log4j level for Connect, I get tons of errors that prevent my REST requests from coming through:

javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: java.lang.NoClassDefFoundError: com/opencsv/ICSVParser
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.glassfish.jersey.server.ContainerException: java.lang.NoClassDefFoundError: com/opencsv/ICSVParser
    at org.glassfish.jersey.servlet.internal.ResponseWriter.rethrow(ResponseWriter.java:278)
    at org.glassfish.jersey.servlet.internal.ResponseWriter.failure(ResponseWriter.java:260)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:509)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:334)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    ... 23 more
Caused by: java.lang.NoClassDefFoundError: com/opencsv/ICSVParser
    at com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector.config(SpoolDirCsvSourceConnector.java:44)
    at org.apache.kafka.connect.connector.Connector.validate(Connector.java:133)
    at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:254)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:508)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$6.call(DistributedHerder.java:505)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:248)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.lang.ClassNotFoundException: com.opencsv.ICSVParser
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
[2017-08-02 19:38:35,902] WARN /connectors/csv-json-test/config (org.eclipse.jetty.server.HttpChannel)
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:684)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
[2017-08-02 19:38:35,903] DEBUG onClose SelectChannelEndPoint@68a3ceee{/172.17.0.1:49386<->8083,CLOSED,in,out,-,-,6/30000,HttpConnection}{io=0,kio=0,kro=1} (org.eclipse.jetty.io.AbstractEndPoint)
[2017-08-02 19:38:35,904] DEBUG close SelectChannelEndPoint@68a3ceee{/172.17.0.1:49386<->8083,CLOSED,in,out,-,-,6/30000,HttpConnection}{io=0,kio=0,kro=1} (org.eclipse.jetty.io.ChannelEndPoint)
[2017-08-02 19:38:35,904] DEBUG Destroyed 

I also have received these errors earlier today using the log4j debugging:

[2017-08-01 23:01:54,197] WARN  (org.eclipse.jetty.servlet.ServletHandler)
javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: java.lang.OutOfMemoryError: Java heap space

[2017-08-01 23:01:54,201] WARN /connector-plugins (org.eclipse.jetty.server.HttpChannel)
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z

I also tried removing all environment variables relating to plugins and putting my jars in the /etc/kafka-connect/jars directory; I was greeted with a "failure to find connector" error.

Do you know what else I can try?

ewencp commented 7 years ago

Errors like java.lang.NoSuchMethodError usually mean there is a classpath conflict. You'd probably need to make sure you're building the version of the connector that corresponds to the version of Kafka you're running, since that's the version least likely to have conflicting jars.
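As an aside, the java.lang.NoClassDefFoundError: com/opencsv/ICSVParser in the trace above suggests a dependency jar never made it into the jars directory at all. A quick (and admittedly crude) check is to grep the jar files for the class path, since zip entry names are stored verbatim in the archive; the /tmp directory and stand-in "jar" below are fabricated so the sketch is self-contained, where a real check would loop over /etc/kafka-connect/jars/*.jar:

```shell
# Fabricated stand-in jar containing the entry name we're looking for.
mkdir -p /tmp/demo-jars
printf 'com/opencsv/ICSVParser.class' > /tmp/demo-jars/opencsv-3.9.jar
# Report every jar that contains the class from the stack trace.
for jar in /tmp/demo-jars/*.jar; do
  grep -q 'com/opencsv/ICSVParser' "$jar" && echo "class found in $jar"
done
# prints: class found in /tmp/demo-jars/opencsv-3.9.jar
```

If no jar in the directory contains the class, the dependency needs to be copied in alongside the connector jar.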

FYI, you actually could use the plugin.path approach, you'd just need to pre-create the topics as described here so that the Connect worker won't try to create them itself.
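Pre-creating those internal topics might look something like the following (a sketch only: the topic names must match the worker's CONNECT_*_STORAGE_TOPIC settings, and the ZooKeeper address and replication factor here are assumptions; the partition counts mirror the worker defaults of that era). Note that Connect's internal topics should be compacted:

```shell
kafka-topics --create --zookeeper zookeeper:2181 \
  --topic connect-configs --partitions 1 --replication-factor 3 \
  --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 \
  --topic connect-offsets --partitions 25 --replication-factor 3 \
  --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 \
  --topic connect-status --partitions 5 --replication-factor 3 \
  --config cleanup.policy=compact
```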

chenchik commented 7 years ago

With the plugin.path approach, the topics I enter when creating the container/worker are already created. To be exact, the three necessary topics:

connect-configs
connect-offsets
connect-status

are already made before I boot up the worker, but I still receive the "MetadataRequest versions older than 4 don't support the allowAutoTopicCreation field" error.

ewencp commented 7 years ago

Yeah, this looks like a compatibility bug in the framework itself with the new automatic internal topic creation. I've filed a JIRA to track this here. It looks like the fix for it should be pretty small so hopefully it'll be easy to get into the next Kafka release.

chenchik commented 7 years ago

Is there a way to make this change in Kafka Connect directly? I won't be able to update my Kafka Cluster.

ewencp commented 7 years ago

The fix has landed in Kafka: https://issues.apache.org/jira/browse/KAFKA-5704. It will be included in 0.11.0.1.

hieuhc commented 7 years ago

The same happens when I deploy Schema Registry on a Kubernetes cluster. Using a Kafka image at version 0.11.0.1 fixes the problem. Thank you.

anastath commented 5 years ago

Hi All, I have a similar problem creating a new connector, and would need some help... The steps I followed were:

  1. I cloned kafka locally from this repository (https://github.com/wurstmeister/kafka-docker).
  2. I downloaded from Docker Hub the images for zookeeper (https://hub.docker.com/_/zookeeper) and kafka-connect (https://hub.docker.com/r/confluentinc/cp-kafka-connect).
  3. I cloned this repo locally (https://github.com/jcustenborder/kafka-connect-spooldir) and packaged it as a jar so as to use it as a Kafka Connect plugin.
  4. I set up a docker-compose file as per the attached docker-compose-1-broker-Connect-yml.txt, where I added to the kafka-connect image (i) a CONNECT_PLUGIN_PATH environment variable pointing to "/etc/kakfa-connect/jars", and (ii) a volume from my host directory holding the jar file to "/etc/kakfa-connect/jars".
  5. I then tried to create a new connector of this class using the Kafka Connect REST API as follows:

curl -X POST -H "Content-Type: application/json" --data '{"name": "quickstart-json-dir", "config": {"connector.class":"com.github.jcustenborder.kafka.connect.spooldir.SpoolDirJsonSourceConnector", "tasks.max":"1", "topic":"quickstart-data-6", "finished.path": "/data/results", "input.file.pattern": ".*json", "input.path": "/data/input", "error.path": "/data/error"}}' http://localhost:8083/connectors

The response is Error 500 Request Failed; see the attached png for more details.

I'm pretty sure there is neither a connection issue nor a server problem, since I can create a connector of another class without any problem; e.g., the following works like a charm (connector created, and I can check that its status is RUNNING):

curl -X POST -H "Content-Type: application/json" --data '{"name": "quickstart-file-source-json-5", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max":"1", "topic":"quickstart-data-5", "file": "/data/test-4.json"}}' http://localhost:8083/connectors

I tend to believe that although I set CONNECT_PLUGIN_PATH in the docker-compose file, the connector class cannot be loaded or found.
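For reference, the CONNECT_PLUGIN_PATH plus volume wiring described above might look like this in a compose file (a sketch, not the attached file; the service name, image tag, and host path are assumptions). One thing worth double-checking is that the container path is spelled identically in the environment variable and in the volume mount, or the worker will scan an empty directory:

```yaml
kafka-connect:
  image: confluentinc/cp-kafka-connect
  environment:
    CONNECT_PLUGIN_PATH: /etc/kafka-connect/jars
  volumes:
    # Host directory must contain the spooldir jar AND its dependency jars;
    # with plugin.path semantics, each plugin goes in its own subdirectory.
    - ./plugins:/etc/kafka-connect/jars
```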

Should I set up plugin.path another way? There is a properties file in the kafka-connect container called /etc/kafka/connect-standalone.properties, but how can I edit it before starting the container?

Could it be another issue? Thanks in advance!