confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

CVE fix #668

Closed vbalani002 closed 9 months ago

vbalani002 commented 9 months ago

Problem

CVE fix for https://confluentinc.atlassian.net/browse/CC-20328

Includes code changes to accommodate the change in the return type of the `shouldSchemaChange` method, which was `boolean` in v11.1.4 and becomes `SchemaCompatibilityResult` in v11.2.3 of kafka-connect-storage-common.
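The call-site adaptation can be sketched as below. This is a minimal illustration only: `SchemaCompatibilityResult` here is a stand-in for the class shipped in kafka-connect-storage-common v11.2.3, and the `isInCompatible()` accessor is an assumption for illustration, not a statement of the library's exact API.

```java
// Illustrative stand-in for the result type introduced in
// kafka-connect-storage-common v11.2.3; names are assumptions.
class SchemaCompatibilityResult {
    private final boolean inCompatible;

    SchemaCompatibilityResult(boolean inCompatible) {
        this.inCompatible = inCompatible;
    }

    // v11.2.3 wraps the old boolean in a richer result object
    boolean isInCompatible() {
        return inCompatible;
    }
}

public class ShouldSchemaChangeCallSite {
    // v11.1.4-style API: returns a plain boolean
    static boolean shouldSchemaChangeV11_1_4() {
        return true; // pretend a schema change is needed
    }

    // v11.2.3-style API: returns a result object instead
    static SchemaCompatibilityResult shouldSchemaChangeV11_2_3() {
        return new SchemaCompatibilityResult(true);
    }

    public static void main(String[] args) {
        // Old call site: the boolean was used directly
        boolean oldDecision = shouldSchemaChangeV11_1_4();

        // New call site: unwrap the result object first
        boolean newDecision = shouldSchemaChangeV11_2_3().isInCompatible();

        System.out.println(oldDecision == newDecision);
    }
}
```

The point of the change in this PR is mechanical: every caller that previously consumed the raw boolean must now unwrap the result object before branching.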

Solution

Does this solution apply anywhere else?
If yes, where?

Test Strategy

Testing done:

Unit Tests

[INFO] Results:
[INFO]
[INFO] Tests run: 151, Failures: 0, Errors: 0, Skipped: 0

Manual Tests

18:56:46 ℹ️ 🚀 Running example with flags
18:56:46 ℹ️ ⛳ Flags used are  --connector-zip=/Users/vbalani/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
18:56:46 ℹ️ 💀 Kill all docker containers
18:56:57 ℹ️ ####################################################
18:56:57 ℹ️ 🚀 Executing hdfs2-sink.sh in dir .
18:56:57 ℹ️ ####################################################
18:56:58 ℹ️ 💫 Using default CP version 7.5.0
18:56:58 ℹ️ 🎓 Use --tag option to specify different version, see https://kafka-docker-playground.io/#/how-to-use?id=🎯-for-confluent-platform-cp
18:56:58 ℹ️ 🎯🤐 CONNECTOR_ZIP (--connector-zip option) is set with /Users/vbalani/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
18:56:58 ℹ️ 🧰 Checking if Docker image confluentinc/cp-server-connect-base:7.5.0 contains additional tools
18:56:58 ℹ️ 🧰 it can take a while if image is downloaded for the first time
18:57:10 ℹ️ 🎱 Installing connector from zip confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
Running in a "--no-prompt" mode
Implicit acceptance of the license below:
Confluent Community License
http://www.confluent.io/confluent-community-license
Installing a component Kafka Connect HDFS 10.2.4-SNAPSHOT, provided by Confluent, Inc. from the local file: /tmp/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip into directory: /usr/share/confluent-hub-components
Adding installation directory to plugin path in the following files:
  /etc/kafka/connect-distributed.properties
  /etc/kafka/connect-standalone.properties
  /etc/schema-registry/connect-avro-distributed.properties
  /etc/schema-registry/connect-avro-standalone.properties

Completed
19:15:48 ℹ️ 🛑 control-center is disabled
19:15:48 ℹ️ 🛑 ksqldb is disabled
19:15:48 ℹ️ 🛑 Grafana is disabled
19:15:48 ℹ️ 🛑 kcat is disabled
19:15:48 ℹ️ 🛑 conduktor is disabled
[+] Running 4/4
 ⠿ Container broker           Removed                                                                                                                                                                                                   10.8s
 ⠿ Volume plaintext_datanode  Removed                                                                                                                                                                                                    0.0s
 ⠿ Volume plaintext_namenode  Removed                                                                                                                                                                                                    0.0s
 ⠿ Network plaintext_default  Removed                                                                                                                                                                                                    0.1s
[+] Running 13/13
 ⠿ Network plaintext_default            Created                                                                                                                                                                                          0.1s
 ⠿ Volume "plaintext_datanode"          Created                                                                                                                                                                                          0.0s
 ⠿ Volume "plaintext_namenode"          Created                                                                                                                                                                                          0.0s
 ⠿ Container broker                     Started                                                                                                                                                                                          3.5s
 ⠿ Container hive-metastore-postgresql  Started                                                                                                                                                                                          2.4s
 ⠿ Container zookeeper                  Started                                                                                                                                                                                          3.2s
 ⠿ Container hive-metastore             Started                                                                                                                                                                                          2.8s
 ⠿ Container presto-coordinator         Started                                                                                                                                                                                          3.5s
 ⠿ Container hive-server                Started                                                                                                                                                                                          2.8s
 ⠿ Container namenode                   Started                                                                                                                                                                                          2.0s
 ⠿ Container datanode                   Started                                                                                                                                                                                          2.2s
 ⠿ Container schema-registry            Started                                                                                                                                                                                          4.6s
 ⠿ Container connect                    Started                                                                                                                                                                                          5.5s
19:16:07 ℹ️ 📝 To see the actual properties file, use cli command playground get-properties -c <container>
19:16:07 ℹ️ ✨ If you modify a docker-compose file and want to re-create the container(s), run cli command playground container recreate
19:16:07 ℹ️ ⌛ Waiting up to 480 seconds for connect to start
19:19:08 ℹ️ 🚦 connect is started!
19:19:09 ℹ️ 📊 JMX metrics are available locally on those ports:
19:19:09 ℹ️     - zookeeper       : 9999
19:19:09 ℹ️     - broker          : 10000
19:19:09 ℹ️     - schema-registry : 10001
19:19:09 ℹ️     - connect         : 10002
19:19:21 ℹ️ Creating HDFS Sink connector
19:19:24 ℹ️ 🛠️ Creating connector hdfs-sink
19:19:24 ℹ️ ✅ Connector hdfs-sink was successfully created
19:19:24 ℹ️ 💈 Configuration is
{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "flush.size": "3",
  "hadoop.conf.dir": "/etc/hadoop/",
  "hive.database": "testhive",
  "hive.integration": "true",
  "hive.metastore.uris": "thrift://hive-metastore:9083",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "logs.dir": "/tmp",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "rotate.interval.ms": "120000",
  "schema.compatibility": "BACKWARD",
  "store.url": "hdfs://namenode:8020",
  "tasks.max": "1",
  "topics": "test_hdfs",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
19:19:25 ℹ️ 🥁 Waiting a few seconds to get new status
19:19:33 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
19:19:34 ℹ️ Sending messages to topic test_hdfs
19:19:35 ℹ️ 🔮 schema was identified as avro
19:19:35 ℹ️ ✨ generating data...
19:19:35 ℹ️ ☢️ --forced-value is set
19:19:35 ℹ️ ✨ 10 records were generated based on --forced-value  (only showing first 10), took: 0min 1sec
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
{"f1":"value4"}
{"f1":"value5"}
{"f1":"value6"}
{"f1":"value7"}
{"f1":"value8"}
{"f1":"value9"}
{"f1":"value10"}
19:19:44 ℹ️ 💯 Get number of records in topic test_hdfs
0
19:19:44 ℹ️ 📤 producing 10 records to topic test_hdfs
[2023-09-27 13:49:47,753] WARN MessageReader is deprecated. Please use org.apache.kafka.tools.api.RecordReader instead (kafka.tools.ConsoleProducer$)
19:19:51 ℹ️ 📤 produced 10 records to topic test_hdfs, took: 0min 7sec
19:19:54 ℹ️ 💯 Get number of records in topic test_hdfs
10
19:20:18 ℹ️ Listing content of /topics/test_hdfs/partition=0 in HDFS
Found 3 items
-rw-r--r--   3 appuser supergroup        213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro
-rw-r--r--   3 appuser supergroup        213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000003+0000000005.avro
-rw-r--r--   3 appuser supergroup        213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000006+0000000008.avro
19:20:20 ℹ️ Getting one of the avro files locally and displaying content with avro-tools
23/09/27 13:50:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
19:20:26 ℹ️ Check data with beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hive-server:10000/testhive
Connecting to jdbc:hive2://hive-server:10000/testhive
Enter username for jdbc:hive2://hive-server:10000/testhive: hive
Enter password for jdbc:hive2://hive-server:10000/testhive: ****
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hive-server:10000/testhive> show create table test_hdfs;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `test_hdfs`(                 |
|   `f1` string COMMENT '')                          |
| PARTITIONED BY (                                   |
|   `partition` string COMMENT '')                   |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'   |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION                                           |
|   'hdfs://namenode:8020/topics/test_hdfs'          |
| TBLPROPERTIES (                                    |
|   'avro.schema.literal'='{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[{"name":"f1","type":"string"}]}',  |
|   'transient_lastDdlTime'='1695822593')            |
+----------------------------------------------------+
15 rows selected (1.492 seconds)
0: jdbc:hive2://hive-server:10000/testhive> select * from test_hdfs;
+---------------+----------------------+
| test_hdfs.f1  | test_hdfs.partition  |
+---------------+----------------------+
| value1        | 0                    |
| value2        | 0                    |
| value3        | 0                    |
| value4        | 0                    |
| value5        | 0                    |
| value6        | 0                    |
| value7        | 0                    |
| value8        | 0                    |
| value9        | 0                    |
+---------------+----------------------+
9 rows selected (1.95 seconds)
0: jdbc:hive2://hive-server:10000/testhive> Closing: 0: jdbc:hive2://hive-server:10000/testhive
19:20:32 ℹ️ ####################################################
19:20:33 ℹ️ ✅ RESULT: SUCCESS for hdfs2-sink.sh (took: 23min 35sec - )
19:20:33 ℹ️ ####################################################

19:20:36 ℹ️ ✨ --connector flag was not provided, applying command to all connectors
19:20:37 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
19:20:43 ℹ️ 🗯️ Version currently used for confluentinc-kafka-connect-hdfs is not latest
19:20:43 ℹ️ Current
"🔢 v10.2.4-SNAPSHOT - 📅 release date: 2023-09-27"
19:20:43 ℹ️ Latest on Hub
"🔢 v10.2.3 - 📅 release date: 2023-09-27"
19:20:44 ℹ️ 🌐 documentation is available at:

Release Plan