Closed: vbalani002 closed this pull request 9 months ago
CVE fix for https://confluentinc.atlassian.net/browse/CC-20328

Includes code changes to accommodate the change in the return type of the `shouldSchemaChange` method, from `boolean` (in v11.1.4) to `SchemaCompatibilityResult` (in v11.2.3) of kafka-connect-storage-common.
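For context, the call-site adaptation looks roughly like the following. This is a minimal, self-contained sketch, not the actual kafka-connect-hdfs code: `SchemaCompatibilityResult` here is a stand-in for the class in kafka-connect-storage-common (whose real API also carries compatibility details beyond a single flag), and the method and helper names are illustrative.

```java
// Sketch of adapting a call site when shouldSchemaChange stops returning a
// boolean and starts returning a result object. All types below are stand-ins.
public class SchemaChangeAdaptationSketch {

    /** Stand-in for the v11.2.x result type: wraps the boolean the old API returned. */
    static final class SchemaCompatibilityResult {
        private final boolean inCompatible;

        SchemaCompatibilityResult(boolean inCompatible) {
            this.inCompatible = inCompatible;
        }

        boolean isInCompatible() {
            return inCompatible;
        }
    }

    // Old dependency (v11.1.4 era): the check returned a plain boolean.
    static boolean shouldSchemaChangeOld(Object currentSchema, Object incomingSchema) {
        return !java.util.Objects.equals(currentSchema, incomingSchema);
    }

    // New dependency (v11.2.3 era): the same check returns a result object instead.
    static SchemaCompatibilityResult shouldSchemaChangeNew(Object currentSchema, Object incomingSchema) {
        return new SchemaCompatibilityResult(!java.util.Objects.equals(currentSchema, incomingSchema));
    }

    /** Call sites change from using the boolean directly to unwrapping the result. */
    static boolean schemaChanged(Object currentSchema, Object incomingSchema) {
        // Before: boolean changed = shouldSchemaChangeOld(currentSchema, incomingSchema);
        SchemaCompatibilityResult result = shouldSchemaChangeNew(currentSchema, incomingSchema);
        return result.isInCompatible();
    }

    public static void main(String[] args) {
        System.out.println(schemaChanged("schema-v1", "schema-v2")); // true
        System.out.println(schemaChanged("schema-v1", "schema-v1")); // false
    }
}
```

The point of the change is purely mechanical: every consumer that branched on the returned `boolean` now unwraps the flag from the result object, leaving the connector's rotation/projection logic otherwise untouched.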
Unit Tests
[INFO] Results:
[INFO]
[INFO] Tests run: 151, Failures: 0, Errors: 0, Skipped: 0
Manual Tests
18:56:46 ℹ️ 🚀 Running example with flags
18:56:46 ℹ️ ⛳ Flags used are --connector-zip=/Users/vbalani/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
18:56:46 ℹ️ 💀 Kill all docker containers
18:56:57 ℹ️ ####################################################
18:56:57 ℹ️ 🚀 Executing hdfs2-sink.sh in dir .
18:56:57 ℹ️ ####################################################
18:56:58 ℹ️ 💫 Using default CP version 7.5.0
18:56:58 ℹ️ 🎓 Use --tag option to specify different version, see https://kafka-docker-playground.io/#/how-to-use?id=🎯-for-confluent-platform-cp
18:56:58 ℹ️ 🎯🤐 CONNECTOR_ZIP (--connector-zip option) is set with /Users/vbalani/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
18:56:58 ℹ️ 🧰 Checking if Docker image confluentinc/cp-server-connect-base:7.5.0 contains additional tools
18:56:58 ℹ️ 🧰 it can take a while if image is downloaded for the first time
18:57:10 ℹ️ 🎱 Installing connector from zip confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip
Running in a "--no-prompt" mode
Implicit acceptance of the license below:
Confluent Community License
http://www.confluent.io/confluent-community-license
Installing a component Kafka Connect HDFS 10.2.4-SNAPSHOT, provided by Confluent, Inc. from the local file: /tmp/confluentinc-kafka-connect-hdfs-10.2.4-SNAPSHOT.zip into directory: /usr/share/confluent-hub-components
Adding installation directory to plugin path in the following files:
  /etc/kafka/connect-distributed.properties
  /etc/kafka/connect-standalone.properties
  /etc/schema-registry/connect-avro-distributed.properties
  /etc/schema-registry/connect-avro-standalone.properties
Completed
19:15:48 ℹ️ 🛑 control-center is disabled
19:15:48 ℹ️ 🛑 ksqldb is disabled
19:15:48 ℹ️ 🛑 Grafana is disabled
19:15:48 ℹ️ 🛑 kcat is disabled
19:15:48 ℹ️ 🛑 conduktor is disabled
[+] Running 4/4
 ⠿ Container broker               Removed  10.8s
 ⠿ Volume plaintext_datanode      Removed   0.0s
 ⠿ Volume plaintext_namenode      Removed   0.0s
 ⠿ Network plaintext_default      Removed   0.1s
[+] Running 13/13
 ⠿ Network plaintext_default                 Created  0.1s
 ⠿ Volume "plaintext_datanode"               Created  0.0s
 ⠿ Volume "plaintext_namenode"               Created  0.0s
 ⠿ Container broker                          Started  3.5s
 ⠿ Container hive-metastore-postgresql       Started  2.4s
 ⠿ Container zookeeper                       Started  3.2s
 ⠿ Container hive-metastore                  Started  2.8s
 ⠿ Container presto-coordinator              Started  3.5s
 ⠿ Container hive-server                     Started  2.8s
 ⠿ Container namenode                        Started  2.0s
 ⠿ Container datanode                        Started  2.2s
 ⠿ Container schema-registry                 Started  4.6s
 ⠿ Container connect                         Started  5.5s
19:16:07 ℹ️ 📝 To see the actual properties file, use cli command playground get-properties -c <container>
19:16:07 ℹ️ ✨ If you modify a docker-compose file and want to re-create the container(s), run cli command playground container recreate
19:16:07 ℹ️ ⌛ Waiting up to 480 seconds for connect to start
19:19:08 ℹ️ 🚦 connect is started!
19:19:09 ℹ️ 📊 JMX metrics are available locally on those ports:
19:19:09 ℹ️     - zookeeper       : 9999
19:19:09 ℹ️     - broker          : 10000
19:19:09 ℹ️     - schema-registry : 10001
19:19:09 ℹ️     - connect         : 10002
19:19:21 ℹ️ Creating HDFS Sink connector
19:19:24 ℹ️ 🛠️ Creating connector hdfs-sink
19:19:24 ℹ️ ✅ Connector hdfs-sink was successfully created
19:19:24 ℹ️ 💈 Configuration is
{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "flush.size": "3",
  "hadoop.conf.dir": "/etc/hadoop/",
  "hive.database": "testhive",
  "hive.integration": "true",
  "hive.metastore.uris": "thrift://hive-metastore:9083",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "logs.dir": "/tmp",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "rotate.interval.ms": "120000",
  "schema.compatibility": "BACKWARD",
  "store.url": "hdfs://namenode:8020",
  "tasks.max": "1",
  "topics": "test_hdfs",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
19:19:25 ℹ️ 🥁 Waiting a few seconds to get new status
19:19:33 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name       Status       Tasks                    Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink  ✅ RUNNING   0:🟢 RUNNING[connect]    -
-------------------------------------------------------------------------------------------------------------
19:19:34 ℹ️ Sending messages to topic test_hdfs
19:19:35 ℹ️ 🔮 schema was identified as avro
19:19:35 ℹ️ ✨ generating data...
19:19:35 ℹ️ ☢️ --forced-value is set
19:19:35 ℹ️ ✨ 10 records were generated based on --forced-value (only showing first 10), took: 0min 1sec
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
{"f1":"value4"}
{"f1":"value5"}
{"f1":"value6"}
{"f1":"value7"}
{"f1":"value8"}
{"f1":"value9"}
{"f1":"value10"}
19:19:44 ℹ️ 💯 Get number of records in topic test_hdfs
0
19:19:44 ℹ️ 📤 producing 10 records to topic test_hdfs
[2023-09-27 13:49:47,753] WARN MessageReader is deprecated. Please use org.apache.kafka.tools.api.RecordReader instead (kafka.tools.ConsoleProducer$)
19:19:51 ℹ️ 📤 produced 10 records to topic test_hdfs, took: 0min 7sec
19:19:54 ℹ️ 💯 Get number of records in topic test_hdfs
10
19:20:18 ℹ️ Listing content of /topics/test_hdfs/partition=0 in HDFS
Found 3 items
-rw-r--r--   3 appuser supergroup   213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro
-rw-r--r--   3 appuser supergroup   213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000003+0000000005.avro
-rw-r--r--   3 appuser supergroup   213 2023-09-27 13:49 /topics/test_hdfs/partition=0/test_hdfs+0+0000000006+0000000008.avro
19:20:20 ℹ️ Getting one of the avro files locally and displaying content with avro-tools
23/09/27 13:50:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
19:20:26 ℹ️ Check data with beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hive-server:10000/testhive
Connecting to jdbc:hive2://hive-server:10000/testhive
Enter username for jdbc:hive2://hive-server:10000/testhive: hive
Enter password for jdbc:hive2://hive-server:10000/testhive: ****
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hive-server:10000/testhive> show create table test_hdfs;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `test_hdfs`(                 |
| `f1` string COMMENT '')                            |
| PARTITIONED BY (                                   |
| `partition` string COMMENT '')                     |
| ROW FORMAT SERDE                                   |
| 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'     |
| STORED AS INPUTFORMAT                              |
| 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' |
| OUTPUTFORMAT                                       |
| 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION                                           |
| 'hdfs://namenode:8020/topics/test_hdfs'            |
| TBLPROPERTIES (                                    |
| 'avro.schema.literal'='{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[{"name":"f1","type":"string"}]}', |
| 'transient_lastDdlTime'='1695822593')              |
+----------------------------------------------------+
15 rows selected (1.492 seconds)
0: jdbc:hive2://hive-server:10000/testhive> select * from test_hdfs;
+---------------+----------------------+
| test_hdfs.f1  | test_hdfs.partition  |
+---------------+----------------------+
| value1        | 0                    |
| value2        | 0                    |
| value3        | 0                    |
| value4        | 0                    |
| value5        | 0                    |
| value6        | 0                    |
| value7        | 0                    |
| value8        | 0                    |
| value9        | 0                    |
+---------------+----------------------+
9 rows selected (1.95 seconds)
0: jdbc:hive2://hive-server:10000/testhive> Closing: 0: jdbc:hive2://hive-server:10000/testhive
19:20:32 ℹ️ ####################################################
19:20:33 ℹ️ ✅ RESULT: SUCCESS for hdfs2-sink.sh (took: 23min 35sec - )
19:20:33 ℹ️ ####################################################
19:20:36 ℹ️ ✨ --connector flag was not provided, applying command to all connectors
19:20:37 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name       Status       Tasks                    Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink  ✅ RUNNING   0:🟢 RUNNING[connect]    -
-------------------------------------------------------------------------------------------------------------
19:20:43 ℹ️ 🗯️ Version currently used for confluentinc-kafka-connect-hdfs is not latest
19:20:43 ℹ️ Current "🔢 v10.2.4-SNAPSHOT - 📅 release date: 2023-09-27"
19:20:43 ℹ️ Latest on Hub "🔢 v10.2.3 - 📅 release date: 2023-09-27"
19:20:44 ℹ️ 🌐 documentation is available at: