confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

[CC-22217, CC-22575] CVE Fixes #672

Closed khsoneji closed 7 months ago

khsoneji commented 7 months ago

Problem

CVE fixes: https://confluentinc.atlassian.net/browse/CC-22575 https://confluentinc.atlassian.net/browse/CC-22217

Solution

Updated ivy and snappy-java to versions without CVEs.

Does this solution apply anywhere else?
If yes, where?
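As a quick local check that the bumped jars actually end up in the packaged connector (the same zip the playground run below installs), the package contents can be listed; a minimal sketch, assuming a standard local build and that these jars are bundled in the package:

```bash
# Build the connector package (tests skipped for speed), then confirm the bundled
# snappy-java and ivy jars are the bumped, CVE-free versions.
mvn -q clean package -DskipTests
unzip -l target/components/packages/confluentinc-kafka-connect-hdfs-*.zip \
  | grep -E 'snappy-java|ivy-'
```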

Test Strategy

The CVEs are no longer showing in the PR scan: https://twistlock.tools.confluent-internal.io/#!/monitor/vulnerabilities/images/ci?search=Confluent%20Public%20Repo%20PR%20builder%2Fkafka-connect-hdfs%2FPR-672

Dependency tree output, showing the updated versions:

ksoneji@T9X6X34XJG kafka-connect-hdfs % mvn dependency:tree | grep -e snappy -e ivy
[INFO] |  |  \- org.xerial.snappy:snappy-java:jar:1.1.10.4:compile
[INFO] |  |  +- org.apache.ivy:ivy:jar:2.5.2:compile
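If you only want these two artifacts in the tree (plus the parents that pull them in), the dependency plugin can filter instead of grep; a small sketch, assuming a stock maven-dependency-plugin:

```bash
# Restrict the tree to snappy-java and ivy; -Dverbose (supported on recent plugin
# versions) also shows conflicting/omitted versions that Maven resolved away.
mvn dependency:tree \
  -Dincludes=org.xerial.snappy:snappy-java,org.apache.ivy:ivy \
  -Dverbose
```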

Unit tests:

[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 151, Failures: 0, Errors: 0, Skipped: 0
[INFO] 

Docker playground

Docker playground test 👇
~/gitrepos/kafka-docker-playground/connect/connect-hdfs2-sink master wip  3s 10:43:50
> playground run -f hdfs2-sink.sh --connector-zip ~/gitrepos/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.5-SNAPSHOT.zip
10:45:13 ℹ️ 🚀 Running example with flags
10:45:13 ℹ️ ⛳ Flags used are  --connector-zip=/Users/vbalani/gitrepos/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.5-SNAPSHOT.zip
10:45:14 ℹ️ 💀 Kill all docker containers
10:45:14 ℹ️ ####################################################
10:45:14 ℹ️ 🚀 Executing hdfs2-sink.sh in dir .
10:45:14 ℹ️ ####################################################
10:45:14 ℹ️ 💫 Using default CP version 7.5.0
10:45:14 ℹ️ 🎓 Use --tag option to specify different version, see https://kafka-docker-playground.io/#/how-to-use?id=🎯-for-confluent-platform-cp
10:45:14 ℹ️ 🎯🤐 CONNECTOR_ZIP (--connector-zip option) is set with /Users/vbalani/gitrepos/kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.5-SNAPSHOT.zip
10:45:15 ℹ️ 🧰 Checking if Docker image confluentinc/cp-server-connect-base:7.5.0 contains additional tools
10:45:15 ℹ️ 🧰 it can take a while if image is downloaded for the first time
10:45:15 ℹ️ 🎱 Installing connector from zip confluentinc-kafka-connect-hdfs-10.2.5-SNAPSHOT.zip
Running in a "--no-prompt" mode
Implicit acceptance of the license below:
Confluent Community License
http://www.confluent.io/confluent-community-license
Installing a component Kafka Connect HDFS 10.2.5-SNAPSHOT, provided by Confluent, Inc. from the local file: /tmp/confluentinc-kafka-connect-hdfs-10.2.5-SNAPSHOT.zip into directory: /usr/share/confluent-hub-components
Adding installation directory to plugin path in the following files:
  /etc/kafka/connect-distributed.properties
  /etc/kafka/connect-standalone.properties
  /etc/schema-registry/connect-avro-distributed.properties
  /etc/schema-registry/connect-avro-standalone.properties
Completed
10:47:50 ℹ️ Getting hive-jdbc-3.1.2-standalone.jar
--2023-11-16 10:47:51--  https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/3.1.2/hive-jdbc-3.1.2-standalone.jar
Resolving repo1.maven.org (repo1.maven.org)... 199.232.192.209, 199.232.196.209, 2a04:4e42:4c::209, ...
Connecting to repo1.maven.org (repo1.maven.org)|199.232.192.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 72420147 (69M) [application/java-archive]
Saving to: 'hive-jdbc-3.1.2-standalone.jar'
hive-jdbc-3.1.2-standalone.jar                              100%[=========================================================================================================================================>]  69.06M  17.8MB/s    in 4.7s
2023-11-16 10:47:58 (14.6 MB/s) - 'hive-jdbc-3.1.2-standalone.jar' saved [72420147/72420147]
10:47:59 ℹ️ 🛑 control-center is disabled
10:47:59 ℹ️ 🛑 ksqldb is disabled
10:47:59 ℹ️ 🛑 Grafana is disabled
10:47:59 ℹ️ 🛑 kcat is disabled
10:47:59 ℹ️ 🛑 conduktor is disabled
[+] Building 0.0s (0/0)                                                                                                                                                                                                  docker:desktop-linux
[+] Running 3/0
 ✔ Volume plaintext_datanode  Removed                                                                                                                                                                                                    0.0s
 ✔ Volume plaintext_namenode  Removed                                                                                                                                                                                                    0.0s
 ✔ Network plaintext_default  Removed                                                                                                                                                                                                    0.1s
[+] Building 0.0s (0/0)                                                                                                                                                                                                  docker:desktop-linux
[+] Running 13/13
 ✔ Network plaintext_default            Created                                                                                                                                                                                          0.0s
 ✔ Volume "plaintext_namenode"          Created                                                                                                                                                                                          0.0s
 ✔ Volume "plaintext_datanode"          Created                                                                                                                                                                                          0.0s
 ✔ Container hive-server                Started                                                                                                                                                                                          0.1s
 ✔ Container zookeeper                  Started                                                                                                                                                                                          0.1s
 ✔ Container namenode                   Started                                                                                                                                                                                          0.1s
 ✔ Container datanode                   Started                                                                                                                                                                                          0.1s
 ✔ Container broker                     Started                                                                                                                                                                                          0.1s
 ✔ Container hive-metastore-postgresql  Started                                                                                                                                                                                          0.1s
 ✔ Container hive-metastore             Started                                                                                                                                                                                          0.1s
 ✔ Container presto-coordinator         Started                                                                                                                                                                                          0.1s
 ✔ Container schema-registry            Started                                                                                                                                                                                          0.1s
 ✔ Container connect                    Started                                                                                                                                                                                          0.0s
10:48:03 ℹ️ 📝 To see the actual properties file, use cli command playground get-properties -c <container>
10:48:03 ℹ️ ✨ If you modify a docker-compose file and want to re-create the container(s), run cli command playground container recreate
10:48:03 ℹ️ ⌛ Waiting up to 480 seconds for connect to start
10:49:27 ℹ️ 🚦 connect is started!
10:49:27 ℹ️ 📊 JMX metrics are available locally on those ports:
10:49:27 ℹ️     - zookeeper       : 9999
10:49:27 ℹ️     - broker          : 10000
10:49:27 ℹ️     - schema-registry : 10001
10:49:27 ℹ️     - connect         : 10002
10:49:39 ℹ️ Creating HDFS Sink connector
10:49:41 ℹ️ 🛠️ Creating connector hdfs-sink
10:49:41 ℹ️ ✅ Connector hdfs-sink was successfully created
10:49:41 ℹ️ 💈 Configuration is
{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "flush.size": "3",
  "hadoop.conf.dir": "/etc/hadoop/",
  "hive.database": "testhive",
  "hive.integration": "true",
  "hive.metastore.uris": "thrift://hive-metastore:9083",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "logs.dir": "/tmp",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "rotate.interval.ms": "120000",
  "schema.compatibility": "BACKWARD",
  "store.url": "hdfs://namenode:8020",
  "tasks.max": "1",
  "topics": "test_hdfs",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
10:49:41 ℹ️ 🥁 Waiting a few seconds to get new status
10:49:49 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
10:49:49 ℹ️ Sending messages to topic test_hdfs
10:49:50 ℹ️ 🔮 schema was identified as avro
10:49:50 ℹ️ ✨ generating data...
10:49:50 ℹ️ ☢️ --forced-value is set
10:49:50 ℹ️ ✨ 10 records were generated based on --forced-value  (only showing first 10), took: 0min 1sec
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
{"f1":"value4"}
{"f1":"value5"}
{"f1":"value6"}
{"f1":"value7"}
{"f1":"value8"}
{"f1":"value9"}
{"f1":"value10"}
10:49:55 ℹ️ 💯 Get number of records in topic test_hdfs
0
10:49:55 ℹ️ 📤 producing 10 records to topic test_hdfs
[2023-11-16 05:19:57,425] WARN MessageReader is deprecated. Please use org.apache.kafka.tools.api.RecordReader instead (kafka.tools.ConsoleProducer$)
10:49:58 ℹ️ 📤 produced 10 records to topic test_hdfs, took: 0min 3sec
10:49:59 ℹ️ 💯 Get number of records in topic test_hdfs
10
10:50:13 ℹ️ Listing content of /topics/test_hdfs/partition=0 in HDFS
Found 3 items
-rw-r--r--   3 appuser supergroup        213 2023-11-16 05:19 /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro
-rw-r--r--   3 appuser supergroup        213 2023-11-16 05:19 /topics/test_hdfs/partition=0/test_hdfs+0+0000000003+0000000005.avro
-rw-r--r--   3 appuser supergroup        213 2023-11-16 05:19 /topics/test_hdfs/partition=0/test_hdfs+0+0000000006+0000000008.avro
10:50:15 ℹ️ Getting one of the avro files locally and displaying content with avro-tools
Successfully copied 2.05kB to /tmp/
23/11/16 05:20:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
10:50:19 ℹ️ Check data with beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hive-server:10000/testhive
Enter username for jdbc:hive2://hive-server:10000/testhive: hive
Connecting to jdbc:hive2://hive-server:10000/testhive
Enter password for jdbc:hive2://hive-server:10000/testhive: ****
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hive-server:10000/testhive> show create table test_hdfs;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `test_hdfs`(                 |
|   `f1` string COMMENT '')                          |
| PARTITIONED BY (                                   |
|   `partition` string COMMENT '')                   |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'   |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION                                           |
|   'hdfs://namenode:8020/topics/test_hdfs'          |
| TBLPROPERTIES (                                    |
|   'avro.schema.literal'='{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[{"name":"f1","type":"string"}]}',  |
|   'transient_lastDdlTime'='1700111999')            |
+----------------------------------------------------+
15 rows selected (1.023 seconds)
0: jdbc:hive2://hive-server:10000/testhive> select * from test_hdfs;
+---------------+----------------------+
| test_hdfs.f1  | test_hdfs.partition  |
+---------------+----------------------+
| value1        | 0                    |
| value2        | 0                    |
| value3        | 0                    |
| value4        | 0                    |
| value5        | 0                    |
| value6        | 0                    |
| value7        | 0                    |
| value8        | 0                    |
| value9        | 0                    |
+---------------+----------------------+
9 rows selected (1.471 seconds)
0: jdbc:hive2://hive-server:10000/testhive> Closing: 0: jdbc:hive2://hive-server:10000/testhive
| value1        | 0                    |
10:50:23 ℹ️ ####################################################
10:50:23 ℹ️ ✅ RESULT: SUCCESS for hdfs2-sink.sh (took: 5min 9sec - )
10:50:23 ℹ️ ####################################################
10:50:27 ℹ️ ✨ --connector flag was not provided, applying command to all connectors
10:50:27 ℹ️ 🧩 Displaying connector status for hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-----------------------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
10:50:32 ℹ️ 🗯️ Version currently used for confluentinc-kafka-connect-hdfs is not latest
10:50:32 ℹ️ Current
"🔢 v10.2.5-SNAPSHOT - 📅 release date: 2023-11-16"
10:50:32 ℹ️ Latest on Hub
"🔢 v10.2.4 - 📅 release date: 2023-09-28"
10:50:32 ℹ️ 🌐 documentation is available at:
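For reference, the hdfs-sink configuration shown in the log above can also be created by hand against the Connect REST API once the stack is up; a minimal sketch, assuming the Connect worker is reachable on localhost:8083:

```bash
# Create or update the hdfs-sink connector with the same config the playground used.
# Assumes the Connect worker's REST API is exposed on localhost:8083.
curl -s -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/hdfs-sink/config \
  -d '{
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_hdfs",
    "store.url": "hdfs://namenode:8020",
    "flush.size": "3",
    "rotate.interval.ms": "120000",
    "logs.dir": "/tmp",
    "hadoop.conf.dir": "/etc/hadoop/",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "schema.compatibility": "BACKWARD",
    "hive.integration": "true",
    "hive.metastore.uris": "thrift://hive-metastore:9083",
    "hive.database": "testhive",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }'
```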
Testing done:

Release Plan

cla-assistant[bot] commented 7 months ago

CLA assistant check
All committers have signed the CLA.