confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

CC-29302, CC-29425: CVE Fixes for `commons-io` & `protobuf-java` #704

Closed vbalani002 closed 1 month ago

vbalani002 commented 1 month ago

CVE Tickets:
- https://confluentinc.atlassian.net/browse/CC-29302 : CVE-2024-7254 (protobuf-java)
- https://confluentinc.atlassian.net/browse/CC-29425 : CVE-2024-47554 (commons-io)

> mvn dependency:tree -Daether.dependencyCollector.impl=bf -Dmaven.artifact.threads=100   | grep -e commons-io -e protobuf-java
[INFO] |  +- com.google.protobuf:protobuf-java:jar:3.25.5:compile
[INFO] |  +- commons-io:commons-io:jar:2.14.0:compile
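For reference, the same check can be done without grep by using the dependency plugin's include filter (these are standard maven-dependency-plugin options, nothing specific to this repo):

> mvn dependency:tree -Dincludes=commons-io:commons-io,com.google.protobuf:protobuf-java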

Docker playground tests

> cd ~/gitrepos/kafka-docker-playground/connect/connect-hdfs2-sink
> playground run -f hdfs2-sink.sh --connector-zip ~/gitrepos/airlock-kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.11-SNAPSHOT.zip
16:40:44 ℹ️ 🚀 Running example with flags
16:40:44 ℹ️ ⛳ Flags used are --connector-zip=/Users/vbalani/gitrepos/airlock-kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.11-SNAPSHOT.zip
16:40:45 ℹ️ 💀 Kill all docker containers
16:40:47 ℹ️ 📋 command to run again example has been copied to the clipboard (disable with 'playground config set clipboard false')
16:40:49 ℹ️ 🚀 Number of examples ran so far: 43
16:40:49 ℹ️ ####################################################
16:40:49 ℹ️ 🚀 Executing hdfs2-sink.sh in dir .
16:40:49 ℹ️ ####################################################
16:40:50 ℹ️ 💫 Using default CP version 7.6.1
16:40:50 ℹ️ 🎓 Use --tag option to specify different version, see https://kafka-docker-playground.io/#/how-to-use?id=🎯-for-confluent-platform-cp
16:40:50 ℹ️ 🎯🤐 CONNECTOR_ZIP (--connector-zip option) is set with /Users/vbalani/gitrepos/airlock-kafka-connect-hdfs/target/components/packages/confluentinc-kafka-connect-hdfs-10.2.11-SNAPSHOT.zip
16:40:50 ℹ️ 🧰 Checking if Docker image confluentinc/cp-server-connect-base:7.6.1 contains additional tools
16:40:50 ℹ️ ⏳ it can take a while if image is downloaded for the first time
16:40:51 ℹ️ 🎱 Installing connector from zip confluentinc-kafka-connect-hdfs-10.2.11-SNAPSHOT.zip
Installing a component Kafka Connect HDFS 10.2.11-SNAPSHOT, provided by Confluent, Inc. from the local file: /tmp/confluentinc-kafka-connect-hdfs-10.2.11-SNAPSHOT.zip into directory: /usr/share/confluent-hub-components
16:45:46 ℹ️ 💀 Kill all docker containers
16:45:52 ❗ 🥶 The current repo version is older than 3 days (133 days), please refresh your version using git pull !
Continue (y/n)?y
16:46:22 ℹ️ 🛑 control-center is disabled
16:46:22 ℹ️ 🛑 ksqldb is disabled
16:46:23 ℹ️ 🛑 REST Proxy is disabled
16:46:23 ℹ️ 🛑 Grafana is disabled
16:46:24 ℹ️ 🛑 kcat is disabled
16:46:24 ℹ️ 🛑 conduktor is disabled
[+] Building 0.0s (0/0)                                                                                                                                               docker:desktop-linux
[+] Running 3/0
 ✔ Volume plaintext_namenode  Removed                                                                                                                                                 0.0s
 ✔ Volume plaintext_datanode  Removed                                                                                                                                                 0.0s
 ✔ Network plaintext_default  Removed                                                                                                                                                 0.1s
[+] Building 0.0s (0/0)                                                                                                                                               docker:desktop-linux
[+] Running 13/13
 ✔ Network plaintext_default            Created                                                                                                                                       0.1s
 ✔ Volume "plaintext_datanode"          Created                                                                                                                                       0.0s
 ✔ Volume "plaintext_namenode"          Created                                                                                                                                       0.0s
 ✔ Container zookeeper                  Started                                                                                                                                       0.1s
 ✔ Container broker                     Started                                                                                                                                       0.1s
 ✔ Container datanode                   Started                                                                                                                                       0.1s
 ✔ Container hive-server                Started                                                                                                                                       0.1s
 ✔ Container hive-metastore-postgresql  Started                                                                                                                                       0.1s
 ✔ Container hive-metastore             Started                                                                                                                                       0.1s
 ✔ Container presto-coordinator         Started                                                                                                                                       0.1s
 ✔ Container namenode                   Started                                                                                                                                       0.1s
 ✔ Container schema-registry            Started                                                                                                                                       0.1s
 ✔ Container connect                    Started                                                                                                                                       0.1s
16:46:30 ℹ️ 📝 To see the actual properties file, use cli command playground container get-properties -c <container>
16:46:31 ℹ️ ✨ If you modify a docker-compose file and want to re-create the container(s), run cli command playground container recreate
16:46:31 ℹ️ ⌛ Waiting up to 300 seconds for connect to start
[2024-10-04 11:17:29,316] INFO [Worker clientId=connect-adminclient-producer, groupId=connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1873)
16:47:33 ℹ️ 🚦 containers have started!
16:47:33 ℹ️ 📊 JMX metrics are available locally on those ports:
16:47:33 ℹ️     - zookeeper       : 9999
16:47:33 ℹ️     - broker          : 10000
16:47:33 ℹ️     - schema-registry : 10001
16:47:33 ℹ️     - connect         : 10002
16:47:46 ℹ️ Creating HDFS Sink connector
16:47:51 ℹ️ 🛠️ Creating 🌎onprem connector hdfs-sink
16:47:51 ℹ️ 📋 🌎onprem connector config has been copied to the clipboard (disable with 'playground config set clipboard false')
16:47:52 ℹ️ ✅ 🌎onprem connector hdfs-sink was successfully created
16:47:53 ℹ️ 🧰 Current config for 🌎onprem connector hdfs-sink (using REST API /config endpoint)
playground connector create-or-update --connector hdfs-sink --no-clipboard << EOF
{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "flush.size": "3",
  "hadoop.conf.dir": "/etc/hadoop/",
  "hive.database": "testhive",
  "hive.integration": "true",
  "hive.metastore.uris": "thrift://hive-metastore:9083",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "logs.dir": "/tmp",
  "name": "hdfs-sink",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "rotate.interval.ms": "120000",
  "schema.compatibility": "BACKWARD",
  "store.url": "hdfs://namenode:8020",
  "tasks.max": "1",
  "topics": "test_hdfs",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
EOF
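As a cross-check, the same config can be read straight from the Connect worker's REST API (the /config endpoint is standard Kafka Connect; mapping the worker's REST port to localhost:8083 is an assumption about how the playground exposes it):

> curl -s http://localhost:8083/connectors/hdfs-sink/config | jq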
16:47:58 ℹ️ 🔩 list of all available parameters for 🌎onprem connector hdfs-sink (org.apache.kafka.connect.mirror.MirrorSourceConnector) and version 7.6.1-ce (with default value when applicable)
    "allow.optional.map.keys": "false",
    "avro.codec": "",
    "connect.hdfs.keytab": "STRING",
    "connect.hdfs.principal": "STRING",
    "connect.meta.data": "true",
    "directory.delim": "/",
    "enhanced.avro.schema.support": "true",
    "file.delim": "+",
    "filename.offset.zero.pad.width": "10",
    "flush.size": "",
    "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
    "hadoop.conf.dir": "STRING",
    "hadoop.home": "STRING",
    "hdfs.authentication.kerberos": "false",
    "hdfs.namenode.principal": "STRING",
    "hdfs.url": "",
    "hive.conf.dir": "STRING",
    "hive.database": "default",
    "hive.home": "STRING",
    "hive.integration": "false",
    "hive.metastore.uris": "STRING",
    "hive.table.name": "${topic}",
    "kerberos.ticket.renew.period.ms": "3600000",
    "locale": "STRING",
    "logs.dir": "logs",
    "partition.duration.ms": "-1",
    "partition.field.name": "LIST",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "path.format": "STRING",
    "retry.backoff.ms": "5000",
    "rotate.interval.ms": "-1",
    "rotate.schedule.interval.ms": "-1",
    "schema.compatibility": "NONE",
    "schemas.cache.config": "1000",
    "shutdown.timeout.ms": "3000",
    "storage.class": "io.confluent.connect.hdfs.storage.HdfsStorage",
    "store.url": "",
    "timestamp.extractor": "Wallclock",
    "timestamp.field": "timestamp",
    "timezone": "STRING",
    "topic.capture.groups.regex": "",
    "topics.dir": "topics",
16:47:58 ℹ️ 🥁 Waiting a few seconds to get new status
16:48:04 ℹ️ 🧩 Displaying status for 🌎onprem connector hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
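The status table above reflects the standard Connect /status endpoint, so the same information can be fetched directly (again assuming the REST port is published on localhost:8083):

> curl -s http://localhost:8083/connectors/hdfs-sink/status | jq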
16:48:06 ℹ️ 🌐 documentation for 🌎onprem connector kafka-connect-hdfs is available at:
https://docs.confluent.io/kafka-connect-hdfs/current/index.html
16:48:07 ℹ️ Sending messages to topic test_hdfs
16:48:09 ℹ️ 🔮 value schema was identified as avro
16:48:09 ℹ️ ✨ generating value data...
16:48:09 ℹ️ ☢️ --forced-value is set
16:48:09 ℹ️ ✨ 10 records were generated based on --forced-value  (only showing first 10), took: 0min 0sec
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
{"f1":"value4"}
{"f1":"value5"}
{"f1":"value6"}
{"f1":"value7"}
{"f1":"value8"}
{"f1":"value9"}
{"f1":"value10"}
16:48:14 ℹ️ 📤 producing 10 records to topic test_hdfs
16:48:18 ℹ️ 📤 produced 10 records to topic test_hdfs, took: 0min 4sec
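These records are produced by the playground script itself; a roughly equivalent manual command, assuming kafka-avro-console-producer is available inside the connect container, would look like:

> docker exec -i connect kafka-avro-console-producer --bootstrap-server broker:9092 --topic test_hdfs \
    --property schema.registry.url=http://schema-registry:8081 \
    --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' << EOF
{"f1":"value1"}
EOF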
16:48:28 ℹ️ Listing content of /topics/test_hdfs/partition=0 in HDFS
Found 3 items
-rw-r--r--   3 appuser supergroup        213 2024-10-04 11:18 /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro
-rw-r--r--   3 appuser supergroup        213 2024-10-04 11:18 /topics/test_hdfs/partition=0/test_hdfs+0+0000000003+0000000005.avro
-rw-r--r--   3 appuser supergroup        213 2024-10-04 11:18 /topics/test_hdfs/partition=0/test_hdfs+0+0000000006+0000000008.avro
16:48:31 ℹ️ Getting one of the avro files locally and displaying content with avro-tools
Successfully copied 2.05kB to /tmp/
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
16:48:35 ℹ️ Check data with beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hive-server:10000/testhive
Enter username for jdbc:hive2://hive-server:10000/testhive: hive
Enter password for jdbc:hive2://hive-server:10000/testhive: ****
Connecting to jdbc:hive2://hive-server:10000/testhive
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hive-server:10000/testhive> show create table test_hdfs;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `test_hdfs`(                 |
|   `f1` string COMMENT '')                          |
| PARTITIONED BY (                                   |
|   `partition` string COMMENT '')                   |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'   |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION                                           |
|   'hdfs://namenode:8020/topics/test_hdfs'          |
| TBLPROPERTIES (                                    |
|   'avro.schema.literal'='{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[{"name":"f1","type":"string"}]}',  |
|   'transient_lastDdlTime'='1728040699')            |
+----------------------------------------------------+
15 rows selected (1.455 seconds)
0: jdbc:hive2://hive-server:10000/testhive> select * from test_hdfs;
+---------------+----------------------+
| test_hdfs.f1  | test_hdfs.partition  |
+---------------+----------------------+
| value1        | 0                    |
| value2        | 0                    |
| value3        | 0                    |
| value4        | 0                    |
| value5        | 0                    |
| value6        | 0                    |
| value7        | 0                    |
| value8        | 0                    |
| value9        | 0                    |
+---------------+----------------------+
9 rows selected (2.116 seconds)
0: jdbc:hive2://hive-server:10000/testhive> Closing: 0: jdbc:hive2://hive-server:10000/testhive
16:48:42 ℹ️ ####################################################
16:48:42 ℹ️ ✅ RESULT: SUCCESS for hdfs2-sink.sh (took: 7min 53sec - )
16:48:42 ℹ️ ####################################################

16:48:48 ℹ️ 🧩 Displaying status for 🌎onprem connector hdfs-sink
Name                           Status       Tasks                                                        Stack Trace
-------------------------------------------------------------------------------------------------------------
hdfs-sink                      ✅ RUNNING  0:🟢 RUNNING[connect]        -
-------------------------------------------------------------------------------------------------------------
16:48:50 ℹ️ 🌐 documentation is available at:
https://docs.confluent.io/current/connect/kafka-connect-hdfs/index.html
16:48:52 ℹ️ 🎯 Version currently used for confluent platform

Test Strategy

Testing done:

Release Plan

confluent-cla-assistant[bot] commented 1 month ago

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.
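A minimal way to do that with git:

> git commit --allow-empty -m "Re-run CLA check" && git push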

sonarqube-confluent[bot] commented 1 month ago

Passed

Analysis Details: 0 issues

Project ID: kafka-connect-hdfs
