confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

CVE fix #656

Closed naveenmall11 closed 1 year ago

naveenmall11 commented 1 year ago

Problem

https://confluentinc.atlassian.net/browse/CC-18965 https://confluentinc.atlassian.net/browse/CCMSG-2266 https://confluentinc.atlassian.net/browse/CCMSG-2248

Twistlock scan link: https://twistlock.tools.confluent-internal.io/#!/monitor/vulnerabilities/images/ci?search=Confluent%20Public%20Repo%20PR%20builder%2Fkafka-connect-hdfs%2FPR-656

Does this solution apply anywhere else?
If yes, where?

Test Strategy

Testing done:

Release Plan

janjwerner-confluent commented 1 year ago

Is there a reason to go with this scope-limited fix instead of #652?

venkatteki commented 1 year ago

> Is there a reason to go with this scope-limited fix instead of #652?

@janjwerner-confluent We are prioritizing the CVEs raised on or before 31 Dec 2022. Once this PR is merged, we will look at addressing the remaining CVEs.

naveenmall11 commented 1 year ago

LGTM after testing on the Docker playground.

CONNECTOR_JAR is set with /home/nmall/kafka-connect-hdfs/target/kafka-connect-hdfs-10.1.16-SNAPSHOT.jar
/usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/kafka-connect-hdfs-10.2.0.jar
10:56:49 ℹ️ 👷🎯 Building Docker image vdesabou/kafka-docker-playground-connect:CP-7.3.2-kafka-connect-hdfs-10.1.16-SNAPSHOT.jar
10:56:49 ℹ️ Remplacing kafka-connect-hdfs-10.2.0.jar by kafka-connect-hdfs-10.1.16-SNAPSHOT.jar
[+] Building 0.6s (7/7) FINISHED
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 244B                                                                               0.0s
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load metadata for docker.io/vdesabou/kafka-docker-playground-connect:7.3.2                          0.0s
 => [internal] load build context                                                                                  0.0s
 => => transferring context: 155.76kB                                                                              0.0s
 => [1/2] FROM docker.io/vdesabou/kafka-docker-playground-connect:7.3.2                                            0.1s
 => [2/2] COPY kafka-connect-hdfs-10.1.16-SNAPSHOT.jar /usr/share/confluent-hub-components/confluentinc-kafka-con  0.3s
 => exporting to image                                                                                             0.1s
 => => exporting layers                                                                                            0.0s
 => => writing image sha256:b58e39145e001fc738ceda2eb915d4f4f7657ef84b6e7a5e5d72461a4205d014                       0.0s
 => => naming to docker.io/vdesabou/kafka-docker-playground-connect:CP-7.3.2-kafka-connect-hdfs-10.1.16-SNAPSHOT.  0.0s
10:56:50 ℹ️ 🛑 Grafana is disabled
10:56:50 ℹ️ 🛑 kcat is disabled
10:56:50 ℹ️ 🛑 conduktor is disabled
[+] Running 16/16
 ⠿ Container hive-metastore-postgresql  Removed                                                                    5.4s
 ⠿ Container datanode                   Removed                                                                   13.9s
 ⠿ Container connect                    Removed                                                                    2.1s
 ⠿ Container namenode                   Removed                                                                   13.9s
 ⠿ Container schema-registry            Removed                                                                    3.0s
 ⠿ Container hive-server                Removed                                                                   11.8s
 ⠿ Container presto-coordinator         Removed                                                                    3.1s
 ⠿ Container hive-metastore             Removed                                                                    3.2s
 ⠿ Container zookeeper                  Removed                                                                    2.6s
 ⠿ Container broker                     Removed                                                                   11.6s
 ⠿ Container control-center             Removed                                                                   14.1s
 ⠿ Container ksqldb-cli                 Removed                                                                    0.8s
 ⠿ Container ksqldb-server              Removed                                                                   14.0s
 ⠿ Volume plaintext_namenode            Removed                                                                    0.0s
 ⠿ Volume plaintext_datanode            Removed                                                                    0.0s
 ⠿ Network plaintext_default            Removed                                                                    0.2s
[+] Running 16/16
 ⠿ Network plaintext_default            Created                                                                    0.0s
 ⠿ Volume "plaintext_datanode"          Created                                                                    0.0s
 ⠿ Volume "plaintext_namenode"          Created                                                                    0.0s
 ⠿ Container broker                     Started                                                                    3.4s
 ⠿ Container zookeeper                  Started                                                                    2.8s
 ⠿ Container namenode                   Started                                                                    3.3s
 ⠿ Container hive-server                Started                                                                    1.9s
 ⠿ Container presto-coordinator         Started                                                                    2.5s
 ⠿ Container hive-metastore-postgresql  Started                                                                    2.8s
 ⠿ Container datanode                   Started                                                                    2.4s
 ⠿ Container hive-metastore             Started                                                                    3.3s
 ⠿ Container schema-registry            Started                                                                    4.2s
 ⠿ Container connect                    Started                                                                    6.3s
 ⠿ Container ksqldb-server              Started                                                                    8.5s
 ⠿ Container control-center             Started                                                                    8.7s
 ⠿ Container ksqldb-cli                 Started                                                                   10.2s
10:57:33 ℹ️ 📝 To see the actual properties file, use cli command playground get-properties -c <container>
10:57:33 ℹ️ ✨ If you modify a docker-compose file and want to re-create the container(s), run cli command playground recreate-container
10:57:33 ℹ️ ⌛ Waiting up to 480 seconds for connect to start
10:59:40 ℹ️ 🚦 connect is started!
10:59:40 ℹ️ 🔧 You can use ksqlDB with CLI using:
10:59:40 ℹ️ docker exec -i ksqldb-cli ksql http://ksqldb-server:8088
10:59:40 ℹ️ 📊 JMX metrics are available locally on those ports:
10:59:40 ℹ️     - zookeeper       : 9999
10:59:40 ℹ️     - broker          : 10000
10:59:40 ℹ️     - schema-registry : 10001
10:59:40 ℹ️     - connect         : 10002
10:59:40 ℹ️     - ksqldb-server   : 10003
11:01:20 ℹ️ Creating HDFS Sink connector
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1710  100   749  100   961     54     69  0:00:13  0:00:13 --:--:--   180
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_hdfs",
    "store.url": "hdfs://namenode:8020",
    "flush.size": "3",
    "hadoop.conf.dir": "/etc/hadoop/",
    "partitioner.class": "io.confluent.connect.hdfs.partitioner.FieldPartitioner",
    "partition.field.name": "f1",
    "rotate.interval.ms": "120000",
    "logs.dir": "/tmp",
    "hive.integration": "true",
    "hive.metastore.uris": "thrift://hive-metastore:9083",
    "hive.database": "testhive",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "schema.compatibility": "BACKWARD",
    "name": "hdfs-sink"
  },
  "tasks": [],
  "type": "sink"
}
11:02:04 ℹ️ Sending messages to topic test_hdfs
11:02:28 ℹ️ Listing content of /topics/test_hdfs in HDFS
Found 10 items
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value1
drwxrwxrwx   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value10
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value2
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value3
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value4
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value5
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value6
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value7
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value8
drwxr-xr-x   - appuser supergroup          0 2023-03-29 05:32 /topics/test_hdfs/f1=value9
11:02:31 ℹ️ Getting one of the avro files locally and displaying content with avro-tools
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{"f1":"value1"}
11:02:36 ℹ️ Check data with beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://hive-server:10000/testhive
Enter username for jdbc:hive2://hive-server:10000/testhive: Connecting to jdbc:hive2://hive-server:10000/testhive
hive
Enter password for jdbc:hive2://hive-server:10000/testhive: ****
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hive-server:10000/testhive> show create table test_hdfs;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `test_hdfs`(                 |
| )                                                  |
| PARTITIONED BY (                                   |
|   `f1` string COMMENT '')                          |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'   |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION                                           |
|   'hdfs://namenode:8020/topics/test_hdfs'          |
| TBLPROPERTIES (                                    |
|   'avro.schema.literal'='{"type":"record","name":"ConnectDefault","namespace":"io.confluent.connect.avro","fields":[]}',  |
|   'transient_lastDdlTime'='1680067945')            |
+----------------------------------------------------+
15 rows selected (2.135 seconds)
0: jdbc:hive2://hive-server:10000/testhive> select * from test_hdfs;
+---------------+
| test_hdfs.f1  |
+---------------+
| value1        |
| value2        |
| value3        |
| value4        |
| value5        |
| value6        |
| value7        |
| value8        |
| value9        |
+---------------+
9 rows selected (2.944 seconds)
0: jdbc:hive2://hive-server:10000/testhive> Closing: 0: jdbc:hive2://hive-server:10000/testhive
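
For anyone who wants to repeat the connector-creation step from the log outside the playground scripts, here is a minimal Python sketch. It rebuilds the `hdfs-sink` settings echoed in the log and prepares a `PUT /connectors/{name}/config` request for the Kafka Connect REST API. The worker address `http://localhost:8083` and the helper names `hdfs_sink_config` and `build_request` are assumptions for illustration; the log does not show the command the playground actually ran.

```python
import json
import urllib.request

# Assumption: address of the Connect worker's REST endpoint (not shown in the log).
CONNECT_URL = "http://localhost:8083"


def hdfs_sink_config() -> dict:
    """Return the connector settings echoed in the test log."""
    return {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": "test_hdfs",
        "store.url": "hdfs://namenode:8020",
        "flush.size": "3",
        "hadoop.conf.dir": "/etc/hadoop/",
        "partitioner.class": "io.confluent.connect.hdfs.partitioner.FieldPartitioner",
        "partition.field.name": "f1",
        "rotate.interval.ms": "120000",
        "logs.dir": "/tmp",
        "hive.integration": "true",
        "hive.metastore.uris": "thrift://hive-metastore:9083",
        "hive.database": "testhive",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
        "schema.compatibility": "BACKWARD",
    }


def build_request(name: str, config: dict,
                  base_url: str = CONNECT_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_request("hdfs-sink", hdfs_sink_config())
    # Sending the request needs a running Connect worker, e.g.:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.read().decode())
    print(req.get_method(), req.full_url)
```

The `"name": "hdfs-sink"` entry inside `config` in the logged response is left out here because the connector name is already carried in the URL path.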