apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.2k stars 1.21k forks source link

update-dependencies #8991

Closed gyokketto closed 2 months ago

npawar commented 1 year ago

@KKcorps @xiangfu0 PTAL

Jackie-Jiang commented 1 year ago

Any specific reason why we want to update these dependencies? How do we ensure there is no dependency conflict updating so many of them in one shot? There is a on-going work trying to update helix to 1.0.4 in #8325 which is not trivial itself.

gyokketto commented 1 year ago

Hi @Jackie-Jiang ,

Any specific reason why we want to update these dependencies?

We have lots of (> 100) vulnerabilities in old packages. Here is an excerpt from a twistlock scan on the latest image release (only JAR-s, letting alone the OS ones):

low org.eclipse.jetty_jetty-http version 9.3.24.v20180605 has 1 vulnerability medi com.google.guava_guava version 20.0 has 2 vulnerabilities medi com.google.guava_guava version 14.0.1 has 2 vulnerabilities mode io.netty_netty-codec-http version 4.1.54.Final has 3 vulnerabilities mode io.netty_netty-codec-http2 version 4.1.54.Final has 2 vulnerabilities medi org.eclipse.jetty_jetty-io version 9.3.24.v20180605 has 3 vulnerabilities medi org.eclipse.jetty_jetty-servlet version 9.3.24.v20180605 has 1 vulnerability mode org.eclipse.jetty_jetty-servlets version 9.3.24.v20180605 has 1 vulnerability mode org.glassfish.jersey.core_jersey-common version 2.28 has 1 vulnerability high com.fasterxml.jackson.core_jackson-databind version 2.10.0 has 3 vulnerabilities high com.google.oauth-client_google-oauth-client version 1.31.0 has 1 vulnerability high com.google.protobuf_protobuf-java version 3.12.0 has 1 vulnerability high com.google.protobuf_protobuf-java version 3.11.4 has 1 vulnerability high io.netty_netty-all version 4.1.54.Final has 7 vulnerabilities high io.netty_netty-codec version 4.1.54.Final has 7 vulnerabilities high org.apache.zookeeper_zookeeper version 3.5.8 has 1 vulnerability high org.eclipse.jetty_jetty-server version 9.3.24.v20180605 has 6 vulnerabilities high org.yaml_snakeyaml version 1.16 has 1 vulnerability crit com.fasterxml.jackson.core_jackson-databind version 2.9.10 has 40 vulnerabilities crit com.fasterxml.jackson.core_jackson-databind version 2.4.0 has 4 vulnerabilities crit io.netty_netty version 3.9.6.Final has 10 vulnerabilities crit log4j_log4j version 1.2.17 has 6 vulnerabilities crit org.apache.hadoop_hadoop-common version 2.7.0 has 6 vulnerabilities crit org.apache.hadoop_hadoop-hdfs version 2.7.0 has 10 vulnerabilities crit org.apache.spark_spark-core_2.11 version 2.4.0 has 1 vulnerability

With all those vulnerabilities we are not allowed to push our pinot deployment to higher environments.

How do we ensure there is no dependency conflict updating so many of them in one shot?

Yes, that is concerning, but I think we have to trust the test coverage we have currently to detect package upgrade related failures.

There is a on-going work trying to update helix to 1.0.4 in #8325 which is not trivial itself.

Yes, it is far from trivial. I really appreciate your efforts especially after I started working on it and seeing those difficulties myself.

gyokketto commented 1 year ago

These are the test failures currently:

walterddr commented 1 year ago

I've created https://github.com/apache/pinot/pull/9038 and https://github.com/apache/pinot/pull/9046 separately to bump jackson and netty and fixed some test issues. plz kindly take a look. we will run the scan again and see if any other packages we need to upgrade.

for the rest we probably won't do major release updates like hadoop 2->3 or helix 0.x -> 1.x

gyokketto commented 1 year ago

I've created #9038 and #9046 separately to bump jackson and netty and fixed some test issues. plz kindly take a look. we will run the scan again and see if any other packages we need to upgrade.

for the rest we probably won't do major release updates like hadoop 2->3 or helix 0.x -> 1.x

It would be interesting to see a security scan result. Do you think you could build a docker image containing changes from both pr-s that I could scan? If not I might build one myself, the only thing is that the build is taking so much time...

siddharthteotia commented 1 year ago

So what is the conclusion here with this potentially conflicting with ongoing helix upgrade. Are we going to hold off massive dependency upgrades until helix upgrade finishes ?

cc @jackjlli

With all those vulnerabilities we are not allowed to push our pinot deployment to higher environments.

@gyokketto - Just want to clarify what this means. Does it mean we can't do a new Apache release until all of the problems called out in the above comment are fixed ?

gyokketto commented 1 year ago

So what is the conclusion here with this potentially conflicting with ongoing helix upgrade. Are we going to hold off massive dependency upgrades until helix upgrade finishes ?

cc @jackjlli

With all those vulnerabilities we are not allowed to push our pinot deployment to higher environments.

@gyokketto - Just want to clarify what this means. Does it mean we can't do a new Apache release until all of the problems called out in the above comment are fixed ?

@siddharthteotia Sorry, I do not know about pinot's release priorities. We would like to see as many vulnerabilities eliminated as possible, but I can't judge what should be in the next release.

I do not think this draft will ever be promoted to a pr, as many suggested breaking it down to smaller ones. The helix upgrade is a big one and it would probably make sense to have that before other changes.

gyokketto commented 1 year ago

The last commit (666f6c4589c21316471ed3660c36ee1f74c314b4) fixes the avro related issues in tests (except the one in AvroRecordToPinotRowGeneratorTest.testIncomingTimeColumn). Vast majority of test are passing now. I am not sure why the checks/tests are not running after the changes are pushed to the server. Could you please help me with running them?

walterddr commented 1 year ago

The last commit (666f6c4) fixes the avro related issues in tests (except the one in AvroRecordToPinotRowGeneratorTest.testIncomingTimeColumn). Vast majority of test are passing now. I am not sure why the checks/tests are not running after the changes are pushed to the server. Could you please help me with running them?

Hi. Please see #9038 and #9046 that just got merged. you might want to create a separate PR on top of latest master to run your CI jobs b/c this one has major conflict with master branch

FYI: GHA default merges PR to master branch then executes CI actions, this means any non-resolved conflict won't trigger new jobs.

gyokketto commented 1 year ago

The last commit (666f6c4) fixes the avro related issues in tests (except the one in AvroRecordToPinotRowGeneratorTest.testIncomingTimeColumn). Vast majority of test are passing now. I am not sure why the checks/tests are not running after the changes are pushed to the server. Could you please help me with running them?

Hi. Please see #9038 and #9046 that just got merged. you might want to create a separate PR on top of latest master to run your CI jobs b/c this one has major conflict with master branch

FYI: GHA default merges PR to master branch then executes CI actions, this means any non-resolved conflict won't trigger new jobs.

Thank you. I will resolve the conflicts then.

gyokketto commented 1 year ago

Scan on the latest published snapshot (apachepinot/pinot:0.11.0-SNAPSHOT-438c53b-20220715):

low commons-codec_commons-codec version 1.11    has 1 vulnerability
low org.eclipse.jetty_jetty-http version 9.4.45.v20220203   has 1 vulnerability
medi    com.google.guava_guava version 20.0 has 2 vulnerabilities
medi    io.netty_netty-all version 4.1.74.Final has 1 vulnerability
medi    io.netty_netty-codec version 4.1.74.Final   has 1 vulnerability
mode    io.netty_netty-codec-http version 4.1.74.Final  has 1 vulnerability
high    com.google.oauth-client_google-oauth-client version 1.31.0  has 1 vulnerability
high    com.google.protobuf_protobuf-java version 2.4.1 has 1 vulnerability
high    com.google.protobuf_protobuf-java version 3.11.4    has 1 vulnerability
high    org.apache.zookeeper_zookeeper version 3.5.8    has 1 vulnerability
high    org.yaml_snakeyaml version 1.16 has 1 vulnerability
crit    com.fasterxml.jackson.core_jackson-databind version 2.4.0   has 4 vulnerabilities
crit    com.fasterxml.jackson.core_jackson-databind version 2.9.10  has 40 vulnerabilities
crit    io.netty_netty version 3.10.6.Final has 10 vulnerabilities
crit    log4j_log4j version 1.2.17  has 6 vulnerabilities
crit    net.minidev_json-smart version 2.3  has 1 vulnerability
crit    org.apache.hadoop_hadoop-common version 2.10.1  has 2 vulnerabilities
crit    org.apache.hadoop_hadoop-hdfs version 2.10.1    has 2 vulnerabilities
walterddr commented 1 year ago

the commit hash https://github.com/apache/pinot/commit/438c53b was on May 12. would you be able to share exactly how we can generate this report from a docker image or dist-JAR?

gyokketto commented 1 year ago

the commit hash 438c53b was on May 12. would you be able to share exactly how we can generate this report from a docker image or dist-JAR?

Well, that is interesting. The hash is old, but the date is recent.

Yes, I was thinking that the scan could regularly run after the snapshot image is generated. In our environment we can pull a docker image from our artifactory image repo and running that image we can generate a report from any image that we pulled locally. A simpler report is emitted to the console and the full one is uploaded to our prisma cloud server. Whatever image we deploy to our infrastructure is scanned automatically and we are notified if there is a vulnerability in it.

$ twistcli images scan --details --address https://<prisma server address> -u '<user name>' apachepinot/pinot:0.11.0-SNAPSHOT-438c53b-20220715

I can ask the team that supports it how it could be set up.

walterddr commented 1 year ago

the commit hash 438c53b was on May 12. would you be able to share exactly how we can generate this report from a docker image or dist-JAR?

Well, that is interesting. The hash is old, but the date is recent.

Yes, I was thinking that the scan could regularly run after the snapshot image is generated. In our environment we can pull a docker image from our artifactory image repo and running that image we can generate a report from any image that we pulled locally. A simpler report is emitted to the console and the full one is uploaded to our prisma cloud server. Whatever image we deploy to our infrastructure is scanned automatically and we are notified if there is a vulnerability in it.

$ twistcli images scan --details --address https://<prisma server address> -u '<user name>' apachepinot/pinot:0.11.0-SNAPSHOT-438c53b-20220715

I can ask the team that supports it how it could be set up.

so we are setting up a pipeline to scan our docker image: https://github.com/apache/pinot/pull/9044. this should give a SARIF. so far for the latest commits we don't see any high/crit issue on the core pinot JAR. the remaining ones are from the optional plugins modules. which will take time to resolve after core release

gyokketto commented 1 year ago

@walterddr, I resolved the conflicts. Does running the tests need someones approval?

codecov-commenter commented 1 year ago

Codecov Report

Merging #8991 (3100ffd) into master (84478b6) will decrease coverage by 45.51%. The diff coverage is 10.52%.

@@              Coverage Diff              @@
##             master    #8991       +/-   ##
=============================================
- Coverage     70.13%   24.62%   -45.52%     
+ Complexity     4741       51     -4690     
=============================================
  Files          1831     1819       -12     
  Lines         96382    96031      -351     
  Branches      14408    14371       -37     
=============================================
- Hits          67594    23644    -43950     
- Misses        24125    70110    +45985     
+ Partials       4663     2277     -2386     
Flag Coverage Δ
integration1 ?
integration2 24.62% <10.52%> (-0.25%) :arrow_down:
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ot/broker/api/resources/PinotBrokerAppConfigs.java 0.00% <ø> (ø)
...e/pinot/broker/api/resources/PinotBrokerDebug.java 85.71% <ø> (ø)
...t/broker/api/resources/PinotBrokerHealthCheck.java 0.00% <ø> (ø)
...pinot/broker/api/resources/PinotBrokerRouting.java 0.00% <ø> (ø)
...pinot/broker/api/resources/PinotClientRequest.java 50.94% <ø> (-7.55%) :arrow_down:
...ache/pinot/broker/broker/AccessControlFactory.java 80.00% <ø> (ø)
...broker/broker/ZkBasicAuthAccessControlFactory.java 0.00% <ø> (ø)
...e/pinot/broker/broker/helix/BaseBrokerStarter.java 74.17% <ø> (-5.64%) :arrow_down:
.../BrokerResourceOnlineOfflineStateModelFactory.java 55.81% <ø> (ø)
...quota/HelixExternalViewBasedQueryQuotaManager.java 44.39% <ø> (-24.67%) :arrow_down:
... and 1497 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 84478b6...3100ffd. Read the comment docs.