apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Track 3rd party libs used in the dist package #17208

Open kgyrtkirk opened 3 weeks ago

kgyrtkirk commented 3 weeks ago

Description

It would be great to at least track the 3rd party deps in a way that requires a change in the PR itself whenever a new one gets added - this would draw attention to them and could improve the situation.

Motivation

It seems like there are quite a few versions of the same lib in the distribution build - these might have landed via transitive deps and most likely without being considered.

kgyrtkirk commented 3 weeks ago

I'll describe one approach - there might be others:

# do a full dist build like
mvn install -DskipTests  -Pdist -Pbundle-contrib-exts

From there, we could keep a text file in the project which is supposed to match the list of jars in the dist build. Sorting by filename would show when the same jar is present in multiple places, and also when different versions of the same lib are present.

tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep jar$ | sed 's|.*/||'|grep -v '^druid'|sort > distribution/dist_jars.txt

If that list changes, the build should fail.
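
A minimal sketch of how such a check could look, assuming bash and that the expected list is committed to the repo. The function names `list_jars` and `check_jar_list` are hypothetical, and `LC_ALL=C` is an addition to keep the sort order stable across machines; this is one possible shape, not an existing Druid build step.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the Druid build.

# list_jars TARBALL
# Print the sorted file names of all non-druid jars in the tarball.
# LC_ALL=C is an assumption added here so the order does not depend on locale.
list_jars() {
  tar tzf "$1" | grep '\.jar$' | sed 's|.*/||' | grep -v '^druid' | LC_ALL=C sort
}

# check_jar_list TARBALL EXPECTED_FILE
# Print a unified diff and return non-zero if the tarball's jar list
# differs from the committed list.
check_jar_list() {
  local tarball="$1" expected="$2"
  if ! list_jars "$tarball" | diff -u "$expected" -; then
    echo "Jar list changed; update $expected if this is intended." >&2
    return 1
  fi
}
```

With the paths from above, the check could be run as `check_jar_list distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz distribution/dist_jars.txt`. The committed file would need to be generated with `list_jars` as well, so that both sides use the same sort order.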

There could also be a check to ensure that libs already present in lib get reused via provided scope

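# make a content list; skip jars whose file name starts with "druid", but keep other jars that sit under druid-* extension directories
tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep jar$ | grep -v '/druid[^/]*$' > base.li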
# this list should be empty
grep -F -f <(grep /lib/ base.li | sed 's|.*/||') base.li | grep -v '/lib/'

shigarg1 commented 2 weeks ago

I was checking this and found two problems so far:

  1. There are multiple copies of the same version of a jar across multiple extensions.
  2. Different versions of the same dependency come in via transitive dependencies.
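
Both problems can be surfaced from a plain list of jar file names. The following is a rough sketch, assuming bash and POSIX awk. The function names are hypothetical, and the version split (everything after the first `-<digit>`) is only a heuristic that will misparse artifacts with digits in their name.

```shell
#!/usr/bin/env bash
# Sketch only: both functions read jar file names (one per line) on stdin.

# duplicate_copies: print each jar file name that occurs more than once,
# i.e. the same version shipped in several places.
duplicate_copies() {
  LC_ALL=C sort | uniq -d
}

# multi_version_libs: print "artifact: v1 v2 ..." for every artifact that
# appears with more than one distinct version.
multi_version_libs() {
  LC_ALL=C sort -u |
  awk '{
    name = $0
    sub(/\.jar$/, "", name)
    # Heuristic (assumption): the version starts at the first "-<digit>".
    if (match(name, /-[0-9]/)) {
      artifact = substr(name, 1, RSTART - 1)
      version = substr(name, RSTART + 1)
    } else {
      artifact = name
      version = "(none)"
    }
    count[artifact]++
    versions[artifact] = versions[artifact] " " version
  }
  END {
    for (a in count) if (count[a] > 1) print a ":" versions[a]
  }' | LC_ALL=C sort
}
```

For example, piping the output of `tar tzf ... | grep jar$ | sed 's|.*/||'` into `duplicate_copies` would list the jars behind problem 1, and piping it into `multi_version_libs` would list the artifacts behind problem 2.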

For the 1st, I found a way to reduce it to at most 3 copies, which reduced the distribution size from 900M to 600M (https://github.com/apache/druid/pull/17321). I am looking for a way to reduce it to 1 copy.

For the 2nd, I found a Maven Enforcer rule: https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html We can add excludes for the dependencies where we know multiple versions are required.

abhishekagarwal87 commented 2 weeks ago

There is some work done in #16973 that might be usable here.