Open kgyrtkirk opened 3 weeks ago
I'll describe one approach - there might be others:
# do a full dist build like
mvn install -DskipTests -Pdist -Pbundle-contrib-exts
From there, we could keep a text file in the project that is supposed to match the list of jars in the dist build. Sorting by filename would show when the same jar is present in multiple places, and also when different versions of the same lib are present:
tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep jar$ | sed 's|.*/||'|grep -v '^druid'|sort > distribution/dist_jars.txt
If that list changes, the build should fail.
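A minimal sketch of how such a check might be wired up, assuming the tracked list lives at distribution/dist_jars.txt as in the command above. The function names `list_dist_jars` and `check_dist_jars` are made up for illustration and are not part of the Druid build:

```shell
# Sketch only: function names are hypothetical, not part of the Druid build.

# list_dist_jars TARBALL
# Print the sorted file names of all non-druid jars in the tarball
# (same pipeline as the command above).
list_dist_jars() {
  tar tzf "$1" | grep 'jar$' | sed 's|.*/||' | grep -v '^druid' | sort
}

# check_dist_jars TARBALL EXPECTED_LIST
# Print a unified diff and return non-zero when the jars in the tarball
# differ from the tracked list.
check_dist_jars() {
  local tarball="$1" expected="$2"
  if ! list_dist_jars "$tarball" | diff -u "$expected" -; then
    echo "Jar list changed; update $expected if this is intended" >&2
    return 1
  fi
}
```

It could be called as `check_dist_jars distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz distribution/dist_jars.txt` after the dist build. Since the order produced by `sort` depends on the locale, the tracked file and the check should be generated in the same environment.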
There could also be a check to ensure that libs from lib are reused via provided:
# make a content list
tar tzf distribution/target/apache-druid-32.0.0-SNAPSHOT-bin.tar.gz | grep jar$ | grep -v '/druid' > base.li
# this list should be empty
fgrep -f <(grep /lib/ base.li |sed 's|.*/||') base.li |grep -v '/lib/'
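The same check could be turned into something that fails the build when the list is not empty. This is only a sketch: `find_lib_duplicates` and `check_lib_reuse` are hypothetical names, and it assumes a content list like base.li from above. It uses a temporary pattern file instead of process substitution so it does not depend on that shell feature:

```shell
# Sketch only: function names are hypothetical, not part of the Druid build.

# find_lib_duplicates CONTENT_LIST
# Print entries outside /lib/ whose jar file name also appears under /lib/.
find_lib_duplicates() {
  local list="$1" names
  names=$(mktemp)
  # Collect the file names of the jars that live under /lib/.
  grep '/lib/' "$list" | sed 's|.*/||' > "$names"
  # -F: treat patterns as fixed strings; -f: read the patterns from a file.
  grep -F -f "$names" "$list" | grep -v '/lib/' || true
  rm -f "$names"
}

# check_lib_reuse CONTENT_LIST
# Return non-zero and report when a jar from lib is bundled again elsewhere.
check_lib_reuse() {
  local dups
  dups=$(find_lib_duplicates "$1")
  if [ -n "$dups" ]; then
    echo "Jars from lib are bundled again elsewhere:" >&2
    echo "$dups" >&2
    return 1
  fi
}
```

As in the original one-liner, the match is by substring, so a jar whose name merely ends with the name of a lib jar would also be reported.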
I was checking this and found two problems as of now
For the 1st, I found a way to reduce it to at most 3 copies, which reduced the distribution size from 900M to 600M - https://github.com/apache/druid/pull/17321 I am still looking for a way to reduce it to 1 copy.
For the 2nd, I found a Maven enforcer rule - https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html We can add excludes for the dependencies for which we know multiple versions are required.
There is some work done in #16973 that might be usable here.
Description
It would be great to at least somehow track the 3rd party deps in a way that requires a change in the PR itself whenever new ones get added - that would draw attention to them and could possibly improve the situation.
Motivation
It seems like there are quite a few versions of the same lib in the distribution build - these might have landed via transitive deps, most likely without being considered.