AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Classpath collision on ABSA Commons #714

Closed wajda closed 1 year ago

wajda commented 1 year ago
23/06/28 12:39:27 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED
    (diag message: User class threw exception: java.lang.NoSuchMethodError: 
    za.co.absa.commons.config.ConfigurationImplicits$.ConfigurationMapWrapper(Lorg/apache/commons/configuration/Configuration;)Lorg/apache/commons/configuration/Configuration;
wajda commented 1 year ago

Thanks @Zejnilovic for reporting it

wajda commented 1 year ago

The compile-scope dependency tree for bundle-3.3:

[Image: compile-scope dependency tree for bundle-3.3]

And for bundle-2.4, for comparison:

[Image: compile-scope dependency tree for bundle-2.4]

The only difference spotted: commons-configuration:1.6 was missing in 2.4 but present in 3.3, due to differences in the Spark-provided environment. This dependency is part of the Spline public API (it appears in the Plugin API), so it has to stay unshaded anyway.
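A NoSuchMethodError like the one reported above typically means that, at runtime, a class was loaded from a different version of its library than the one the calling code was compiled against. When comparing environments such as bundle-2.4 and bundle-3.3, one way to check which jar actually supplied a given class is to ask the JVM for that class's code source. The Java sketch below is hypothetical: the `ClassOrigin` helper is illustrative and not part of Spline. It assumes no security manager blocks `getProtectionDomain()`.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of Spline): reports where the JVM
 * loaded a class from, to help spot which jar wins a classpath collision.
 */
final class ClassOrigin {

    private ClassOrigin() {}

    /**
     * Returns the jar or directory the named class was loaded from,
     * "<bootstrap>" for core JDK classes that carry no code source, or
     * "<not found>" if the class cannot be loaded.
     */
    static String locate(String className) {
        // Prefer the thread's context class loader, falling back to this class's own loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<bootstrap>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "za.co.absa.commons.config.ConfigurationImplicits$",
            "org.apache.commons.configuration.Configuration",
            "java.lang.String"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If you run the same check in both environments and compare the printed locations, you can see whether ABSA Commons or commons-configuration are being picked up from an unexpected jar.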

The graph-core library contains the packages scala.collection and scalax, which makes it dangerous to relocate: the relocation pattern would be too broad.

The absa-shaded artifact is already shaded under za.co.absa.shaded.

The remaining dependencies are to be relocated to za.co.absa.spline.shaded.
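The relocation plan above comes down to package-prefix renaming, which is roughly what shading tools such as the Maven Shade plugin's relocations do. The minimal Java sketch below is not the plugin's actual implementation. The rule list and the `org.example.thirdparty` package are hypothetical. It shows why a pattern as broad as scala.collection is risky: it would also rewrite Scala standard-library classes, which Spark provides and which must keep their original names.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (not the Maven Shade plugin's code) of
 * prefix-based package relocation.
 */
final class RelocationSketch {

    /** Target prefix for relocated dependencies, as proposed in the thread. */
    static final String SHADED_PREFIX = "za.co.absa.spline.shaded.";

    private RelocationSketch() {}

    /**
     * Prepends SHADED_PREFIX to a fully qualified class name if it lies in one
     * of the given packages (or their subpackages); otherwise returns it unchanged.
     */
    static String relocate(String className, List<String> packagePatterns) {
        for (String pattern : packagePatterns) {
            // Match on a package boundary so "org.foo" does not capture "org.foobar"
            if (className.startsWith(pattern + ".")) {
                return SHADED_PREFIX + className;
            }
        }
        return className;
    }

    public static void main(String[] args) {
        // Hypothetical rules: an ordinary third-party package plus the broad
        // "scala.collection" pattern that relocating graph-core would require.
        List<String> patterns = Arrays.asList("org.example.thirdparty", "scala.collection");
        String[] names = {
            "org.example.thirdparty.Client",   // intended: relocated
            "scala.collection.immutable.List", // Scala standard library: relocated by mistake
            "za.co.absa.shaded.Example"        // already shaded: left alone
        };
        for (String name : names) {
            System.out.println(name + " -> " + relocate(name, patterns));
        }
    }
}
```

With rules like these, references to scala.collection.immutable.List inside the bundle would likely end up pointing at za.co.absa.spline.shaded.scala.collection.immutable.List. The bundle does not ship that class because the Scala library is provided by the Spark environment.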

wajda commented 1 year ago

After further discussion with Adam, we decided to leave the scalaj.http library unshaded. The reasons were:

  1. Even though it is not officially part of the Spline Agent public API, many people's work could depend on it. They may have developed extensions by copying code from Spline's embedded plugins, modifying it and making it their own, and thus inherited some of the internal dependencies used by our internal plugins. Scalaj.http is one of them.
  2. Scalaj.http has been stable for a long time and is now deprecated, so we expect the probability of a collision on that library to be very small.

wajda commented 1 year ago

We also discussed another option: in addition to every agent bundle jar, produce another jar with the classifier shaded (e.g. spark-3.2-spline-agent-bundle_2.12-1.2.1-shaded.jar). The original jar would keep its dependencies unshaded as before, while the new one would have them shaded. This would let users choose whether they want dependencies shaded, depending on their use case. The downside of this approach is that most users would find it difficult to decide which jar to use, or their use case might change, so the decision they make could often be wrong. Having two publicly available variants of the jar could quickly lead to even more confusion around managing Spline dependencies. So, in the end, we decided not to go down that route.