com-lihaoyi / mill

Mill is a fast JVM build tool that supports Java and Scala. 2-4x faster than Gradle and 4-10x faster than Maven for common workflows, Mill aims to make your project’s build process performant, maintainable, and flexible.
https://mill-build.org/
MIT License
2.22k stars, 356 forks

Flesh out support for shading third-party libraries and add example docs (500USD Bounty) #3815

Open lihaoyi opened 1 month ago

lihaoyi commented 1 month ago

From the maintainer Li Haoyi: I'm putting a 500USD bounty on this issue, payable by bank transfer on a merged PR implementing this.


We need to be able to depend on shaded third-party libraries, have the original library properly excluded from the runClasspath, and instead replaced by the shaded classfiles. Right now we can shade stuff in assembly using AssemblyRules, but shading should also apply to:

  1. run
  2. jar (which should include the shaded dependency)
  3. publishLocal/publishAll (which should publish jars that contain the shaded classes transitively, with no dependency on the original library and/or with an exclusion for it)
  4. runClasspath (e.g. if someone wants to use the classfiles in a Jvm.runSubprocess or Jvm.runClassLoader it should exclude the original and include the shaded classes)
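
One way to picture the run and runClasspath requirements is as a substitution on the resolved classpath: drop the original artifact, add the shaded one. The following is a minimal sketch in plain Scala, not Mill's API. `Entry`, `ShadedClasspath`, and any jar names passed to it are hypothetical and exist only for illustration.

```scala
// Hypothetical model of a resolved classpath entry; this is not a Mill type.
final case class Entry(org: String, name: String, path: String)

object ShadedClasspath {
  // Remove every entry whose (org, name) pair is being shaded,
  // then append the jars that hold the relocated classfiles.
  def apply(
      classpath: Seq[Entry],
      shaded: Set[(String, String)],
      shadedJars: Seq[Entry]
  ): Seq[Entry] =
    classpath.filterNot(e => shaded.contains((e.org, e.name))) ++ shadedJars
}
```

Under this model, a classpath holding commons-compress and poi-ooxml, with commons-compress marked as shaded, would keep poi-ooxml, lose the original commons-compress entry, and gain the shaded jar at the end.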

There's some design space here to explore.

We should have an example under javalib/dependencies.adoc for shading Java using jarjar, one under scalalib/dependencies.adoc using https://github.com/eed3si9n/jarjar-abrams, and maybe something for Kotlin.

lefou commented 1 month ago

We already have a dependency on jarjar-abrams to provide the Relocate assembly rule. https://github.com/com-lihaoyi/mill/blob/b66ef93e3ca926f480555040c9726814662cc97e/scalalib/src/mill/scalalib/Assembly.scala#L81

lihaoyi commented 1 month ago

I think this is a broader topic than just assembly. For example, if I shade an upstream library, I should be able to publish to Maven Central and bundle the shaded library, with the normal library <dependency> metadata removed. The same applies if I shade a library and a downstream module in the same build depends on me.

lefou commented 1 month ago

The dependency to be shaded should be declared as compileIvyDeps or compileModuleDeps.

lihaoyi commented 1 month ago

But compile*Deps is not sufficient: all that does is ensure the original classfiles are not included in the jar, which is correct, but I also want the shaded classfiles included somehow in the jar (not assembly) so they can be used at runtime.
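
For reference, a compile-only declaration looks roughly like the build fragment below. This is a sketch assuming a Mill 0.12-style `build.mill`; the module name `foo` and both version numbers are placeholders, not values taken from this thread.

```scala
package build
import mill._, scalalib._

// Hypothetical module; name and versions are placeholders.
object foo extends ScalaModule {
  def scalaVersion = "2.13.15"

  // Compile-only: available to the compiler, but kept off the run
  // classpath and out of the published dependency metadata.
  def compileIvyDeps = Agg(
    ivy"org.apache.commons:commons-compress:1.27.1"
  )
}
```

With only this declaration, nothing puts relocated copies of the library's classes into the module's jar, which is the gap described above.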

neontty commented 1 week ago

Hi team,

Could you please help me understand whether this use case falls under the scope of the bounty?

The spark-excel package relies on org.apache.poi:poi-ooxml and a shaded version of org.apache.commons:commons-compress. poi-ooxml also depends on commons-compress.

Assembly rule:

  def assemblyRules = Seq(
    Rule.Relocate("org.apache.commons.compress.**", "shadeio.commons.compress.@1")
  )

However, at runtime we still see the poi-ooxml library referring to the unshaded commons-compress which already exists on the system (part of built-in databricks runtime and can't be changed).
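
To make the rule above concrete: in jarjar-style patterns, `**` matches any part of a fully qualified class name (dots included), `*` matches a single package component, and `@1` in the result refers to the first wildcard. The sketch below models that renaming in plain Scala. It is an illustration of the pattern semantics only, not jarjar's or Mill's implementation, and `RelocateSketch` is a hypothetical name.

```scala
import java.util.regex.Pattern

object RelocateSketch {
  // Translate a jarjar-style pattern into a regex with one capture
  // group per wildcard. "**" may span dots, "*" stays in one segment.
  private def toRegex(pattern: String): Pattern = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      if (pattern.startsWith("**", i)) { sb.append("(.*)"); i += 2 }
      else if (pattern.charAt(i) == '*') { sb.append("([^.]*)"); i += 1 }
      else { sb.append(Pattern.quote(pattern.charAt(i).toString)); i += 1 }
    }
    Pattern.compile(sb.toString)
  }

  // Returns the renamed class if the pattern matches, otherwise None.
  // "@n" placeholders are substituted from the highest index down so
  // that "@1" never clobbers the prefix of "@10".
  def relocate(pattern: String, result: String, className: String): Option[String] = {
    val m = toRegex(pattern).matcher(className)
    if (!m.matches()) None
    else Some(
      (m.groupCount to 1 by -1).foldLeft(result) { (acc, n) =>
        acc.replace("@" + n, m.group(n))
      }
    )
  }
}
```

For example, `RelocateSketch.relocate("org.apache.commons.compress.**", "shadeio.commons.compress.@1", "org.apache.commons.compress.archivers.zip.ZipFile")` yields `Some("shadeio.commons.compress.archivers.zip.ZipFile")`, while a class outside that package yields `None`.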

Would this bounty correctly handle the above scenario? If so I may be willing to add to the bounty.

lihaoyi commented 1 week ago

@neontty yes your use case is exactly that of the bounty. If you have a need for this would love your help implementing it!

neontty commented 1 week ago

Excellent! This would greatly benefit the users of the crealytics spark-excel package. Let me discuss with my coworkers.