VirtusLab / scala-cli

Scala CLI is a command-line tool to interact with the Scala language. It lets you compile, run, test, and package your Scala code (and more!)
https://scala-cli.virtuslab.org
Apache License 2.0

Document the differences between `--assembly` and `--standalone` flags of the `package` sub-command #1783

Open rmgk opened 1 year ago

rmgk commented 1 year ago

scala-cli 0.19.0

When comparing the following invocations:

1) scala-cli package --standalone test.scala --output standalone.jar
2) scala-cli package --assembly test.scala --output assembly.jar
3) scala-cli package --standalone test.scala --output standalone-no-preamble.jar --preamble=false
4) scala-cli package --assembly test.scala --output assembly-no-preamble.jar --preamble=false

The result is 4 files:

1) -rwxrwxr-x 7,1M standalone.jar
2) -rwxrwxr-x 6,9M assembly.jar
3) -rwxrwxr-x 7,1M standalone-no-preamble.jar
4) -rwxrwxr-x 6,9M assembly-no-preamble.jar

I observe the following inconsistencies:
• 1 and 3 (standalone) are larger than 2 and 4 (assemblies) even though they seem functionally equivalent.
• 1 and 3 (standalone) require about 120ms (depends on the system in question, but it is a lot) of additional startup time compared to 2 and 4 (assemblies). This holds both when running directly and via java -jar. I assume there is some launch logic included … ?
• a plain runner script (that downloads dependencies) has an execution time somewhere in between standalone and assembly.
• 4 is marked executable, even though it is not executable (no preamble included)
• 3 is executable (has a preamble included); preamble is documented to only work for assembly, so this is somewhat expected.
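The last two observations (executable bit versus actual preamble) can be checked mechanically. Below is a minimal Python sketch, not part of Scala CLI. It assumes that a preamble is a launcher script prepended to the archive, so a JAR with a preamble no longer starts with the ZIP local-file-header signature. The helper names `is_marked_executable` and `has_preamble` are hypothetical.

```python
import os
import stat
import zipfile

# Signature of the first local file header in a plain, non-empty ZIP/JAR.
ZIP_MAGIC = b"PK\x03\x04"


def is_marked_executable(path: str) -> bool:
    """True if any executable permission bit is set on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def has_preamble(path: str) -> bool:
    """True if the file is a ZIP archive with extra bytes in front of it.

    Assumption: a preamble is prepended to the archive, so the file no
    longer begins with the ZIP signature. Empty archives are not handled.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP/JAR archive")
    with open(path, "rb") as f:
        return f.read(len(ZIP_MAGIC)) != ZIP_MAGIC
```

Under that assumption, a file for which `is_marked_executable` is true and `has_preamble` is false would match the situation described for file 4.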

As far as I can tell, --standalone is just a worse version of --assembly, so maybe --standalone should be removed or just become an alias … ?
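For anyone wanting to reproduce the startup-time comparison, here is a rough Python sketch of how it could be measured. It is not how the numbers above were obtained; the helper name `median_wall_time` and the default run count are illustrative assumptions, and the measurement includes JVM startup and process-spawn overhead.

```python
import statistics
import subprocess
import time


def median_wall_time(cmd: list[str], runs: int = 10) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

With `java` on the PATH, comparing `median_wall_time(["java", "-jar", "standalone.jar"])` against `median_wall_time(["java", "-jar", "assembly.jar"])` should give a figure comparable to the difference reported above, though the exact value depends on the machine.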

Gedochao commented 1 year ago

I think the problem here lies in the lack of proper documentation on how --standalone works, actually. And we should definitely fix that part, so thanks for raising this!

As for the difference between --assembly and --standalone, let me try to explain.

The --assembly flag creates an assembly JAR (also known as a fat JAR), merging the bytecode of all dependencies into a single JAR file. More on assemblies can be found in the assemblies section of the package doc.

The --standalone flag, in turn, modifies how a bootstrap JAR is packaged.

Bootstrap JARs are the default package format of Scala CLI; they're a sort of lightweight launcher JAR. More on what that means can be found in the default package format section of our package doc.

When packaging with --standalone enabled, all of the dependencies are packed into the JAR (similarly to an assembly, yes), but it retains the bootstrap format.

To illustrate this further, here's what a fat JAR created with --assembly contains:

assembly.jar
├── LICENSE
├── META-INF
├── Main$.class
├── Main.class
├── Main.tasty
├── NOTICE
├── library.properties
├── rootdoc.txt
└── scala

and here's what a bootstrap looks like:

bootstrap.jar
├── META-INF
└── coursier

I do agree they are functionally very similar (2 kinds of JARs with all the dependencies packed in, yes), but they do produce slightly different outputs.
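The two listings above can be reproduced for any JAR by collecting the first path segment of each archive entry. This is a small Python sketch using only the standard `zipfile` module; the function name `top_level_entries` is a hypothetical helper, not something Scala CLI provides.

```python
import zipfile


def top_level_entries(jar_path) -> list[str]:
    """Return the sorted, de-duplicated first path segment of every JAR entry."""
    with zipfile.ZipFile(jar_path) as jar:
        # "scala/Option.class" -> "scala", "Main.class" -> "Main.class"
        return sorted({name.split("/", 1)[0] for name in jar.namelist()})
```

Calling it on the output of `package --assembly` and on the output of `package --standalone` should make the structural difference visible without unpacking either file.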

Gedochao commented 1 year ago

Now, there's a separate-but-related issue I spotted while investigating this: --assembly and --standalone can currently be passed together, which I believe shouldn't be the case. We should probably at the very least warn the user that --standalone is overridden by --assembly, as passing both produces a fat JAR.

Gedochao commented 1 year ago

Way I see it, the requirements for this ticket should be the following:

rmgk commented 1 year ago

I agree with your assessment about documentation – the whole motivation for this issue was me trying to figure out the differences and then documenting what I observed.

I think these questions might be good to answer in the documentation:
• Why choose one format over another? (My guess: bootstrap (even when standalone) keeps the jars separate and thus does not need to figure out how to merge jars)
• Will anything be extracted to temporary folders or similar … ?

But beyond just documentation, there seem to be basically the following settings:
• Bootstrap or just jar of classes
• Includes dependencies or not
• Includes executable script in the header

And all three seem to be independent of each other, i.e.,

command                              | bootstrap | dependencies | launcher
---------------------------------------------------------------------------------
package                              | ✔         | ✘            | ✔
package --standalone                 | ✔         | ✔            | ✔
package --library                    | ✘         | ✘            | ✘
package --assembly                   | ✘         | ✔            | ✔
package --assembly --preamble=false  | ✘         | ✔            | ✘

At least, that’s how I perceive it with the features of the package command I know, maybe there is more. So I wonder if those shouldn’t just be the flags to a package jar command or similar.
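To make the "three independent settings" idea concrete, the table can be modelled as three booleans per command. This Python sketch only transcribes the table as perceived in this comment; it is not confirmed Scala CLI behaviour, and `PackageShape`, `OBSERVED`, and `unobserved_combinations` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PackageShape:
    bootstrap: bool      # bootstrap launcher format vs. plain JAR of classes
    dependencies: bool   # dependencies packed into the JAR
    launcher: bool       # executable script in the header


# Transcribed from the table above (as perceived in this thread, not verified).
OBSERVED = {
    "package": PackageShape(True, False, True),
    "package --standalone": PackageShape(True, True, True),
    "package --library": PackageShape(False, False, False),
    "package --assembly": PackageShape(False, True, True),
    "package --assembly --preamble=false": PackageShape(False, True, False),
}


def unobserved_combinations() -> list[PackageShape]:
    """Setting combinations that no command in the table produces."""
    seen = set(OBSERVED.values())
    every = (PackageShape(*bits) for bits in product([True, False], repeat=3))
    return [shape for shape in every if shape not in seen]
```

Of the eight possible combinations, `unobserved_combinations()` returns the three that the table does not cover, which is the gap a flag-per-setting design would fill.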