paulbutcher opened this issue 1 year ago
What are "optional dependencies"?
> What are "optional dependencies"?
If you take a look at, for example, jetty-server
on Maven: https://search.maven.org/artifact/org.eclipse.jetty/jetty-server/11.0.11/jar
You'll see that the "jetty-jmx" dependency is marked as optional via the <optional/>
XML tag. This kind of thing is typically used for "extra" functionality that you can opt in to or out of.
A good example of optional functionality that's relevant to us:
The Flix compiler depends upon jline. Jline includes Windows functionality as optional. As things currently stand, we don't include that optional functionality within the Flix compiler, which is why command line editing within the Flix REPL doesn't work as well on Windows as it does on Linux and MacOS.
Very interesting thoughts. I will comment more in detail later. One question: what do we think about using GitHub releases?
> One question: what do we think about using GitHub releases?
I have no problem with using Git (instead of, say, Maven). It's the direction that the industry as a whole seems to be going (i.e. towards using source control systems instead of dedicated repositories in the style of Maven, npm, etc.).
Two comments though:
`.fpkg` files, and consider dropping support for them:
I think we could use GitHub releases and then have a central registry where you register your package. Then we have a service that periodically queries the GitHub API to collect a list of all packages and their dependencies. This index is then used for dependency resolution.
As for creating GitHub releases, I think that could be automated: `flix package release` or whatever, and then enter your credentials.
GitLab also has "releases", so you could register your package on GitLab too. The only issue is that both GitHub and GitLab have to be online for you to be able to download the packages. But they're probably more likely to be online than anything we could build ourselves anyway.
I suggest that we separate the question of how releases take place (i.e. whether we need a release process as such at all) from the question of discoverability (i.e. whether we have some central index or similar).
Let's look at how Clojure has evolved in this respect, because I think it's informative:
To see a concrete example, take a look at io.github.cognitect-labs/test-runner
(a utility published by Cognitect, the equivalent in the Clojure world to Lightbend in the Scala world):
https://github.com/cognitect-labs/test-runner
This is not available on either Maven or Clojars, and there is no JAR file or similar associated with it. Releases are simple tags:
One uses it within deps.edn
by specifying the tag and SHA, and it's downloaded directly from GitHub by the Clojure tooling:
```clojure
{:deps {io.github.cognitect-labs/test-runner {:git/tag "v0.5.0" :git/sha "b3fd0d204c8fa72e4e1e2448243df7f2fbaba8b4"}}}
```
No need for packages whatsoever.
Of course, none of the above helps with discoverability. And there, there may be some value in having a central index. But that index could be nothing more than a list of relevant GitHub projects.
Make sense?
I understand that, but does it entail you have to clone the entire GitHub repository?
> I understand that, but does it entail you have to clone the entire GitHub repository?
The Clojure implementation does clone the entire GitHub repository, yes.
I think we could probably avoid doing so if we wanted to. I'm not sure whether it's worth the effort though?
I assume that the reason why you're worried about this is disc space? I personally doubt that that's a big issue (Git's pretty good at this, as long as you don't dump big BLOBs into the repo). And you'll lose any benefit (arguably end up in a worse situation) as soon as you find yourself checking out more than one version of a dependency.
I have a lot of bad experience from the Node ecosystem, where `node_modules` ends up containing hundreds of megabytes of uncompressed JavaScript. Thus I am a big fan of single files that are compressed archives. It's actually one of the things I feel strongly about. I guess Git does use binary blobs (i.e. not a gazillion files); nevertheless, I am also hesitant to depend on Git tooling (it has sometimes been a pain on Windows). A zipped file of all the relevant project files seems reasonable to me. But I will think more about the Git+Tag approach.
OK, I understand where you're coming from.
Note that the question of whether the release process requires the library author to create a Zip file, and the question of whether the on-disc representation of a downloaded dependency is a zip file are (or at least can be) independent of each other.
Note also that part of the issue with Node is that Node stores dependencies within each project (so if you have 10 projects, each of which download the same 10 dependencies, then you have 100 copies).
Maven and Clojure's system, on the other hand, store dependencies centrally and manipulate the Java classpath to refer to them. So 10 projects downloading the same 10 dependencies just results in 10 copies.
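To illustrate the shared-cache model, here is a small sketch (the function name and layout are hypothetical, loosely following Maven's `~/.m2/repository` convention): every project that needs a given coordinate resolves to the same path, so each dependency exists on disk exactly once.

```python
from pathlib import Path

# Hypothetical sketch: a single shared cache in the style of Maven's ~/.m2,
# so N projects using the same dependency share one on-disk copy.
def cache_path(root: Path, group: str, artifact: str, version: str) -> Path:
    """Maven-style layout: <root>/<group as dirs>/<artifact>/<version>"""
    return root.joinpath(*group.split(".")) / artifact / version

p = cache_path(Path.home() / ".m2" / "repository",
               "org.postgresql", "postgresql", "42.3.3")
print(p.name)  # "42.3.3" — same path for every project that wants this version
```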
I think one way we can work is by trying to write down some principles for the package manager. Presumably some we can quickly come to agreement on, others we can discuss, modify, and abandon. I can start with some principles:
This is just a random order (and written rather quickly and haphazardly):
Flix has no IR, so it's not really feasible to ship anything other than source code. Moreover, today shipping source code seems reasonable, as long as it is compressed.
Whether that be ZIP or something similar.
... or at least the good part of SemVer... We might even be able to use the compiler to enforce it.
E.g. like Maven, we will enforce a common layout. In particular, I insist that packages must have a license file.
When downloading a package, no code is ever executed on your machine. It should be 100% safe.
have to run ... more to come...
Proposed Principle: Packages contain source code.
💯👍
Proposed Principle: Packages are transmitted and stored in compressed format.
I have no particularly strong feelings about this either way. But it's worth noting that if we piggyback on Maven and Git, then we get this for free (i.e. there's no need for us to create .zip files or similar ourselves).
Personally, I would like to avoid an explicit "create package" step unless it's absolutely necessary.
Proposed Principle: Packages have a common (file) structure
💯👍
Proposed Principle: "Installing/downloading" a package is safe
💯👍
Further to "Proposed Principle: Packages are transmitted and stored in compressed format."
What's the motivation behind this? Is it about saving storage space? Is it about saving transmission time? Is it about having a single file which represents a dependency? ...?
> Further to "Proposed Principle: Packages are transmitted and stored in compressed format."
> What's the motivation behind this? Is it about saving storage space? Is it about saving transmission time? Is it about having a single file which represents a dependency? ...?
I want to save disk space. My nightmare is that every time I have to edit flix.dev, my node_modules folder has 200,000+ files in it. It's also because it is easy and fast to move single files around.
A safe package cannot contain any casts (except upcasts). This entails that the effect signatures can be trusted. This means that if a function says it has no side-effects, it cannot have any side-effects. This allows programmers to use libraries without worrying that they can be backdoored. (E.g. secretly starting a webserver to mine bitcoin.)
(Not all packages will be implemented as safe, but that's fine. The point is that e.g. a "unit conversion library" can be declared as safe and programmers can trust that.)
Proposed Principle: A package can be declared as "safe"
I think this immediately asks the question, what if a subset of the API is safe?
> Proposed Principle: Packages are transmitted and stored in compressed format.
I'd like to add that right now, when you download a Flix package, you can't see its public functions and you can't look into it because it's a zip. We need a way to see the available functions. I don't know how, but it seems lacking to rely on autocomplete exploration.
> I think this immediately asks the question: what if a subset of the API is safe?
Then you have to break the package in two.
> I'd like to add that right now, when you download a Flix package, you can't see its public functions and you can't look into it because it's a zip. We need a way to see the available functions. I don't know how, but it seems lacking to rely on autocomplete exploration.
What about Flix doc for that package?
> What about Flix doc for that package?
Ah I see, I missed that command. But still, this is JSON; it doesn't nicely help me as a user to see what is in the package. And it requires manual effort from the publisher, even though it is a common task for all packages.
> I want to save disk space. My nightmare is that every time I have to edit flix.dev, my node_modules folder has 200,000+ files in it. It's also because it is easy and fast to move single files around.
Right. But this problem arises because npm stores dependencies within the project that uses them.
Maven, by contrast, stores dependencies within your `~/.m2` directory, and Clojure's system stores them within `~/.gitlibs`. So in both cases a given dependency is only ever downloaded once, even if it's used within 100 different projects, and you never "move around" dependencies.
I strongly think that we should adopt the Maven/Clojure approach, not the npm approach. If we do so, does this have any bearing on your feelings when it comes to compression?
There is another high level design decision we need to take for Flix's dependency management.
Broadly speaking (this is a generalisation, but largely holds true) there are two different approaches to dependency management in wide use:

1. A manifest which pins exact dependency versions, from which dependencies are resolved directly (this is the approach taken by Clojure's `clj` command).
2. A manifest which specifies permissible version ranges, plus a generated lock file which pins the exact versions actually used (the approach taken by Cargo, npm, and many others).

The second approach is clearly more complex, but it's intended to allow end products (i.e. things which are created from many dependencies and deployed in production) to be treated differently from libraries (i.e. things which are combined to create end products). The Cargo documentation has a good explanation of the intent behind this:
For my part, I'm not convinced that the additional complexity of approach 2 is worth it (Clojure manages just fine without it), but clearly there are plenty of ecosystems which have decided that it's worth it.
To make this more concrete, let's consider a situation where we are creating an end-product which depends upon two libraries, both of which depend upon the same logging library. Let's say our product is "Acme", the two libraries are "Frobnicate" and "Munge", and the logging library is "Loggify".
When everything goes well, this is how approach 1 works:

- Loggify has released versions 1.0.0, 1.1.0, and 2.0.0.
- Frobnicate's manifest specifies Loggify: 1.0.0.
- Munge's manifest specifies Loggify: 1.1.0.
- Acme's manifest specifies Frobnicate: 1.2.3, Munge: 3.4.5.
- The resolver uses the most recent Loggify version mentioned anywhere in the transitive closure: 1.1.0.
At some point, if we want to update the versions of our dependencies, we edit the manifest to refer to newer versions of either Frobnicate or Munge, and if they depend upon a later version of Loggify, then we'll again get whatever is the most recent version specified in the transitive closure of the dependencies.
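This resolution rule can be sketched in a few lines (the data and function are hypothetical, and a real resolver would also re-walk dependencies whenever a chosen version is bumped):

```python
# Sketch of approach 1: every manifest pins exact versions, and the resolver
# takes the highest version of each library mentioned anywhere in the
# transitive closure of dependencies.
manifests = {
    ("acme", "0.1.0"): {"frobnicate": "1.2.3", "munge": "3.4.5"},
    ("frobnicate", "1.2.3"): {"loggify": "1.0.0"},
    ("munge", "3.4.5"): {"loggify": "1.1.0"},
    ("loggify", "1.0.0"): {},
    ("loggify", "1.1.0"): {},
}

def version_key(v: str) -> tuple:
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def resolve(root: tuple) -> dict:
    chosen: dict = {}
    todo = [root]
    while todo:
        name, version = todo.pop()
        for dep, dep_version in manifests[(name, version)].items():
            # Keep whichever version is highest among those mentioned.
            if dep not in chosen or version_key(dep_version) > version_key(chosen[dep]):
                chosen[dep] = dep_version
                todo.append((dep, chosen[dep]))
    return chosen

print(resolve(("acme", "0.1.0")))  # loggify resolves to 1.1.0
```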
When everything goes well, this is how approach 2 works:
- Frobnicate's manifest specifies Loggify: >=1.0.0.
- Munge's manifest specifies Loggify: >=1.1.0.
- Acme's manifest specifies Frobnicate: *, Munge: *.
- Resolution generates a lock file which pins Frobnicate: 1.2.3, Munge: 3.4.5, Loggify: 2.0.0.

At some point in the future, if we want to update the versions of our dependencies, we update the lock file (typically this isn't by editing the lock file, but by running some "update dependencies" command). Or we can update the manifest to change the permissible version ranges and run the "update dependencies" command.
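The range-plus-lock-file idea can be sketched as follows (the data and helper names are hypothetical, and real constraint syntaxes are much richer): the "update dependencies" step picks the highest available version satisfying every declared range, and that choice is what gets written to the lock file.

```python
# Sketch of approach 2: manifests declare version *ranges*; the resolver
# picks a concrete version satisfying all of them and records it in a lock file.
available = {"loggify": ["1.0.0", "1.1.0", "2.0.0"]}

def version_key(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version: str, constraint: str) -> bool:
    if constraint.startswith(">="):
        return version_key(version) >= version_key(constraint[2:])
    if constraint == "*":
        return True
    raise ValueError(f"unsupported constraint: {constraint}")

def lock(library: str, constraints: list) -> str:
    # Highest available version satisfying every declared range.
    candidates = [v for v in available[library]
                  if all(satisfies(v, c) for c in constraints)]
    if not candidates:
        raise RuntimeError(f"no version of {library} satisfies {constraints}")
    return max(candidates, key=version_key)

# Frobnicate wants >=1.0.0, Munge wants >=1.1.0: the lock file pins 2.0.0.
print(lock("loggify", [">=1.0.0", ">=1.1.0"]))  # 2.0.0
```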
Where things get interesting, of course, is where everything doesn't go well:
Imagine Frobnicate says Loggify: `1.*.*` and Munge says Loggify: `>=2.0.0` ... but 2.0.0 isn't backward compatible with 1.0.0.
Maybe it's a stupid question and a bit unrelated, but I was a bit confused by the version numbers here: are they not semantic versions? Or is compatibility specified by other means?
> ... but 2.0.0 isn't backward compatible with 1.0.0.
> Maybe it's a stupid question and a bit unrelated, but I was a bit confused by the version numbers here: are they not semantic versions? Or is compatibility specified by other means?
Perhaps. The only thing that the package manager can (potentially) do is enforce a version number format. There's no way for it to enforce any kind of semantics associated with those versions.
I think most ecosystems have an assumption (explicit in some cases, implicit in others) that versions follow something close to semantic versioning. But there's no way to enforce it that I'm aware of. Although perhaps Flix could get closer than most if we implement the checks that @magnus-madsen alluded to in yesterday's meeting.
I wouldn't think they were checked, but I just thought that the manager would use the version numbers to decide compatibility, e.g. "... but 2.0.0 isn't backward compatible with 1.0.0" would always be true in the eyes of the manager.
> I wouldn't think they were checked, but I just thought that the manager would use the version numbers to decide compatibility, e.g. "... but 2.0.0 isn't backward compatible with 1.0.0" would always be true in the eyes of the manager.
Ah, got you.
So yes, some systems do make that assumption (or something like it). But I'm not sure that there's any real consistency.
For systems that follow approach 1, it's not really an issue: you just choose whichever is the most recent version that's explicitly mentioned in a dependency. So if one dependency mentions version 1.1.0, and another mentions 2.0.0, then you go with 2.0.0. And if 2.0.0 causes problems for the first dependency then ... tough.
For systems that follow approach 2, they tend to rely on the judgement of the person specifying the version range. So saying that we depend upon version `1.*.*` implies that we think that we won't be compatible with version `2.0.0`. Whereas if we say that we depend upon version `>=1.0.0`, then that implies that we will be. How we know ... is an interesting question.
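As a sketch, the two range styles above might be interpreted like this (a hypothetical helper; real resolvers support much richer syntax): `1.*.*` encodes a judgement that major version 2 will *not* be compatible, while `>=1.0.0` encodes a judgement that it will.

```python
def version_key(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version: str, constraint: str) -> bool:
    if constraint.startswith(">="):
        # Open-ended range: any version at or above the bound is acceptable.
        return version_key(version) >= version_key(constraint[2:])
    # Wildcard pattern such as "1.*.*": fixed components must match exactly.
    parts, pattern = version.split("."), constraint.split(".")
    return all(pat == "*" or pat == part for pat, part in zip(pattern, parts))

print(satisfies("2.0.0", "1.*.*"))    # False — "1.*.*" rejects major version 2
print(satisfies("2.0.0", ">=1.0.0"))  # True  — ">=1.0.0" accepts it
```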
> How we know ... is an interesting question.
Which is, I think, at the heart of the reason why I prefer approach 1.
The only person who really understands the compatibility constraints of a library is the author of that library. Approach 2 places the responsibility on the consumer, not the author.
Approach 1 doesn't have a good answer either. But it doesn't even try to answer it.
A few quick thoughts:
Further to recent conversations, see below a couple of first stabs at what a Flix package manifest might look like in TOML. This is heavily inspired by Cargo, but with some modifications to fit with both Flix and Maven:
First, here's a top-level project (i.e. something that can be built into a JAR and deployed):
```toml
[package]
name = "example-flix-project"
version = "1.2.3"
flix = "0.31.0"
description = """
An example of an end product, i.e. something deployed in production which depends upon multiple libraries
"""
paths = ["src"]

[dependencies]
"org.postgresql/postgresql".mvn = "42.3.3"
"org.eclipse.jetty/jetty-server".mvn = { version = "11.0.11", exclusions = [ "org.slf4j/slf4j-api" ]}
"com.github.paulbutcher/my-flix-library".fpkg = "0.3.1"

[build.dev]
paths = ["dev"]
config = { allow-holes = true, allow-debug = true }

[build.dev.dependencies]
"some/dev-specific-library".mvn = "2.3.4"

[build.prod]
paths = ["prod"]

[build.test]
paths = ["test"]
config = { allow-holes = true, allow-debug = true }

[build.test.dependencies]
"some/test-specific-library".fpkg = "0.1.2"

[build.bench]
paths = ["bench"]
```
Notes:

- Everything in the `[package]` section is optional for a top-level package, apart from the `flix` version.
- The `paths` and `config` values are shown explicitly above, but would default to the values given (so could be omitted).

And here's a library (i.e. something that can be packaged as an fpkg and used within other projects):
```toml
[package]
name = "example-flix-library"
version = "2.3.4"
flix = "0.31.0"
license = "MIT OR Apache-2.0"
description = """
An example of a Flix library, distributed as an fpkg
"""
homepage = "https://github.com/my-name/my-library"

[dependencies]
"some/dependency".mvn = "1.2.3"
"some/other-dependency".fpkg = "4.5.6"

[build.fpkg]

[build.test.dependencies]
"some/test-specific-library".fpkg = "0.1.2"
```
Notes:

- A library must specify either a `license` (as an SPDX 2.1 license expression) or a `license-file`.
- A library must include a `[build.fpkg]` section.
- The default `paths` and `options` are omitted in the above.
- Only the relevant files are included by `[build.fpkg]` (i.e. no test files etc.).

Thoughts very welcome!
Quick question: What does a minimal file look like?
Does the license stuff mean that we can throw some kind of error if a user tries to release a package with a license that is incompatible with a dependency?
The bare-minimum top-level project (no dependencies other than Flix, using default values for everything) would be:
```toml
[package]
flix = "0.31.0"
```
(although we could require more if we wanted to: a project name, for example).
For a library (again, no dependencies and defaults throughout), it would be:
```toml
[package]
name = "example-flix-library"
version = "2.3.4"
flix = "0.31.0"
license = "MIT OR Apache-2.0"
description = """
An example of a Flix library, distributed as an fpkg
"""
homepage = "https://github.com/my-name/my-library"

[build.fpkg]
```
(again, we could add or remove requirements to taste).
> Does the license stuff mean that we can throw some kind of error if a user tries to release a package with a license that is incompatible with a dependency?
Possibly. It depends upon how rich the metadata available for licenses is.
Overall reaction: Looks good 👍
Comments:
- `[package]` looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
- In `[package]` I would make `name` mandatory.
- Do we want to split source code into src and test? Or just have `paths`? Or perhaps, why can we specify paths under package, build, test etc.? Are they additive? What happens if you don't specify them? Also defaults, I presume we will have sane ones?
- How does one "read aloud" `"org.postgresql/postgresql".mvn`?
- Any reason to cluster maven and fpkg packages under the same banner?
> `[package]` looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
Agreed (obviously we'll work up to this incrementally, but I wanted to give an indication of where we were heading).
> In `[package]` I would make `name` mandatory.
Fair enough.
> Do we want to split source code into src and test? Or just have `paths`? Or perhaps, why can we specify paths under package, build, test etc.? Are they additive? What happens if you don't specify them? Also defaults, I presume we will have sane ones?
I'm not 100% sure I understand what you're asking, but I'll try to express the intention:
- `paths` and `dependencies` within `build.whatever` are additive. We could instead call them `extra-paths` and `extra-dependencies` (this is what Clojure's `deps.edn` does, for example), but I'm not sure that the extra verbosity is worth it. But I'm happy to be persuaded otherwise.
- So `dev` builds look at Flix files within both `src` and `dev`, and `prod` builds look at Flix files within both `src` and `prod`.
> How does one "read aloud" `"org.postgresql/postgresql".mvn`?

Maven dependencies are identified by three components: the group id, the artifact id, and the version. So in `"org.postgresql/postgresql".mvn = "42.3.3"`, we're looking for the Maven dependency with group id `org.postgresql`, artifact id `postgresql`, and version `42.3.3`.
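As a sketch, a small (hypothetical) helper could split such a dependency key into its Maven coordinates:

```python
# Sketch: "reading" a dependency key of the form "group/artifact" with its
# version, as used in the manifest examples above. Helper name is illustrative.
def parse_coordinate(key: str, version: str) -> dict:
    group, artifact = key.split("/", 1)
    return {"group": group, "artifact": artifact, "version": version}

dep = parse_coordinate("org.postgresql/postgresql", "42.3.3")
print(dep)  # {'group': 'org.postgresql', 'artifact': 'postgresql', 'version': '42.3.3'}
```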
> Any reason to cluster maven and fpkg packages under the same banner?
We could do this, but I think to do so would raise the source of the dependency too high. The important thing is what the dependencies are, not where they are.
Bear in mind that over time we will probably want to support more than these two dependency sources. We will probably want some means of referring to local dependencies during development, for example (i.e. before they're released) and it would be a nuisance to have to move dependencies around within the toml file when switching from local dependency to remote.
> [package] looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
I think it should be mandatory; I'd rather know and have to download a new Flix manually than not know.
> Any reason to cluster maven and fpkg packages under the same banner?
Recursively, the Flix dependencies might pull in more Maven dependencies, so it doesn't seem like they can be handled separately.
We should look into this one: https://github.com/tomlj/tomlj (it is listed as compliant and recently updated; I also think it has minimal dependencies).
Good find! https://github.com/TheElectronWill/night-config is fine, but far from ideal.
I wanted to add a few comments that have been floating around in my mind:
I think future programming languages should aim to have maximal integration of tooling. In my opinion, when languages rely on multiple separate tools, their integration tends to be poor and to lack overall vision. For example, I have been very disappointed by the IDE support and auto-completion support of most languages. Java has decent IDE support, but it took 25 years. For Flix, I want to avoid that by re-using the compiler for IDE support. I think these observations also apply to package management, "javadoc", linters, and code formatters. I would like to have those tools all under the same umbrella in the same code base.
I think that compilers should be made package-aware. This is already seen in Rust, where the language knows about crates (see the module system). Moreover, Rust is able to solve the diamond problem (i.e. two packages depending on different versions of the same package) by having versions directly in the compiler. Similarly, the Elm compiler apparently has support for SemVer, i.e. it will check that the interface of a package follows SemVer conventions. I would like all of this for Flix and more. Perhaps I even want versioning in the language itself (and not just internally in the compiler).
I think it's important to keep the number of compiler flags and "compiler modes" to a minimum. (See also the "one language" principle.) I think this can be done by having a very clear flowchart (which can become a mental model) of how the compiler can be invoked. And equally important, by having only one binary. In particular, I envision three modes: "here is a bunch of files", "here is a directory", and "here is a TOML configuration". Each mode should strive to support the same set of functionality, e.g. `build-jar`.
For practical and philosophical reasons, it's important that we maintain control over the entry point (i.e. main). Both to ensure the "one binary" principle, but also because for our experiments we often need full control. If we were gated behind another tool, that could be a problem.
I think these observations lead us towards a path where we should design a careful flowchart for `Main`. We should deal with TOML configuration and package versioning in Scala (because the compiler will ultimately need to know about versions of each symbol). But there could still potentially be room for some components to be implemented in Flix. For example, we could perhaps do package resolution in Flix via a carefully designed API.
You've asserted several times that Rust is capable of doing things because its package management is handled by the compiler.
But Rust's package manager is separate from the compiler and called cargo?
```shell
paulbutcher@Pauls-MBP package-playground % which cargo
/opt/homebrew/bin/cargo
paulbutcher@Pauls-MBP package-playground % cargo --version
cargo 1.64.0 (387270bc7 2022-09-16)
paulbutcher@Pauls-MBP package-playground % which rustc
/opt/homebrew/bin/rustc
paulbutcher@Pauls-MBP package-playground % rustc --version
rustc 1.64.0
```
Can you help me understand?
I definitely don't know all the details, but somehow the compiler must know that there is both `X (v1)` and `X (v2)`.
That, I believe. What I'm trying to understand is your assertion that it can only know that because the compiler and package manager are one thing (which they are not).
FWIW, I 100% agree with all of your goals, I just don't see why they require the package manager and compiler to either:

1. share a single source base
2. share a single compiled binary

(perhaps they do, but I haven't seen an argument yet which explains why that's the case)
> FWIW, I 100% agree with all of your goals, I just don't see why they require the package manager and compiler to either: 1. share a single source base 2. share a single compiled binary
I think you can definitely have such an architecture. What I wonder about is how to pass information back and forth.
As a counterpoint, imagine that I have a library which exists at version 1.1.0, and which is compiled and tested against Flix version 0.32.0. And then I upgrade that library to use some new Flix feature which wasn't supported in Flix 0.32.0, so it has to be compiled against Flix 0.42.0; that version of the library is 1.2.0 (say).
If we want to compile one version of the library with one version of the Flix compiler, and the other with a different version, we could do that by forking one Flix compiler from another. But it would be easier (I think?) if they were both forked from some third body of source which is the thing that understands dependencies, rather than the 0.32.0 version of Flix having to know how to fork a version of the compiler that hadn't been written when it was released.
See below some thoughts on what we might want from an expanded project and dependency management solution. This is intended to be a starting point for discussion only, so please feel free to object to any or all of what follows.
Current situation
Today, Flix:

- Expects source files within a `src` directory and tests within a `test` directory.
- Expects `HISTORY.md`, `LICENSE.md`, and `README.md` files.
- Expects dependencies (`.fpkg` or `.jar` files) within a subdirectory called `lib`.

Flix packages (`.fpkg` files) are Zip files with a similar structure to the above (the only difference being that the `test` directory is removed).

The Flix command line (and REPL) provide an `install` command which, given a GitHub path of the form `<username>/<repo>`, finds the `.fpkg` file associated with the most recent release and copies it into the `lib` subdirectory. There is no way to download anything other than the most recent release, and no mechanism to install `.jar` files.

Requirements
Here’s an initial stab at requirements for a full-featured project/dependency management system:

- A means to specify which libraries (`.fpkg` or `.jar`) the project depends upon, and the versions of those libraries.

Nice to have:
Proposal
This proposal shamelessly steals from the approach adopted by Clojure.
I suggest that we:
instead of:

`flix.jar`, which will do the remainder of the work.

A configuration file is read by `flix.jar` on startup and used to:

Open questions:

- `flix.json`? `flix-project.json`?

Possible file structure (assuming JSON):
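As a sketch only (all field names are illustrative, modelled loosely on Clojure's `deps.edn` rather than any settled design), such a file might look like:

```json
{
  "flix": "0.31.0",
  "paths": ["src"],
  "deps": {
    "org.postgresql/postgresql": { "mvn": "42.3.3" },
    "com.github.paulbutcher/my-flix-library": { "fpkg": "0.3.1" }
  },
  "aliases": {
    "dev": {
      "extra-paths": ["dev"],
      "extra-deps": { "some/dev-specific-library": { "mvn": "2.3.4" } }
    },
    "prod": { "extra-paths": ["prod"] },
    "test": {
      "extra-paths": ["test"],
      "extra-deps": { "some/test-specific-library": { "fpkg": "0.1.2" } }
    }
  }
}
```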
This defines a project which:

- Takes its source files from the `src` directory.

Maven dependencies are downloaded to the `~/.m2` directory as per Maven, and the classpath updated to reference them. Flix dependencies are analogously downloaded to `~/.flix`.

A development build is run with (the `-A` option includes an alias):

This includes additional source files from the `dev` directory, plus a development-specific dependency.

A production build is run with:

This includes additional source files from the `prod` directory.

Tests are run with:

Which automatically includes the `test` alias to include additional source files from the `test` directory, plus a test-specific dependency.