paulbutcher opened this issue 1 year ago
What are "optional dependencies"?
> What are "optional dependencies"?
If you take a look at, for example, jetty-server
on Maven: https://search.maven.org/artifact/org.eclipse.jetty/jetty-server/11.0.11/jar
You'll see that the "jetty-jmx" dependency is marked as optional via the <optional/>
XML tag. This kind of thing is typically used for "extra" functionality that you can opt in to or out of.
A good example of optional functionality that's relevant to us:
The Flix compiler depends upon jline. Jline includes Windows functionality as optional. As things currently stand, we don't include that optional functionality within the Flix compiler, which is why command line editing within the Flix REPL doesn't work as well on Windows as it does on Linux and MacOS.
Very interesting thoughts. I will comment more in detail later. One question: what do we think about using GitHub releases?
> One question: what do we think about using GitHub releases?
I have no problem with using Git (instead of, say, Maven). It's the direction that the industry as a whole seems to be going (i.e. towards using source control systems instead of dedicated repositories in the style of Maven, npm, etc.).
Two comments though:
`.fpkg` files, and consider dropping support for them:
I think we could use GitHub releases and then have a central registry where you register your package. Then we have a service that periodically queries the GitHub API to collect a list of all packages and their dependencies. This index is then used for dependency resolution.
As for creating GitHub releases, I think that could be automated: `flix package release` or whatever, and then enter your credentials.
GitLab also has "releases", so you could register your package on GitLab too. The only issue is that both GitHub and GitLab have to be online for you to be able to download the packages. But they're probably more likely to be online than anything we could build ourselves anyway.
I suggest that we separate the question of how releases take place (i.e. whether we need a release process as such at all) from the question of discoverability (i.e. whether we have some central index or similar).
Let's look at how Clojure has evolved in this respect, because I think it's informative:
To see a concrete example, take a look at io.github.cognitect-labs/test-runner
(a utility published by Cognitect, the equivalent in the Clojure world to Lightbend in the Scala world):
https://github.com/cognitect-labs/test-runner
This is not available on either Maven or Clojars, and there is no JAR file or similar associated with it. Releases are simple tags:
One uses it within deps.edn
by specifying the tag and SHA, and it's downloaded directly from GitHub by the Clojure tooling:
```clojure
{:deps {io.github.cognitect-labs/test-runner {:git/tag "v0.5.0" :git/sha "b3fd0d204c8fa72e4e1e2448243df7f2fbaba8b4"}}}
```
No need for packages whatsoever.
Of course, none of the above helps with discoverability. And there, there may be some value in having a central index. But that index could be nothing more than a list of relevant GitHub projects.
Make sense?
I understand that, but does it entail you have to clone the entire GitHub repository?
> I understand that, but does it entail you have to clone the entire GitHub repository?
The Clojure implementation does clone the entire GitHub repository, yes.
I think we could probably avoid doing so if we wanted to. I'm not sure whether it's worth the effort though?
I assume that the reason why you're worried about this is disc space? I personally doubt that that's a big issue (Git's pretty good at this, as long as you don't dump big BLOBs into the repo). And you'll lose any benefit (arguably end up in a worse situation) as soon as you find yourself checking out more than one version of a dependency.
I have a lot of bad experience from the Node ecosystem, where `node_modules` ends up containing hundreds of megabytes of uncompressed JavaScript. Thus I am a big fan of single files that are compressed archives. It's actually one of the things I feel strongly about. I guess Git does use binary blobs (i.e. not a gazillion files); nevertheless, I am also hesitant to depend on Git tooling (it has sometimes been a pain on Windows). A zipped file of all the relevant project files seems reasonable to me. But I will think more about the Git+Tag approach.
OK, I understand where you're coming from.
Note that the question of whether the release process requires the library author to create a Zip file, and the question of whether the on-disc representation of a downloaded dependency is a zip file are (or at least can be) independent of each other.
Note also that part of the issue with Node is that Node stores dependencies within each project (so if you have 10 projects, each of which download the same 10 dependencies, then you have 100 copies).
Maven and Clojure's system, on the other hand, store dependencies centrally and manipulate the Java classpath to refer to them. So 10 projects downloading the same 10 dependencies just results in 10 copies.
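To illustrate the shared-cache model, here is a small sketch (the function name and layout are hypothetical, loosely following Maven's `~/.m2/repository` convention): every project that needs a given coordinate resolves to the same path, so each dependency exists on disk exactly once.

```python
from pathlib import Path

# Hypothetical sketch: a single shared cache in the style of Maven's ~/.m2,
# so N projects using the same dependency share one on-disk copy.
def cache_path(root: Path, group: str, artifact: str, version: str) -> Path:
    """Maven-style layout: <root>/<group as dirs>/<artifact>/<version>"""
    return root.joinpath(*group.split(".")) / artifact / version

p = cache_path(Path.home() / ".m2" / "repository",
               "org.postgresql", "postgresql", "42.3.3")
print(p.name)  # "42.3.3" — same path for every project that wants this version
```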
I think one way we can work is by trying to write down some principles for the package manager. Presumably some we can quickly come to agreement on, others we can discuss, modify, and abandon. I can start with some principles:
This is just a random order (and written rather quickly and haphazardly):
Flix has no IR, so it's not really feasible to ship anything other than source code. Moreover, today shipping source code seems reasonable, as long as it is compressed.
Whether that be ZIP or something similar.
... or at least the good part of SemVer... We might even be able to use the compiler to enforce it.
E.g. like Maven, we will enforce a common layout. In particular, I insist that packages must have a license file.
When downloading a package, no code is ever executed on your machine. It should be 100% safe.
have to run ... more to come...
Proposed Principle: Packages contain source code.
💯👍
Proposed Principle: Packages are transmitted and stored in compressed format.
I have no particularly strong feelings about this either way. But it's worth noting that if we piggyback on Maven and Git, then we get this for free (i.e. there's no need for us to create .zip files or similar ourselves).
Personally, I would like to avoid an explicit "create package" step unless it's absolutely necessary.
Proposed Principle: Packages have a common (file) structure
💯👍
Proposed Principle: "Installing/downloading" a package is safe
💯👍
Further to "Proposed Principle: Packages are transmitted and stored in compressed format."
What's the motivation behind this? Is it about saving storage space? Is it about saving transmission time? Is it about having a single file which represents a dependency? ...?
> Further to "Proposed Principle: Packages are transmitted and stored in compressed format."
> What's the motivation behind this? Is it about saving storage space? Is it about saving transmission time? Is it about having a single file which represents a dependency? ...?
I want to save disk space. My nightmare is that every time I have to edit flix.dev, my node_modules folder has 200,000+ files in it. It's also because it is easy and fast to move single files around.
A safe package cannot contain any casts (except upcasts). This entails that the effect signatures can be trusted. This means that if a function says it has no side-effects, it cannot have any side-effects. This allows programmers to use libraries without worrying that they can be backdoored. (E.g. secretly starting a webserver to mine bitcoin.)
(Not all packages will be implemented as safe, but that's fine. The point is that e.g. a "unit conversion library" can be declared as safe and programmers can trust that.)
Proposed Principle: A package can be declared as "safe"
I think this immediately asks the question, what if a subset of the API is safe?
> Proposed Principle: Packages are transmitted and stored in compressed format.
I'd like to add that right now, when you download a Flix package, you can't see its public functions and you can't look into it because it's a zip. We need a way to see the available functions. I don't know how, but it seems lacking to rely on autocomplete exploration.
> I think this immediately asks the question: what if a subset of the API is safe?
Then you have to break the package in two.
> I'd like to add that right now, when you download a Flix package, you can't see its public functions and you can't look into it because it's a zip. We need a way to see the available functions. I don't know how, but it seems lacking to rely on autocomplete exploration.
What about Flix doc for that package?
> What about Flix doc for that package?
Ah I see, I missed that command. But still, this is JSON; it doesn't nicely help me as a user to see what is in the package. And it requires manual effort from the publisher, even though it is a common task for all packages.
> I want to save disk space. My nightmare is that every time I have to edit flix.dev, my node_modules folder has 200,000+ files in it. It's also because it is easy and fast to move single files around.
Right. But this problem arises because npm stores dependencies within the project that uses them.
Maven, by contrast, stores dependencies within your `~/.m2` directory, and Clojure's system stores them within `~/.gitlibs`. So in both cases a given dependency is only ever downloaded once, even if it's used within 100 different projects, and you never "move around" dependencies.
I strongly think that we should adopt the Maven/Clojure approach, not the npm approach. If we do so, does this have any bearing on your feelings when it comes to compression?
There is another high level design decision we need to take for Flix's dependency management.
Broadly speaking (this is a generalisation, but largely holds true) there are two different approaches to dependency management in wide use:

1. A manifest which pins exact dependency versions, from which dependencies are resolved directly (this is the approach taken by Clojure's `clj` command).
2. A manifest which specifies permissible version ranges, plus a generated lock file which pins the exact versions actually used (the approach taken by Cargo, npm, and many others).

The second approach is clearly more complex, but it's intended to allow end products (i.e. things which are created from many dependencies and deployed in production) to be treated differently from libraries (i.e. things which are combined to create end products). The Cargo documentation has a good explanation of the intent behind this:
For my part, I'm not convinced that the additional complexity of approach 2 is worth it (Clojure manages just fine without it), but clearly there are plenty of ecosystems which have decided that it's worth it.
To make this more concrete, let's consider a situation where we are creating an end-product which depends upon two libraries, both of which depend upon the same logging library. Let's say our product is "Acme", the two libraries are "Frobnicate" and "Munge", and the logging library is "Loggify".
When everything goes well, this is how approach 1 works:

- Loggify has released versions 1.0.0, 1.1.0, and 2.0.0.
- Frobnicate's manifest specifies Loggify: 1.0.0.
- Munge's manifest specifies Loggify: 1.1.0.
- Acme's manifest specifies Frobnicate: 1.2.3, Munge: 3.4.5.
- The resolver uses the most recent Loggify version mentioned anywhere in the transitive closure: 1.1.0.
At some point, if we want to update the versions of our dependencies, we edit the manifest to refer to newer versions of either Frobnicate or Munge, and if they depend upon a later version of Loggify, then we'll again get whatever is the most recent version specified in the transitive closure of the dependencies.
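This resolution rule can be sketched in a few lines (the data and function are hypothetical, and a real resolver would also re-walk dependencies whenever a chosen version is bumped):

```python
# Sketch of approach 1: every manifest pins exact versions, and the resolver
# takes the highest version of each library mentioned anywhere in the
# transitive closure of dependencies.
manifests = {
    ("acme", "0.1.0"): {"frobnicate": "1.2.3", "munge": "3.4.5"},
    ("frobnicate", "1.2.3"): {"loggify": "1.0.0"},
    ("munge", "3.4.5"): {"loggify": "1.1.0"},
    ("loggify", "1.0.0"): {},
    ("loggify", "1.1.0"): {},
}

def version_key(v: str) -> tuple:
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def resolve(root: tuple) -> dict:
    chosen: dict = {}
    todo = [root]
    while todo:
        name, version = todo.pop()
        for dep, dep_version in manifests[(name, version)].items():
            # Keep whichever version is highest among those mentioned.
            if dep not in chosen or version_key(dep_version) > version_key(chosen[dep]):
                chosen[dep] = dep_version
                todo.append((dep, chosen[dep]))
    return chosen

print(resolve(("acme", "0.1.0")))  # loggify resolves to 1.1.0
```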
When everything goes well, this is how approach 2 works:
- Frobnicate's manifest specifies Loggify: >=1.0.0.
- Munge's manifest specifies Loggify: >=1.1.0.
- Acme's manifest specifies Frobnicate: *, Munge: *.
- Resolution generates a lock file which pins Frobnicate: 1.2.3, Munge: 3.4.5, Loggify: 2.0.0.

At some point in the future, if we want to update the versions of our dependencies, we update the lock file (typically this isn't by editing the lock file, but by running some "update dependencies" command). Or we can update the manifest to change the permissible version ranges and run the "update dependencies" command.
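The range-plus-lock-file idea can be sketched as follows (the data and helper names are hypothetical, and real constraint syntaxes are much richer): the "update dependencies" step picks the highest available version satisfying every declared range, and that choice is what gets written to the lock file.

```python
# Sketch of approach 2: manifests declare version *ranges*; the resolver
# picks a concrete version satisfying all of them and records it in a lock file.
available = {"loggify": ["1.0.0", "1.1.0", "2.0.0"]}

def version_key(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version: str, constraint: str) -> bool:
    if constraint.startswith(">="):
        return version_key(version) >= version_key(constraint[2:])
    if constraint == "*":
        return True
    raise ValueError(f"unsupported constraint: {constraint}")

def lock(library: str, constraints: list) -> str:
    # Highest available version satisfying every declared range.
    candidates = [v for v in available[library]
                  if all(satisfies(v, c) for c in constraints)]
    if not candidates:
        raise RuntimeError(f"no version of {library} satisfies {constraints}")
    return max(candidates, key=version_key)

# Frobnicate wants >=1.0.0, Munge wants >=1.1.0: the lock file pins 2.0.0.
print(lock("loggify", [">=1.0.0", ">=1.1.0"]))  # 2.0.0
```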
Where things get interesting, of course, is where everything doesn't go well:
Imagine Frobnicate says Loggify: `1.*.*` and Munge says Loggify: `>=2.0.0` ... but 2.0.0 isn't backward compatible with 1.0.0.
Maybe it's a stupid question and a bit unrelated, but I was a bit confused by the version numbers here: are they not semantic versions? Or is compatibility specified by other means?
> ... but 2.0.0 isn't backward compatible with 1.0.0.
> Maybe it's a stupid question and a bit unrelated, but I was a bit confused by the version numbers here: are they not semantic versions? Or is compatibility specified by other means?
Perhaps. The only thing that the package manager can (potentially) do is enforce a version number format. There's no way for it to enforce any kind of semantics associated with those versions.
I think most ecosystems have an assumption (explicit in some cases, implicit in others) that versions follow something close to semantic versioning. But there's no way to enforce it that I'm aware of. Although perhaps Flix could get closer than most if we implement the checks that @magnus-madsen alluded to in yesterday's meeting.
I wouldn't think they were checked, but I just thought that the manager would use the version numbers to decide compatibility, e.g. "... but 2.0.0 isn't backward compatible with 1.0.0" would always be true in the eyes of the manager.
> I wouldn't think they were checked, but I just thought that the manager would use the version numbers to decide compatibility, e.g. "... but 2.0.0 isn't backward compatible with 1.0.0" would always be true in the eyes of the manager.
Ah, got you.
So yes, some systems do make that assumption (or something like it). But I'm not sure that there's any real consistency.
For systems that follow approach 1, it's not really an issue: you just choose whichever is the most recent version that's explicitly mentioned in a dependency. So if one dependency mentions version 1.1.0, and another mentions 2.0.0, then you go with 2.0.0. And if 2.0.0 causes problems for the first dependency then ... tough.
For systems that follow approach 2, they tend to rely on the judgement of the person specifying the version range. So saying that we depend upon version `1.*.*` implies that we think that we won't be compatible with version `2.0.0`. Whereas if we say that we depend upon version `>=1.0.0`, then that implies that we will be. How we know ... is an interesting question.
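As a sketch, the two range styles above might be interpreted like this (a hypothetical helper; real resolvers support much richer syntax): `1.*.*` encodes a judgement that major version 2 will *not* be compatible, while `>=1.0.0` encodes a judgement that it will.

```python
def version_key(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version: str, constraint: str) -> bool:
    if constraint.startswith(">="):
        # Open-ended range: any version at or above the bound is acceptable.
        return version_key(version) >= version_key(constraint[2:])
    # Wildcard pattern such as "1.*.*": fixed components must match exactly.
    parts, pattern = version.split("."), constraint.split(".")
    return all(pat == "*" or pat == part for pat, part in zip(pattern, parts))

print(satisfies("2.0.0", "1.*.*"))    # False — "1.*.*" rejects major version 2
print(satisfies("2.0.0", ">=1.0.0"))  # True  — ">=1.0.0" accepts it
```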
> How we know ... is an interesting question.
Which is, I think, at the heart of the reason why I prefer approach 1.
The only person who really understands the compatibility constraints of a library is the author of that library. Approach 2 places the responsibility on the consumer, not the author.
Approach 1 doesn't have a good answer either. But it doesn't even try to answer it.
A few quick thoughts:
Further to recent conversations, see below a couple of first stabs at what a Flix package manifest might look like in TOML. This is heavily inspired by Cargo, but with some modifications to fit with both Flix and Maven:
First, here's a top-level project (i.e. something that can be built into a JAR and deployed):
```toml
[package]
name = "example-flix-project"
version = "1.2.3"
flix = "0.31.0"
description = """
An example of an end product, i.e. something deployed in production which depends upon multiple libraries
"""
paths = ["src"]

[dependencies]
"org.postgresql/postgresql".mvn = "42.3.3"
"org.eclipse.jetty/jetty-server".mvn = { version = "11.0.11", exclusions = [ "org.slf4j/slf4j-api" ]}
"com.github.paulbutcher/my-flix-library".fpkg = "0.3.1"

[build.dev]
paths = ["dev"]
config = { allow-holes = true, allow-debug = true }

[build.dev.dependencies]
"some/dev-specific-library".mvn = "2.3.4"

[build.prod]
paths = ["prod"]

[build.test]
paths = ["test"]
config = { allow-holes = true, allow-debug = true }

[build.test.dependencies]
"some/test-specific-library".fpkg = "0.1.2"

[build.bench]
paths = ["bench"]
```
Notes:

- Everything in the `[package]` section is optional for a top-level package, apart from the `flix` version.
- The `paths` and `config` values are shown explicitly above, but would default to the values given (so could be omitted).

And here's a library (i.e. something that can be packaged as an fpkg and used within other projects):
```toml
[package]
name = "example-flix-library"
version = "2.3.4"
flix = "0.31.0"
license = "MIT OR Apache-2.0"
description = """
An example of a Flix library, distributed as an fpkg
"""
homepage = "https://github.com/my-name/my-library"

[dependencies]
"some/dependency".mvn = "1.2.3"
"some/other-dependency".fpkg = "4.5.6"

[build.fpkg]

[build.test.dependencies]
"some/test-specific-library".fpkg = "0.1.2"
```
Notes:

- A library must specify either a `license` (as an SPDX 2.1 license expression) or a `license-file`.
- A library must include a `[build.fpkg]` section.
- The default `paths` and `options` are omitted in the above.
- Only the relevant files are included by `[build.fpkg]` (i.e. no test files etc.).

Thoughts very welcome!
Quick question: What does a minimal file look like?
Does the license stuff mean that we can throw some kind of error if a user tries to release a package with a license that is incompatible with a dependency?
The bare-minimum top-level project (no dependencies other than Flix, using default values for everything) would be:
```toml
[package]
flix = "0.31.0"
```
(although we could require more if we wanted to: a project name, for example).
For a library (again, no dependencies and defaults throughout), it would be:
```toml
[package]
name = "example-flix-library"
version = "2.3.4"
flix = "0.31.0"
license = "MIT OR Apache-2.0"
description = """
An example of a Flix library, distributed as an fpkg
"""
homepage = "https://github.com/my-name/my-library"

[build.fpkg]
```
(again, we could add or remove requirements to taste).
> Does the license stuff mean that we can throw some kind of error if a user tries to release a package with a license that is incompatible with a dependency?
Possibly. It depends upon how rich the metadata available for licenses is.
Overall reaction: Looks good 👍
Comments:
- `[package]` looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
- In `[package]` I would make `name` mandatory.
- Do we want to split source code into src and test? Or just have `paths`? Or perhaps, why can we specify paths under package, build, test etc.? Are they additive? What happens if you don't specify them? Also defaults, I presume we will have sane ones?
- How does one "read aloud" `"org.postgresql/postgresql".mvn`?
- Any reason to cluster maven and fpkg packages under the same banner?
> `[package]` looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
Agreed (obviously we'll work up to this incrementally, but I wanted to give an indication of where we were heading).
> In `[package]` I would make `name` mandatory.
Fair enough.
> Do we want to split source code into src and test? Or just have `paths`? Or perhaps, why can we specify paths under package, build, test etc.? Are they additive? What happens if you don't specify them? Also defaults, I presume we will have sane ones?
I'm not 100% sure I understand what you're asking, but I'll try to express the intention:
- `paths` and `dependencies` within `build.whatever` are additive. We could instead call them `extra-paths` and `extra-dependencies` (this is what Clojure's `deps.edn` does, for example), but I'm not sure that the extra verbosity is worth it. But I'm happy to be persuaded otherwise.
- So `dev` builds look at Flix files within both `src` and `dev`, and `prod` builds look at Flix files within both `src` and `prod`.
> How does one "read aloud" `"org.postgresql/postgresql".mvn`?

Maven dependencies are identified by three components: the group id, the artifact id, and the version. So in `"org.postgresql/postgresql".mvn = "42.3.3"`, we're looking for the Maven dependency with group id `org.postgresql`, artifact id `postgresql`, and version `42.3.3`.
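As a sketch, a small (hypothetical) helper could split such a dependency key into its Maven coordinates:

```python
# Sketch: "reading" a dependency key of the form "group/artifact" with its
# version, as used in the manifest examples above. Helper name is illustrative.
def parse_coordinate(key: str, version: str) -> dict:
    group, artifact = key.split("/", 1)
    return {"group": group, "artifact": artifact, "version": version}

dep = parse_coordinate("org.postgresql/postgresql", "42.3.3")
print(dep)  # {'group': 'org.postgresql', 'artifact': 'postgresql', 'version': '42.3.3'}
```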
> Any reason to cluster maven and fpkg packages under the same banner?
We could do this, but I think to do so would raise the source of the dependency too high. The important thing is what the dependencies are, not where they are.
Bear in mind that over time we will probably want to support more than these two dependency sources. We will probably want some means of referring to local dependencies during development, for example (i.e. before they're released) and it would be a nuisance to have to move dependencies around within the toml file when switching from local dependency to remote.
> [package] looks good, but I would not make the flix version mandatory (until such a time we can actually support that).
I think it should be mandatory; I'd rather know and have to download a new Flix manually than not know.
> Any reason to cluster maven and fpkg packages under the same banner?
Recursively, the Flix dependencies might pull in more Maven dependencies, so it doesn't seem like they can be handled separately.
We should look into this one: https://github.com/tomlj/tomlj (it is listed as compliant and recently updated; I also think it has minimal dependencies).
Good find! https://github.com/TheElectronWill/night-config is fine, but far from ideal.
I wanted to add a few comments that have been floating around in my mind:
I think future programming languages should aim to have maximal integration of tooling. In my opinion, when languages rely on multiple separate tools, their integration tends to be poor and to lack overall vision. For example, I have been very disappointed by the IDE support and auto-completion support of most languages. Java has decent IDE support, but it took 25 years. For Flix, I want to avoid that by re-using the compiler for IDE support. I think these observations also apply to package management, "javadoc", linters, and code formatters. I would like to have those tools all under the same umbrella in the same code base.
I think that compilers should be made package-aware. This is already seen in Rust, where the language knows about crates (see the module system). Moreover, Rust is able to solve the diamond problem (i.e. two packages depending on different versions of the same package) by having versions directly in the compiler. Similarly, the Elm compiler apparently has support for SemVer, i.e. it will check that the interface of a package follows SemVer conventions. I would like all of this for Flix and more. Perhaps I even want versioning in the language itself (and not just internally in the compiler).
I think it's important to keep the number of compiler flags and "compiler modes" to a minimum. (See also the "one language" principle.) I think this can be done by having a very clear flowchart (which can become a mental model) of how the compiler can be invoked. And equally important, by having only one binary. In particular, I envision three modes: "here is a bunch of files", "here is a directory", and "here is a TOML configuration". Each mode should strive to support the same set of functionality, e.g. `build-jar`.
For practical and philosophical reasons, it's important that we maintain control over the entry point (i.e. main). Both to ensure the "one binary" principle, but also because for our experiments we often need full control. If we were gated behind another tool, that could be a problem.
I think these observations lead us towards a path where we should design a careful flowchart for `Main`. We should deal with TOML configuration and package versioning in Scala (because the compiler will ultimately need to know about versions of each symbol). But there could still potentially be room for some components to be implemented in Flix. For example, we could perhaps do package resolution in Flix via a carefully designed API.
You've asserted several times that Rust is capable of doing things because its package management is handled by the compiler.
But Rust's package manager is separate from the compiler and called cargo?
```shell
paulbutcher@Pauls-MBP package-playground % which cargo
/opt/homebrew/bin/cargo
paulbutcher@Pauls-MBP package-playground % cargo --version
cargo 1.64.0 (387270bc7 2022-09-16)
paulbutcher@Pauls-MBP package-playground % which rustc
/opt/homebrew/bin/rustc
paulbutcher@Pauls-MBP package-playground % rustc --version
rustc 1.64.0
```
Can you help me understand?
I definitely don't know all the details, but somehow the compiler must know that there is both `X (v1)` and `X (v2)`.
That, I believe. What I'm trying to understand is your assertion that it can only know that because the compiler and package manager are one thing (which they are not).
FWIW, I 100% agree with all of your goals, I just don't see why they require the package manager and compiler to either:

1. share a single source base
2. share a single compiled binary

(perhaps they do, but I haven't seen an argument yet which explains why that's the case)
> FWIW, I 100% agree with all of your goals, I just don't see why they require the package manager and compiler to either: 1. share a single source base 2. share a single compiled binary
I think you can definitely have such an architecture. What I wonder about is how to pass information back and forth.
As a counterpoint, imagine that I have a library which exists at version 1.1.0, and which is compiled and tested against Flix version 0.32.0. And then I upgrade that library to use some new Flix feature which wasn't supported in Flix 0.32.0, so it has to be compiled against Flix 0.42.0; that version of the library is 1.2.0 (say).
If we want to compile one version of the library with one version of the Flix compiler, and the other with a different version, we could do that by forking one Flix compiler from another. But it would be easier (I think?) if they were both forked from some third body of source which is the thing that understands dependencies, rather than the 0.32.0 version of Flix having to know how to fork a version of the compiler that hadn't been written when it was released.
See below some thoughts on what we might want from an expanded project and dependency management solution. This is intended to be a starting point for discussion only, so please feel free to object to any or all of what follows.
Current situation
Today, Flix:

- Expects source files within a `src` directory and tests within a `test` directory.
- Expects `HISTORY.md`, `LICENSE.md`, and `README.md` files.
- Expects dependencies (`.fpkg` or `.jar` files) within a subdirectory called `lib`.

Flix packages (`.fpkg` files) are Zip files with a similar structure to the above (the only difference being that the `test` directory is removed).

The Flix command line (and REPL) provide an `install` command which, given a GitHub path of the form `<username>/<repo>`, finds the `.fpkg` file associated with the most recent release and copies it into the `lib` subdirectory. There is no way to download anything other than the most recent release, and no mechanism to install `.jar` files.

Requirements
Here’s an initial stab at requirements for a full-featured project/dependency management system:

- A means to specify which libraries (`.fpkg` or `.jar`) the project depends upon, and the versions of those libraries.

Nice to have:
Proposal
This proposal shamelessly steals from the approach adopted by Clojure.
I suggest that we:
instead of:

`flix.jar`, which will do the remainder of the work.

A configuration file is read by `flix.jar` on startup and used to:

Open questions:

- `flix.json`? `flix-project.json`?

Possible file structure (assuming JSON):
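As a sketch only (all field names are illustrative, modelled loosely on Clojure's `deps.edn` rather than any settled design), such a file might look like:

```json
{
  "flix": "0.31.0",
  "paths": ["src"],
  "deps": {
    "org.postgresql/postgresql": { "mvn": "42.3.3" },
    "com.github.paulbutcher/my-flix-library": { "fpkg": "0.3.1" }
  },
  "aliases": {
    "dev": {
      "extra-paths": ["dev"],
      "extra-deps": { "some/dev-specific-library": { "mvn": "2.3.4" } }
    },
    "prod": { "extra-paths": ["prod"] },
    "test": {
      "extra-paths": ["test"],
      "extra-deps": { "some/test-specific-library": { "fpkg": "0.1.2" } }
    }
  }
}
```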
This defines a project which:

- Takes its source files from the `src` directory.

Maven dependencies are downloaded to the `~/.m2` directory as per Maven, and the classpath updated to reference them. Flix dependencies are analogously downloaded to `~/.flix`.

A development build is run with (the `-A` option includes an alias):

This includes additional source files from the `dev` directory, plus a development-specific dependency.

A production build is run with:

This includes additional source files from the `prod` directory.

Tests are run with:

Which automatically includes the `test` alias to include additional source files from the `test` directory, plus a test-specific dependency.