JuliaLang / Juleps

Julia Enhancement Proposals

Pkg3: naming of project filenames #43

Open dns2utf8 opened 7 years ago

dns2utf8 commented 7 years ago

Hi all

I saw the talk on Pkg3 and was a bit confused with the naming. My personal expectation was something like this:

Of course Config.* would be fine too. The mix of Config.toml and Manifest.toml confused me, since many development environments already use similar names, e.g. Android with AndroidManifest.xml, npm with package.json and package-lock.json, or Cargo with Cargo.toml and Cargo.lock.

Regards

StefanKarpinski commented 7 years ago

Naming these things is a bit challenging. I find the terminology used by other package managers kind of confusing and unfortunate. Let me try to explain some of the naming choices I've made.

The TOML spec says that all TOML files should end in .toml – Pkg3 follows that.

The Config.toml file contains a project's top-level information that is completely independent of any details of how the package manager (or anyone else) may have chosen to satisfy the dependencies of the project. It has entries like this:

authors = "Stefan Karpinski <stefan@karpinski.org>"
desc = "The next-generation Julia package manager."
keywords = ["package", "management"]
license = "MIT"
name = "Pkg3"
repo = "https://github.com/StefanKarpinski/Pkg3.jl.git"

[deps]
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

These are objective facts about the project, which do not depend in any way on what particular versions of anything are chosen to make things work. The project needs the packages whose UUIDs are ea8e919c-243c-51af-8825-aaa63cd721ce and 2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91 and these packages are referred to in the project as SHA and StatsBase, respectively. Unless the project's needs and code change, these are simple facts about the project. This file says nothing about what versions are used to satisfy these dependencies. The Config.toml file should always be checked into a project since otherwise you have no idea what it depends on or how to get it working.

Another reasonable name for the Config.toml file might be Project.toml since it is metadata about a project. That would seem a bit weird when it was in a package repo, however, where you'd expect it to be called Package.toml or something. It would be even weirder when appearing in a global named environment directory like ~/.julia/environments/v1.2 where there's no specific project that it describes. So Config.toml seems like a good name since the word "config" implies top-level user-provided configuration and applies equally to projects (non-reusable units of code), packages (reusable units of code), and global environments (named sets of packages one uses together). Another decent name might be Metadata.toml but that seems a bit too abstract and overloaded. Other names might be Env.toml or Environment.toml but that's also not super.

The Manifest.toml file records specific versions used to satisfy the dependencies listed in Config.toml. It includes not only versions of top-level dependencies listed in Config.toml but also the versions of all of their dependencies. An example of its contents might be:

[[Compat]]
hash-sha1 = "6e9c90ac34a173c2a2c179735427078b989a3bdc"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "0.26.0"

[[DataStructures]]
deps = ["Compat"]
hash-sha1 = "84bea819ff0c08e8f9fd55a637d25bdc685c6c5b"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.6.0"

[[SHA]]
deps = ["Compat"]
hash-sha1 = "9ce386dcf6dde95a1e267e320332d192bc090fff"
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.3.3"

[[SpecialFunctions]]
deps = ["Compat"]
hash-sha1 = "03e6a824d4f33a6bc856a5fcfd9d14729a9f18d4"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "0.1.1"

[[StatsBase]]
deps = ["Compat", "DataStructures", "SpecialFunctions"]
hash-sha1 = "4820d195cd378926a7a59e6e14727a394cc8f123"
uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
version = "0.17.0"

This corresponds to a particular way of satisfying the dependencies given in the above Config.toml file – i.e. it provides SHA and StatsBase. There are many different ways of satisfying these dependencies, and this is just one of them. This file may or may not be checked into a project since one particular way of satisfying requirements is not really always of interest. However, I'm thinking that one will generally want to commit this anyway, since even if one doesn't use the same exact versions oneself, at least that way there is a record of some working configuration, which presumably passed tests and whatnot. But it's not strictly necessary.
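To make "satisfying the dependencies" concrete, here is a minimal Python sketch (not Pkg3 code) that checks the example manifest against the example config and lists the indirect dependencies. The function names `manifest_problems` and `indirect_dependencies` are hypothetical. The data is transcribed by hand from the two examples above rather than parsed from TOML, and each `[[Name]]` section is assumed to hold a single entry.

```python
# Direct dependencies from the example Config.toml: name -> UUID.
config_deps = {
    "SHA": "ea8e919c-243c-51af-8825-aaa63cd721ce",
    "StatsBase": "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91",
}

# Entries from the example Manifest.toml, reduced to the fields used here.
# Assumption: one entry per name, although [[Name]] allows several.
manifest = {
    "Compat": {"uuid": "34da2185-b29b-5c13-b0c7-acf172513d20",
               "version": "0.26.0", "deps": []},
    "DataStructures": {"uuid": "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8",
                       "version": "0.6.0", "deps": ["Compat"]},
    "SHA": {"uuid": "ea8e919c-243c-51af-8825-aaa63cd721ce",
            "version": "0.3.3", "deps": ["Compat"]},
    "SpecialFunctions": {"uuid": "276daf66-3868-5448-9aa4-cd146d93841b",
                         "version": "0.1.1", "deps": ["Compat"]},
    "StatsBase": {"uuid": "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91",
                  "version": "0.17.0",
                  "deps": ["Compat", "DataStructures", "SpecialFunctions"]},
}


def manifest_problems(config_deps, manifest):
    """Return a list of problems; an empty list means the manifest covers the config."""
    problems = []
    # Every direct dependency must appear in the manifest with the same UUID.
    for name, uuid in config_deps.items():
        entry = manifest.get(name)
        if entry is None:
            problems.append(f"{name}: missing from manifest")
        elif entry["uuid"] != uuid:
            problems.append(f"{name}: UUID mismatch")
    # Every dependency named inside the manifest must itself have an entry.
    for name, entry in manifest.items():
        for dep in entry["deps"]:
            if dep not in manifest:
                problems.append(f"{name}: dependency {dep} missing from manifest")
    return problems


def indirect_dependencies(config_deps, manifest):
    """Names that are 'on the ship' but not listed in the config."""
    return sorted(set(manifest) - set(config_deps))


print(manifest_problems(config_deps, manifest))
print(indirect_dependencies(config_deps, manifest))
```

For the example data the first call reports no problems, and the second returns Compat, DataStructures and SpecialFunctions: packages recorded in the manifest but absent from the config.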

This file is called Manifest.toml because the dictionary defines a "manifest" as:

A document giving comprehensive details of a ship and its cargo and other contents, passengers, and crew for the use of customs officers.

or, if you prefer the Webster's 1913 dictionary definition:

A list or invoice of a ship's cargo, containing a description by marks, numbers, etc., of each package of goods, to be exhibited at the customhouse.

That's what this file does – it gives all of the identifying details of exactly what's "on the ship". The Config file, on the other hand, is not a manifest at all – it's a high-level description of what a package is and what it needs. There are no specifics of how those needs are met, only enough information that the needs are clear and unambiguous. Indirect dependencies are not listed in Config.toml at all, even though they are definitely on the ship.

The naming and format of environment logs haven't really been settled yet. However, I have a somewhat hard time seeing why that file should have the word "manifest" in it anywhere. In what sense is it a manifest? It's not a list of the contents of anything. It's a record of the locations of environments that have been used, thereby allowing the package manager to figure out which versions of packages are still potentially in use and (by process of elimination) which can be safely deleted. I guess if the file ends up being a log of paths to Manifest.toml files it might make sense to call it something like ~/.julia/Manifest.log. It still wouldn't itself be a manifest, but it would be a log of manifests, so the name would make sense. I suspect, however, that it will make more sense to track environment locations, since then you can log environments whether they have manifest files or not. As to the file format, having it be a full-on database seems like overkill. Some kind of file locking and/or log sharding seems like it should be sufficient, although I guess we'll see.
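The "process of elimination" idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration, since the log format is explicitly unsettled: the log is modeled as a plain list of environment paths, `read_pins` is a hypothetical callback that returns the (package, version) pairs an environment pins (or `None` if the environment no longer exists), and `collectable_versions` is a made-up name, not a Pkg3 function.

```python
def collectable_versions(installed, env_log, read_pins):
    """Return installed (package, version) pairs that no logged environment uses.

    installed: set of (package, version) pairs present on disk.
    env_log:   iterable of environment locations that have been used.
    read_pins: hypothetical callback, location -> set of pairs, or None
               when the environment has since been removed.
    """
    in_use = set()
    for env_path in env_log:
        pins = read_pins(env_path)
        if pins is None:
            # The environment is gone, so it keeps nothing alive.
            continue
        in_use.update(pins)
    # Whatever is installed but not pinned anywhere can be deleted.
    return installed - in_use


# Illustrative data only; the paths and the older versions are invented.
installed = {("SHA", "0.3.3"), ("SHA", "0.3.2"),
             ("Compat", "0.26.0"), ("Compat", "0.25.0")}
environments = {
    "/home/user/projA": {("SHA", "0.3.3"), ("Compat", "0.26.0")},
    "/home/user/projB": {("Compat", "0.25.0")},
}
env_log = ["/home/user/projA", "/home/user/projB", "/home/user/deleted"]

stale = collectable_versions(installed, env_log, environments.get)
print(stale)
```

Logging environment locations rather than manifest paths fits this shape: the callback decides how to find out what an environment uses, so an environment without a manifest file can still be logged.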

vtjnash commented 7 years ago

Like Cargo.toml, perhaps it should just be Julia.toml? But I'm assuming the containing path would usually already indicate that it is an associated artifact of a given project. I'm not sure the examples given are entirely in line with that assumption, though.

StefanKarpinski commented 7 years ago

The plan (spelled out in the Julep and implemented in Pkg3.jl) is to look for JuliaConfig.toml and JuliaManifest.toml first and use those if they exist (and completely ignore Config.toml and Manifest.toml if they do). That way a Julia-only project doesn't need to redundantly name things Julia-this or Julia-that but mixed-language projects can use longer names with Julia prefixes.
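A rough Python sketch of that lookup order follows; it is not the Pkg3.jl implementation. The function name `resolve_project_files` is hypothetical, and it assumes the two files are chosen as a pair based on which config file exists, which the comment above does not spell out.

```python
import tempfile
from pathlib import Path


def resolve_project_files(directory):
    """Return (config_path, manifest_path) for a directory, or None.

    Julia-prefixed names win; the unprefixed names are only considered
    when no JuliaConfig.toml exists. Assumption: the manifest name
    follows the prefix of whichever config file was found.
    """
    directory = Path(directory)
    for prefix in ("Julia", ""):
        config = directory / f"{prefix}Config.toml"
        if config.is_file():
            return config, directory / f"{prefix}Manifest.toml"
    return None


# Usage: a mixed-language project that has both kinds of file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Config.toml").write_text("")       # e.g. owned by another tool
    (root / "JuliaConfig.toml").write_text("")  # the one Julia should use
    config, manifest = resolve_project_files(root)
    chosen = (config.name, manifest.name)
    print(chosen)
```

With both files present the prefixed pair is selected and Config.toml is ignored, which is what lets a mixed-language project keep an unrelated Config.toml in the same directory.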

tkelman commented 7 years ago

Config.toml sounds to me a bit like something the user might be expected to modify to tweak optional settings, which it sort of would be for projects but not really for packages, right? Even for projects, wouldn't things like adding dependencies usually be done through Julia Pkg3 APIs instead of manually editing a TOML file?

Description.toml or Info.toml or Listing.toml might be overly generic possible names for it.

StefanKarpinski commented 7 years ago

You can edit the file by hand, and it will likely contain other kinds of configuration. But yes, looking up UUIDs and entering them is tedious, so it would likely be done by Pkg3 automatically in response to a command (with interactive disambiguation when necessary), which would update the actual versions in the manifest file at the same time. Info.toml is ok, but at that point I might prefer Meta.toml.

Let's play the "what do you call it" game:

This file provides high-level metadata about a project (non-reusable unit of code), package (reusable unit of code), or a named global environment (set of packages often used together), including what packages it depends on, global configuration, and "project targets" – i.e. things you can do with a project.

This suggests Metadata.toml but something still doesn't feel right about that. I guess we could call it Project.toml and just deal with that being a bit off in packages or environments.

staticfloat commented 7 years ago

I like something similar to Package.toml, or Packages.toml. This gets across the idea that the configuration has something to do with packaging, which may not be immediately obvious to users coming to Pkg3 for the first time.

quinnj commented 7 years ago

I have to say I agree w/ @staticfloat; Package.toml seems the most obvious and natural to me. I know we're worried about the "wait this isn't a package, just a project I'm working on!" use-case, but I still feel like you can just conceptually call that a "package", even if it's not something you plan on publishing for the rest of the world (which we could maybe call Libraries or Public Packages as convention instead).

staticfloat commented 7 years ago

For the expliciphiles among us:

For the cutesy among us:

StefanKarpinski commented 7 years ago

There are many projects for which it would never make sense to turn them into packages – e.g. projects where the end artifact is a program, not a reusable piece of Julia code. And a project isn't just not-yet-a-package; e.g. only non-package projects will be able to do any global configuration of other packages. Otherwise when using multiple packages together, they could require conflicting configurations. Similarly, environments are reusable sets of packages and are also not packages.

Another option would be to call it Package.toml, Project.toml or Environment.toml depending on whether it's in a package, project or named environment. I don't really like squatting on so many names though.

staticfloat commented 7 years ago

I think Environment.toml could possibly describe all three in one fell swoop; Environment.toml describes how a package fits into its larger environment, how a project's environment should be set up, or how an environment proper should be constructed.

StefanKarpinski commented 7 years ago

It's just so looooong though. I have also considered Env.toml but that seems a bit too terse.

staticfloat commented 7 years ago

I think Environment.toml is an acceptable length.

StefanKarpinski commented 7 years ago

Says the man who proposed package_system_metadata.toml 😝

ararslan commented 7 years ago

I actually like Env.toml; "env" is a pretty standard abbreviation of "environment," so the name is still clear.

The ship has probably sailed, but I still think it'd be nice to always specify that the TOML file is related to Julia, not conditionally look for a Julia-named file. For example, it could be PkgEnv.toml, PkgConfig.toml, or what have you, akin to Rust's Cargo.toml. It's more immediate disambiguation. IMO anyway.

StefanKarpinski commented 7 years ago

Another option: call it Project.toml (which applies to both packages and projects) and represent named global environments with a single file with this format:

SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[[Compat]]
hash-sha1 = "6e9c90ac34a173c2a2c179735427078b989a3bdc"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "0.26.0"

[[DataStructures]]
deps = ["Compat"]
hash-sha1 = "84bea819ff0c08e8f9fd55a637d25bdc685c6c5b"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.6.0"

[[SHA]]
deps = ["Compat"]
hash-sha1 = "9ce386dcf6dde95a1e267e320332d192bc090fff"
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.3.3"

[[SpecialFunctions]]
deps = ["Compat"]
hash-sha1 = "03e6a824d4f33a6bc856a5fcfd9d14729a9f18d4"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "0.1.1"

[[StatsBase]]
deps = ["Compat", "DataStructures", "SpecialFunctions"]
hash-sha1 = "4820d195cd378926a7a59e6e14727a394cc8f123"
uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
version = "0.17.0"

After all, the rest of the metadata in the artist currently known as Config.toml doesn't really make any sense for a named global environment, which serves only to provide a coherent set of packages.

rofinn commented 7 years ago

FWIW, I like Package.toml (or Pkg.toml) as:

1) Config seems generic enough to be confused with a settings file that contains constants for the package/application/project code.
2) Pkg.toml leaves room to add Env.toml and Project.toml later on... if we need them.

I'm not sure it matters how long the name is as long as it's a single word that describes what's in the file; I don't imagine folks will be typing/editing these files by hand regularly.

StefanKarpinski commented 7 years ago

I did some experiments with different names. Actually changing the name of the file and the places where it's used in code and docs gives a tangible sense for how these names will feel to use – and I like Project.toml by far the best. It reads right in code and in documentation when you refer to a "(Julia) project file" and a "(Julia) manifest file". As in:

When doing rm A, if A is not in the project file, the operation does nothing and prints a message to that effect. When doing rm A=uuid, even if A is not in the project file but is in the manifest with UUID uuid, then ...

When this was written with Config.toml and "config file" it made a lot less natural sense. The name Package.toml and "package file" works pretty well as long as the project you're talking about happens to be a package. However, I think it's really important that we unify the expression of dependencies for all kinds of projects, not just packages, and using the name "package" when you're not talking about a package is just confusing. Even when talking about a package, the name "project" seems better to me since we're talking about the requirements of the package as a project – providing reusable code is only one of many roles that a package has as a project. A package can have targets besides those needed for loading it, e.g. for testing and running code. So I just think the term "project" gives the right sense of what's in the file. I think the terms "project file" and "manifest file" give a clear intuitive sense of what's in each of the files: the project file contains info about the project – name, description, authors, dependencies – while the manifest file contains info about a particular snapshot of everything "on board".

In short, I'm going ahead with this file naming scheme:

jpfairbanks commented 6 years ago

Just to support the namestorming: I think that saying "package file" when you are working on a project makes sense, because you are talking about the packages that the project depends on. And we know that the project only depends on packages, because projects aren't reusable pieces of code.

StefanKarpinski commented 6 years ago

True, but saying that "A is in the project" or "B is in the manifest" makes sense whereas "A is in the package" does not make sense. Only some of the information in the project file is about packages that a project depends on; all of the information is metadata about the project, however.

jpfairbanks commented 6 years ago

That makes sense. I was thinking only about the dependency management aspects.