JuliaLang / Juleps

Julia Enhancement Proposals

Pkg3: immutability of compatibility #14

Open StefanKarpinski opened 7 years ago

StefanKarpinski commented 7 years ago

Continuing half of the discussion on https://github.com/JuliaLang/Juleps/issues/3.

StefanKarpinski commented 7 years ago

If we allow compatibility of versions to be mutated after the fact (as we do now in METADATA), one major issue is that, once compatibility has been modified, it becomes impossible to know what the compatibility constraints on versions actually were when those versions were resolved. This could hide resolution bugs and generally makes the system harder to understand.

One possible solution is for each modification of compatibility constraints to increment a build number of a version or something like that, so 1.2.3 is the version with its original compatibility, while 1.2.3+1 would be a version with potentially modified compatibility or other metadata changes, which would get its own metadata in the registry, but share the same source tree.
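To make the "+N" idea concrete, here is a minimal Python sketch. It assumes a registry keyed by version strings in which `1.2.3+1` shares its source tree with `1.2.3` and only adds a metadata revision. The registry layout, the `tree` placeholders and the compatibility strings are all hypothetical, not an actual Pkg3 format. Note also that this ordering goes beyond SemVer 2.0.0, which ignores build metadata when determining precedence. Resolution reads only the newest revision of each base version, while older revisions stay on record unchanged.

```python
def parse(vstr):
    """Split "1.2.3+1" into ((1, 2, 3), 1); a missing "+N" means revision 0."""
    base, _, rev = vstr.partition("+")
    major, minor, patch = (int(x) for x in base.split("."))
    return (major, minor, patch), (int(rev) if rev else 0)


def effective_metadata(registry):
    """Map each base version to the metadata of its highest revision.

    Every revision stays in `registry` unchanged; only the newest one is
    consulted when resolving.
    """
    newest = {}  # base version -> (revision, version string)
    for vstr in registry:
        base, rev = parse(vstr)
        if base not in newest or rev > newest[base][0]:
            newest[base] = (rev, vstr)
    return {base: registry[vstr] for base, (_, vstr) in newest.items()}


# Hypothetical registry entries; "tree" values are placeholders, not real hashes.
registry = {
    "1.2.3":   {"tree": "tree-a", "compat": {"A": "1.2"}},
    "1.2.3+1": {"tree": "tree-a", "compat": {"A": "1.2-1.3"}},  # metadata-only fix
    "1.2.4":   {"tree": "tree-b", "compat": {"A": "1.3"}},
}

current = effective_metadata(registry)
print(current[(1, 2, 3)]["compat"])  # {'A': '1.2-1.3'}
print(registry["1.2.3"]["compat"])   # {'A': '1.2'}  (original record preserved)
```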

At that point, however, I have to question why 1.2.3+1 wouldn't simply be called 1.2.4. The main objection seems to be that it's annoying / hard to create patches and that package maintainers often aren't as responsive as we'd like. That makes me think we should just make it easier to make this kind of patch update, and make it possible without the package maintainers' involvement.

StefanKarpinski commented 7 years ago

In particular, patches don't need to be made on the main repository of a project, they can be made on a fork as long as they are eventually upstreamed back to the main repo.

JeffreySarnoff commented 7 years ago

+1 UUIDs

tkelman commented 7 years ago

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

StefanKarpinski commented 7 years ago

> The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

If the latest patch release always supersedes previous ones in the same major-minor series, then you can always just make a new patch. The only way needing 1.2.3+1 rather than 1.2.19 makes sense is if you want a version with compatibility fixes but without any bug fixes. That seems like a somewhat implausible situation. When would this be necessary? If such a situation did occur, we could always allow publishing 1.2.3+1 with updated compatibility but without bug fixes.

> The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

That means we'd have to record the state of all registries in the environment, which ties the meaning of an environment to the history of registries in a way that we are (or at least I am) trying to avoid. If version compatibility is immutable (in either 1.2.3+1 or 1.2.4 form), then you can always tell, just by looking at the compatibility info for those versions, whether they are correct. You can't tell if they were optimal at the time, but you can verify correctness.
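As a rough illustration of "verify correctness, not optimality", the Python sketch below checks a resolved environment against per-version compatibility records. It assumes, hypothetically, that each record maps a dependency to a half-open version range and that records never change once published, so the check gives the same answer no matter when it is run. It says nothing about whether a better choice of versions existed.

```python
# Hypothetical, immutable compatibility records: for each (package, version),
# each dependency must fall in the half-open range [lo, hi).
COMPAT = {
    ("B", (2, 0, 0)): {"A": ((1, 2, 0), (1, 3, 0))},
    ("C", (3, 0, 0)): {"A": ((1, 2, 0), (1, 3, 0))},
}


def violations(environment, compat):
    """Return (package, dependency) pairs whose recorded constraint is not met."""
    problems = []
    for pkg, ver in environment.items():
        for dep, (lo, hi) in compat.get((pkg, ver), {}).items():
            dep_ver = environment.get(dep)
            if dep_ver is None or not (lo <= dep_ver < hi):
                problems.append((pkg, dep))
    return problems


good = {"A": (1, 2, 5), "B": (2, 0, 0), "C": (3, 0, 0)}
bad = {"A": (1, 3, 0), "B": (2, 0, 0), "C": (3, 0, 0)}
print(violations(good, COMPAT))  # []
print(violations(bad, COMPAT))   # [('B', 'A'), ('C', 'A')]
```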

tkelman commented 7 years ago

> If the latest patch release always supersedes previous ones in the same major-minor series

This is not a good idea, as I've said before: there's not a lot of precedent for allowing code changes to completely supersede old versions. If there's going to be a second class of dependency resolution for complete replacement, then it should not allow code changes. People break their API in bugfix releases even if we tell them not to, and downstream packages are going to need to be able to use APIs that only existed in early patch releases. And this situation might not be noticed immediately, so there could be enough later patch and minor releases that there isn't room to fix the situation by making a new set of renumbered releases.

StefanKarpinski commented 7 years ago

So are you OK with the idea of version metadata – especially compatibility – being immutable, but having 1.2.3+1 supersede 1.2.3 with no source code changes, only metadata changes?

tkelman commented 7 years ago

Yes, that seems like a mostly equivalent way of accomplishing the same thing as modifying compatibility in metadata. It records more history permanently (not just in git history), maybe that could be useful though.

tkelman commented 7 years ago

I do think we should keep a log of version history used by local registry copies over time, so you could feasibly implement an "undo" of a global update operation. That's a separate issue though.

StefanKarpinski commented 7 years ago

Or are you entirely against the idea that version metadata be immutable?

martinholters commented 7 years ago

Creating such a metadata-only update would be simplified if the metadata was only part of the registry, not the package itself, i.e. 1.2.3+1 could have the same hashes stored as 1.2.3. Actually, it would have to, to enforce the "no source code changes" policy. This would a) allow easy automatic verification of this policy and b) simplify metadata-only updates by non-package-maintainers.
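A minimal sketch of the automatic policy check described above, assuming registry entries carry a `tree` hash for each version. The field name and the placeholder hash strings are hypothetical. Any `+N` revision whose tree differs from its base version's tree would be rejected as not metadata-only.

```python
def non_metadata_revisions(registry):
    """Return "+N" revisions whose source tree differs from their base version.

    A revision with no base entry at all is also flagged.
    """
    flagged = []
    for vstr, entry in registry.items():
        base, sep, _ = vstr.partition("+")
        if sep and registry.get(base, {}).get("tree") != entry["tree"]:
            flagged.append(vstr)
    return sorted(flagged)


# Hypothetical entries; "tree" values are placeholders for source tree hashes.
registry = {
    "1.2.3":   {"tree": "tree-a"},
    "1.2.3+1": {"tree": "tree-a"},  # metadata-only: accepted
    "1.2.4":   {"tree": "tree-b"},
    "1.2.4+1": {"tree": "tree-c"},  # source changed: rejected
}
print(non_metadata_revisions(registry))  # ['1.2.4+1']
```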

Would that be an option? (Or is that already the idea and I misread the proposal?)

simonbyrne commented 7 years ago

The example I gave in the other thread illustrates why patches are insufficient:

  1. Pkg B v2.0.0 depends on v1.2 of Pkg A
  2. Pkg C v3.0.0 depends on v1.2 of Pkg A
  3. Pkg A v1.3.0 is tagged with new features
  4. Pkg B v2.1.0 is tagged using features of Pkg A v1.3.0, but forgets to update the version requirement
  5. Pkg B v2.1.1 is tagged fixing this.

Now user installs Pkg B and Pkg C: the end result would be:

which would be broken.

StefanKarpinski commented 7 years ago

@martinholters: Yes, having compatibility info not live in the package repo is definitely a possibility, but it would make it harder for unregistered packages to participate in version resolution. Since making unregistered packages easier to work with was one of the major requests for Pkg3, that's a bit of a problem. Also, if we move compatibility info out of the package itself, where does the developer edit it? The obvious answer is in the registry but I feel like that's not tremendously obvious or developer-friendly.

@simonbyrne: This wouldn't be the result under what I've proposed since the existence of Pkg B v2.1.1 would prevent resolution from ever choosing Pkg B v2.1.0 – that's what "strongly favor the latest patch release" is meant to convey. Instead you would get A v1.2.x, B v2.0.0 and C v3.0.0. In the other approach being discussed here, B v2.1.0+1 would fix B v2.1.0's dependencies and would similarly hide B v2.1.0 from consideration when resolving new versions.
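The scenario and the resolution behavior described above can be simulated with a toy brute-force resolver. This is a Python sketch under stated assumptions, not Pkg3's resolver: compatibility is hypothetically recorded as sets of allowed `(major, minor)` series, the resolver prefers newer versions of B, then C, then A, and `supersede_patches=True` models "strongly favor the latest patch release" by dropping all but the newest patch in each series before searching.

```python
from itertools import product

# Hypothetical registry modeled on the five-step scenario.
VERSIONS = {
    "A": [(1, 2, 0), (1, 2, 1), (1, 3, 0)],
    "B": [(2, 0, 0), (2, 1, 0), (2, 1, 1)],
    "C": [(3, 0, 0)],
}
COMPAT = {
    ("B", (2, 0, 0)): {"A": {(1, 2)}},
    ("B", (2, 1, 0)): {"A": {(1, 2)}},  # wrong: actually uses A 1.3 features
    ("B", (2, 1, 1)): {"A": {(1, 3)}},  # the fix
    ("C", (3, 0, 0)): {"A": {(1, 2)}},
}


def latest_patches(versions):
    """Keep only the newest patch within each (major, minor) series."""
    best = {}
    for v in versions:
        series = v[:2]
        if series not in best or v > best[series]:
            best[series] = v
    return sorted(best.values())


def resolve(versions, compat, priority, supersede_patches):
    """Brute-force search; prefers higher versions of packages earlier in `priority`."""
    names = list(versions)
    candidates = [
        latest_patches(versions[n]) if supersede_patches else versions[n]
        for n in names
    ]
    best_key, best_env = None, None
    for combo in product(*candidates):
        env = dict(zip(names, combo))
        feasible = all(
            env[dep][:2] in allowed
            for pkg, ver in env.items()
            for dep, allowed in compat.get((pkg, ver), {}).items()
        )
        if feasible:
            key = tuple(env[n] for n in priority)
            if best_key is None or key > best_key:
                best_key, best_env = key, env
    return best_env


print(resolve(VERSIONS, COMPAT, ["B", "C", "A"], supersede_patches=False))
# {'A': (1, 2, 1), 'B': (2, 1, 0), 'C': (3, 0, 0)}  (broken: B 2.1.0 needs A 1.3)
print(resolve(VERSIONS, COMPAT, ["B", "C", "A"], supersede_patches=True))
# {'A': (1, 2, 1), 'B': (2, 0, 0), 'C': (3, 0, 0)}
```

With superseding disabled, the search lands on the broken B v2.1.0; with it enabled, B v2.1.0 is never a candidate and the result matches the A v1.2.x, B v2.0.0, C v3.0.0 outcome.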

StefanKarpinski commented 7 years ago

The core of @tkelman's objection (assuming he's not against the idea of immutable version metadata entirely, which would be good to get an answer on) seems to be that updating version metadata via new patches allows metadata fixes to be mixed with bug fixes – well, technically arbitrary source code changes, since people may not just fix bugs in patch versions. But if people stick with bug fixes in patches, this won't be a problem: why would you want a buggier version? Yes, people will screw up bug fixes, but then the appropriate action is to make another patch that fixes the fix.

Fixing version metadata for 1.2.3 by releasing 1.2.4 is less flexible than adding another level of metadata-changes-only versioning like 1.2.3+1. So why not just add another layer and semantically separate metadata changes from code changes of any kind? One reason is that semantic versioning already has three layers of versioning, which is already a lot to deal with and reason about, and adding another one seems complicated and unnecessary. At the level of practical development, people only use branches corresponding to major/minor versions: patches occur on branches with names like release-1.2 – if you want to make a new 1.2.x release, you tag the tip of release-1.2. How would this workflow change with metadata-only changes like 1.2.3+1? You'd need a branch for each patch release: you'd make metadata-only fixes on release-1.2.3, and you'd need a branch like that for every single release. That just seems ridiculous. If you make metadata fixes via new patch releases, mixed in with other bug fixes, then the current workflow doesn't change at all – just fix version metadata on the release-1.2 branch and tag a new patch.

My perspective is that we want to design the package manager so that making patch versions that do anything besides fixing bugs is problematic. This will actively encourage package developers to only fix bugs in patches. Two features of the proposed design that encourage this are:

  1. Have newer patches fully supersede older ones with the same major/minor version.
  2. Disallow version dependencies from specifying versions at patch granularity.

Both of these design choices assume that patches with the same major/minor version are equivalent aside from metadata updates and bug fixes. If a package maintainer violates this assumption by adding or removing functionality in a patch, it will cause problems. Problems lead to complaints, which will provide feedback to the maintainer and help them learn that this is bad practice and not do it in the future. This is not based on some sort of groundless optimism that people will do things correctly on their own, it's based on the principle that people respond to feedback and that we can design a system that actively causes people to receive corrective feedback. Is this limiting the ways that package developers can version their packages and have things work smoothly? Yes, but I think that's a good thing.

tkelman commented 7 years ago

If a compatibility-only change can be done only at the registry level without needing the source to change at all, then there's no need for a branch for a compatibility revision.

I think this goal is a bad idea because it designs the system to be intentionally rigid and inherently flawed in the face of behavior that people commonly engage in (a recent example: changing the type of a single parameter of a single function, which breaks the API but seems like a minor change), and in a way that cannot easily be fixed once newer versions have been published.

The core job of a package manager is to ensure that once source has been published as a release version, it is possible to depend on it. Excluding the patch level of versioning from this guarantee is unnecessary, adds friction to the system, and doesn't gain us anything. Downstream users are the ones who face problems from versioning mistakes, and they are incapable of fixing or working around them without cooperation from the upstream author, or forking the package and re-releasing it under a new series of version numbers. We don't gain enough for this to be worth it.

tkelman commented 7 years ago

What qualifies as a bugfix is not always clear cut either. In fixing one bug, you can often accidentally (or intentionally!) break something else that downstream users were depending on. And these issues don't get identified immediately. By the time some of these issues are found, the upstream author may have moved on to a newer release series, that the downstream users don't have time to upgrade to right away (especially if there was a past release that worked fine for them). What option does downstream have to get their code working again? They could publish a fork without any of the more recent releases, but why have we made them go to that trouble when a patch level upper bound would serve the exact same purpose?

StefanKarpinski commented 7 years ago

The problem with having registry-only compatibility changes is that it:

  1. makes compatibility confusing since there are multiple conflicting – and changing – sources of what a version's compatibility actually is, and it
  2. makes registered and unregistered packages work completely differently – registered packages have a mechanism for amending compatibility while unregistered ones don't.

The process I'm proposing is straightforward and the same for registered or unregistered packages: keep definitive compatibility info in Config.toml; when compatibility needs to be adjusted, just edit Config.toml on the appropriate release branch, commit the changes and publish the tip of the release branch as a new patch.

Preferring the latest patch for version resolution doesn't make it impossible to use older patches, nor does it force users to upgrade to the latest patch – if what they're using works, no problem:

The example you allude to (where was this?) with a changed type parameter is a simple broken patch. If you depend on the package, the correct fix in such a situation is to exclude that specific broken patch, which solves the problem; if you're the package maintainer, the fix is to revert the part of the change that broke compatibility for someone and make a new patch release. Neither is a big problem.
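Excluding a single broken patch can be expressed as an allowed minor series minus an explicit exclusion set. The Python sketch below is hypothetical: the `satisfies` helper and the tuple encoding of versions are illustrative, not a Pkg3 API, and the broken A v1.3.6 is an invented example.

```python
def satisfies(version, series, excluded=frozenset()):
    """True if `version` lies in an allowed (major, minor) series and is not excluded."""
    return version[:2] in series and version not in excluded


# Hypothetical downstream constraint: "A 1.3, except the broken 1.3.6".
series = {(1, 3)}
excluded = {(1, 3, 6)}
candidates = [(1, 3, 5), (1, 3, 6), (1, 3, 7), (1, 4, 0)]
print([v for v in candidates if satisfies(v, series, excluded)])
# [(1, 3, 5), (1, 3, 7)]
```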

I would love an actual problematic case that can't be handled with what I'm proposing instead of general arguments about what package managers should or shouldn't do. If there's some problem scenario, I want to know about it. The kind of example @simonbyrne presented is exactly what I'm talking about (hopefully my answer to that is convincing to him). The Compat example in #3 is also exactly what I'm talking about: the fact that minor updates to packages with many dependents (Compat being the most extreme example) would force patching of all dependents is a devastating problem with my original proposal, hence https://github.com/JuliaLang/Juleps/issues/15#issuecomment-261025316.

tkelman commented 7 years ago

The problem is that the "broken patch" is broken from the perspective of downstream users who were using the old API, but intended as a new API by the upstream author. Upstream isn't going to revert it. Downstream then needs to indicate that all future patches are broken. That's not possible in this proposal: every new upstream release would break the downstream until downstream gets a chance to add another broken patch to their list.

It's not possible for compatibility to be set in stone and never change - compatibility depends on the entire set of possible interacting versions of dependencies, it always changes as new versions get released.

tkelman commented 7 years ago

You are proposing making it impossible to declare version compatibility bounds at patch granularity. That's necessary in the following case:

  1. package B depends on package A, which is at, say, v1.3.3 when package B gets written (and B relies on a feature that was new in 1.3.0)
  2. package A breaks API between versions 1.3.5 and 1.3.6
  3. package A makes many more 1.3.x releases, several 1.4.y, and has started on 2.0.0
  4. package B gets a report that it doesn't work any more with package A v1.4.3

Assuming the author of package B can remember, or recover from environment info, which version of package A did work, there's no way in this proposal of reflecting B's requirements, since it can't express an upper bound at A v1.3.6, the release that caused the problem. It could say every patch from 1.3.6 on is broken, but if those have to be listed individually, then the list becomes incorrect as soon as an additional 1.3.17 backport gets released. The most practical way to immediately get a working version of its dependency is to republish a fork of the old version of package A.
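The difference between enumerating broken patches and stating a patch-granularity upper bound can be shown in a few lines of Python. This sketch assumes, hypothetically, that B's exclusion list was written when A v1.3.16 was the newest 1.3.x release; both predicates are illustrative helpers, not a Pkg3 API.

```python
def ok_with_exclusions(v, excluded):
    """Allowed series 1.3 minus an enumerated list of known-broken patches."""
    return v[:2] == (1, 3) and v not in excluded


def ok_with_bound(v):
    """Patch-granularity range: at least 1.3.0 and strictly below 1.3.6."""
    return (1, 3, 0) <= v < (1, 3, 6)


# Exclusion list written when 1.3.16 was (hypothetically) the newest 1.3.x release.
excluded = {(1, 3, p) for p in range(6, 17)}

backport = (1, 3, 17)  # released later
print(ok_with_exclusions(backport, excluded))  # True: the list is now stale
print(ok_with_bound(backport))                 # False: the bound is still right
print(ok_with_bound((1, 4, 3)))                # False: the reported breakage is excluded too
```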

What problem is solved by disallowing requirements at patch granularity, and disallowing expressing requirements as ranges?

StefanKarpinski commented 7 years ago

The subject of this issue is immutability of compatibility, which is orthogonal to patch granularity. I was trying to unmuddy the discussion by splitting #3 into this issue and #15, which would be a better place to discuss patch granularity, although that issue is explicitly about the opposite complaint: that the granularity is too fine, which I already conceded.

tkelman commented 7 years ago

Splitting a discussion without posting to that effect in the discussion itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with respect to the time and set of available dependency versions when you state them. As new versions become available, a previously correct set of constraints can become too tight if it doesn't include working versions, too loose if it does not indicate new breakage, or remain correct. Compatibility claims that were too tight or too loose when they were first made may need to be amended after the fact.
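The three outcomes can be made precise by comparing the versions a constraint admits with the versions that actually work. This Python sketch assumes, hypothetically, that "works" is known (for example from testing), and it treats "too tight" and "too loose" as independent checks, so a constraint that is both is reported as such. The version numbers are invented for illustration.

```python
def classify(claims, works, available):
    """Compare a stated constraint with the versions that actually work.

    claims, works: predicates on version tuples
    available: the versions published so far
    """
    allowed = {v for v in available if claims(v)}
    working = {v for v in available if works(v)}
    too_tight = bool(working - allowed)  # rejects versions that would work
    too_loose = bool(allowed - working)  # admits versions that break
    if too_tight and too_loose:
        return "too tight and too loose"
    if too_tight:
        return "too tight"
    if too_loose:
        return "too loose"
    return "correct"


def claims(v):
    return v[:2] == (1, 3)  # the constraint as originally stated


def works(v):
    return v != (1, 3, 2)  # hypothetical ground truth: only 1.3.2 is broken


print(classify(claims, works, [(1, 3, 0), (1, 3, 1)]))             # correct
print(classify(claims, works, [(1, 3, 0), (1, 3, 1), (1, 3, 2)]))  # too loose
print(classify(claims, works, [(1, 3, 0), (1, 3, 1), (1, 4, 0)]))  # too tight
```

The same constraint moves between categories purely because new versions appear, which is the argument for allowing it to be amended.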

If making personal registries is simple, then I don't think it's worth worrying about how to amend compatibility for unregistered packages. Source releases should be immutable, compatibility often needs to be amended, so compatibility should be tracked outside of the source. If you need to amend compatibility for an unregistered package, then create a personal registry to track it.

tbreloff commented 7 years ago

I really hope that package management and compatibility can be managed outside of the actual codebase as much as possible. In fact, I wish that we didn't use git tags at all. Forcing package authors to add new commits (and tag them) just to fix a dependency resolution is ridiculous. Please, let's put all requirements outside of the actual package repo. Let a core group of people manage those dependencies for the curated metadata, with advice from authors. Private metadata repos will be easier to manage as well.

On Wednesday, November 16, 2016, Tony Kelman notifications@github.com wrote:


JeffreySarnoff commented 7 years ago

+1.618 for allowing me to become unconcerned with anything git related

tkelman commented 7 years ago

@tbreloff package authors need to be responsible for dependency versioning. What features are you using, when things break how do you fix or work around them, etc. That comes with the territory of having dependencies. If you get any help you're lucky, but you can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to take priority here though, as in the existing system where metadata is used for registered packages, the package's copy of REQUIRE isn't actually used except at tag time to populate the initial content.

A compatibility-only revision release could be a mechanism for this, but it needs to be possible to do that for any published release, not just the latest within a minor series. Compatibility is about the rest of the world with respect to a fixed version of a package - we shouldn't be mixing the release numbering or resolution mechanism for outside-world compatibility within the same system (and constraints) that we use for a package's own source.

tbreloff commented 7 years ago

So then maybe what I'd like is a little more subtle. It would be nice if the larger community had a mechanism to tag and fix dependencies in place of authors who don't have the time or knowledge to keep up with the process. How many times a day do you have to tell people exactly what they need to do and how to do it in order to properly register or tag? Wouldn't it be easier for everyone involved if you just did it yourself? You're the one with commit access to metadata, so why go through the silly and pointless steps that make it seem like the author has anything valuable to add? I'd be happy with v1.2+ and v1.2.3+ if it means problems are immediately solved by the people who understand the right way to solve them.

tl;dr Manage as much as possible from within metadata(s) without necessarily requiring the author

On Thursday, November 17, 2016, Tony Kelman notifications@github.com wrote:


StefanKarpinski commented 7 years ago

The notion that you can build a functioning ecosystem of reusable software without authors thinking about versioning at all strikes me as incredibly implausible, not to mention totally unscalable. Who's going to be spending all of their time figuring out how to version every single registered package? Your answer here seems to be "I dunno, but not me." If you want to develop software that way, that's cool – then don't register your packages. What I'm proposing will support unregistered packages much better, but it won't change the fact that following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

tbreloff commented 7 years ago

> without authors thinking about versioning at all

Of course there's a middle ground. Authors think about the high level versioning, but not necessarily the gritty details (that frequently are due to other packages out of their control). Those details should either be handled by automation or by expert guidance, depending on the situation.

> Your answer here seems to be "I dunno, but not me."

When it comes to curated metadata repos, if I'm not a curator then the final responsibility is not mine. Package authors can guide versioning (and should be encouraged to do as much as possible themselves) but this mentality that curators should never make changes to the thing they're curating, but instead to enact social pressure on package authors until they make the exact change that the curator could have done in the first place... it's just stupid. I want to see the curation as disjoint from the code.

> following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

I couldn't agree more, which is why I care so much about making it dirt-simple to "do the right thing".

JeffreySarnoff commented 7 years ago

@StefanKarpinski @tbreloff: each of you is right, in important measure.

I have seen that the need for handholding in the less well-traveled regions of the deep end of the pool increases superlinearly. @tkelman The work you do helping us deal with tags and git when it goes on a bender is probably more informative than predictive.

This Summer and next Fall I expect for Julia a flood of new and very active involvement. Something is going to feel the extra weight. :walkingman: (mmph, :cry:) "I do not want to play with git" (:cry:, mmph)

between update and upgrade. ?uplift

simonbyrne commented 7 years ago

Perhaps it would be useful to gather some data.

JeffreySarnoff commented 7 years ago

@tkelman Do you recall any of my chained missteps?

JeffreySarnoff commented 7 years ago

@simonbyrne I can share some subjective sense of what went wrong on a few occasions. I don't know how to go about finding the events and extracting the file changes. By far the worst experience with git was not about tags: I tried to prepare Julia's source for deprecating symbol in favor of Symbol. I had put in the time, and all the alterations were ready and, as I recall, passed testing. Before the changes were merged, someone suggested one other change to include. It was a legit request, but it was not one of the changes that I had made work. All I remember after that is frustration building, many attempts to get what had been OK to become OK again, and just as many failures. Then someone else took on the task.

With tags, more than once a delay to adjust something minor has been enough to drift away from METADATA prime, and things get out of sync. I have found the additional steps needed to get everything back in sync, with no residual issue, renaming, or omission, to be unintuitive and different from what happens when nothing needs readjusting. That does not work for me, and I no longer try to make it work. Instead I erase all relevant forks, detag one or more tags, and try again.

I have gotten the local tags and the github tags to be incongruous twice without much idea of how. At some point, with frequent pulling or pushing as appropriate the remote had tags through 0.1.2 and the local through 0.1.8. I had to unmake and remake them and I am not confident it really is all fixed.

StefanKarpinski commented 7 years ago

Actually, I've considered requiring that registered packages give admins of the registry in question commit access so that they can fix things as necessary, but that's not really a package manager design choice. Alternatively, since the Pkg3 design makes it possible to have multiple sources for a package, curators can have forks of any packages and tag versions on their forks. I'm not sure what else you've got in mind, @tbreloff? Are you advocating for taking compatibility information out of the repo entirely? That basically makes using unregistered packages impossible, which to me is the wrong direction.

JeffreySarnoff commented 7 years ago

+1 "give registry admins commit access"

tkelman commented 7 years ago

If the goal is to treat registered and unregistered packages the same (the logic in Pkg2 is made more complicated and error prone because of the different treatment), then I have an idea that doesn't require changing the way we do version resolution, and doesn't try to make compatibility info immutable.

What if there is no such thing as an unregistered package at all, but a package can contain its own registry info as part of the same repository? It would just need its own information, so basically the same data as Config.toml but in an append-only registry history format instead of as git revisions. Registries are living history records that are designed to live on master (or the tip of a registry-vN branch). As long as the files are disjoint between what a registry stores (just a single file for a single package, presumably) and the package source code, couldn't they come from the same repo? If the redundancy of having 2 clones at different SHAs, used for different purposes, bothers anyone, they can split the registry file into a different repo.
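A minimal sketch of an append-only registry history for a single package, assuming, hypothetically, one JSON record per line rather than whatever file format a real registry would use; the `tree` and `compat` values are placeholders. Amendments are new records, so the current view is obtained by replaying the log, and the original compatibility claim for each version remains readable.

```python
import json


def append_record(log, version, **fields):
    """Return a new log with one more JSON record; existing lines are never edited."""
    return log + [json.dumps({"version": version, **fields}, sort_keys=True)]


def current_view(log):
    """Replay the log in order: later records amend earlier ones for the same version."""
    view = {}
    for line in log:
        record = json.loads(line)
        version = record.pop("version")
        view.setdefault(version, {}).update(record)
    return view


log = []
log = append_record(log, "1.2.3", tree="tree-a", compat={"A": "1.2"})  # at release
log = append_record(log, "1.2.3", compat={"A": "1.2-1.3"})             # later amendment

print(current_view(log)["1.2.3"])    # {'compat': {'A': '1.2-1.3'}, 'tree': 'tree-a'}
print(json.loads(log[0])["compat"])  # {'A': '1.2'}  (original claim still on record)
```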

StefanKarpinski commented 7 years ago

So what is associated with the version? Some subtree of the repo excluding the version metadata?

tbreloff commented 7 years ago

I think @tkelman is on the right track. One thing that I think would be very valuable is if tags didn't have to apply only to a single package, but could actually refer to a set of commits for related packages. This would make it easier for authors (like me) to say "here's the new version of this ecosystem". It would also greatly improve the dependency resolution problem, as groups of packages would be seen as one unit when doing resolution. All of this depends on tagging being separate from the repo contents.

tkelman commented 7 years ago

I need to reread whether the registry format is fully specced out here yet. I'm imagining something like a Registry.toml or JuliaRegistry.toml at top level that contains a list of package names and paths to individual detail files (to support directory sharding for large registries). For a "self-registered" package that contains its own registry info, there would only be one package name and version details file.
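The sharded layout could look something like the Python sketch below. The `<first letter>/<name>/Versions.toml` scheme, the `[packages]` table and the detail-file name are hypothetical choices for illustration; a self-registered package would simply produce an index with a single entry.

```python
from pathlib import PurePosixPath


def detail_path(name):
    """Hypothetical sharding scheme: <first letter>/<name>/Versions.toml."""
    return PurePosixPath(name[0].upper(), name, "Versions.toml")


def render_index(names):
    """Render a hypothetical top-level Registry.toml mapping names to detail files."""
    lines = ["[packages]"]
    lines += [f'{n} = "{detail_path(n)}"' for n in sorted(names)]
    return "\n".join(lines) + "\n"


print(render_index(["Plots", "Compat", "JSON"]))
# [packages]
# Compat = "C/Compat/Versions.toml"
# JSON = "J/JSON/Versions.toml"
# Plots = "P/Plots/Versions.toml"
```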

The release commit of such a package would have to have its own compatibility info in Config.toml but can't save its own hash to its registry history until after the release tag, but since registries don't usually operate from tags I don't think that's a problem.

JeffreySarnoff commented 7 years ago

I think this thread is on the right track. @tbreloff envisions a new tag power¹. The capability to tag with designated scope requires that tags be scope associated. Let's ensure that multiscopic tags play well with tags now in use: The absence of associated scope is not absence of scope, it is package scope.

This would give Julia a comparative advantage. I recommend defining tag scope to be yet more pliable: leverage the internals of Julia's type system to support scope as scoping over MaybeScope (simple scope | superscope | set of scopes | nothing) and to ensure there is a ground state. Use URIs.

¹ (Do you prefer yesterday's tags? Have you heard? All the new tags come cold brewed; you'll love their refined taste!)

StefanKarpinski commented 7 years ago

I have to say I think this is not the right direction at all. Under this "external versioning" proposal, the source repo for a project would have no information about what other packages or libraries it depends on. Instead, you need to check out the project and then find its registry – if there is one – in order to be able to even know what other packages it needs.

Tagging multiple packages with a single version adds yet another layer of dependency and complexity to an already complex system. If packages are all being versioned together in lockstep, why are they separate packages in the first place? This can already be accomplished by just giving all the packages the same version number – then tell people to use the 2.3.1 version of all of the packages.

JeffreySarnoff commented 7 years ago

@StefanKarpinski I did not read that the emphasis was on "external" -- my comment was in response to a [possibly imagined] proposal to allow tags to tag other tagged things in what seemed to be an elegant and easily expressed way. :-1: Complete decoupling of a thing from its constitutive self.

tbreloff commented 7 years ago

@StefanKarpinski too bad... I was hoping to be able to trash MetaPkg when Pkg3 was released, but it seems like it will still be needed. I suppose tooling to jointly test/tag/publish as well as auto-adding version dependency limits is all we need... it doesn't necessarily need to be supported at the Pkg3 level.

JeffreySarnoff commented 7 years ago

Sam Boyer's "So you want to write a package manager" may be of interest.

tkelman commented 7 years ago

Versioning info would be present in the repo, but that copy wouldn't be the definitive source - only the best effort using information available at the time of tagging. Compatibility info needs to be possible to amend in order to fix any mistakes, or update past source releases for new information that was not available when they were tagged. I don't think we should force a new source release to update outside-world compatibility information - that makes it difficult to depend long-term on anything more than a single release per minor series. Compatibility is stored separately from the source in registries anyway, and amending it should be possible without touching a source release.

If multiple packages depend on each other being at specific versions, that should be expressed through compatibility bounds.

StefanKarpinski commented 7 years ago

A few observations:

  1. It’s confusing for the checked out source version of a package to say one thing about compatibility while the registry for the package says something else.
  2. When a package’s source is checked out somewhere, we should apply any registry updates to its config file.
  3. It will be annoying if this is done in a git repo and the file is being tracked by git since that will make it dirty – the changed version should be committed.
  4. Since this commit exists in our heads, it should also exist in the actual source repo.
  5. Even if there isn’t an actual commit for it, when you update the compatibility of a version, a new commit is implied – whether we actually make it or not. If we canonicalize config files, this commit has a completely predictable tree hash. It seems generally less confusing for the commit to actually exist rather than for it to only exist in our heads.
  6. Updated compatibility information for a package should be upstreamed back into the package source anyway, so that when future versions are tagged, they also include those changes. In other words, having this commit in the git repo isn’t just less confusing, it’s also useful/necessary for package development.
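
Observation 5's claim that a canonicalized config file makes the implied commit predictable can be illustrated with git's own object hashing. This is a sketch under assumptions: the canonical layout of the compatibility table (entries sorted by package name, one `name = "spec"` line each) is hypothetical, not Pkg3's actual format. Git names a blob by the SHA-1 of `blob <size>\0` followed by the content, so identical canonical bytes always produce the same object id, and if every other file is unchanged, the tree containing that blob is fixed as well.

```python
import hashlib
from typing import Dict


def canonical_compat(compat: Dict[str, str]) -> bytes:
    """Serialize {package: version spec} in a fixed, order-independent form.

    Hypothetical canonical layout: a [compat] header, then one
    `name = "spec"` line per entry sorted by name, ending in a newline.
    """
    lines = ["[compat]"]
    for name in sorted(compat):
        lines.append(f'{name} = "{compat[name]}"')
    return ("\n".join(lines) + "\n").encode("utf-8")


def git_blob_hash(data: bytes) -> str:
    """Object id git assigns to a blob with this content (SHA-1 object format)."""
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    a = canonical_compat({"Foo": "0.5", "Bar": "1.2"})
    b = canonical_compat({"Bar": "1.2", "Foo": "0.5"})
    # Same claims in a different insertion order: identical bytes, identical id.
    print(git_blob_hash(a) == git_blob_hash(b))
```

Sorting is what removes the freedom that would otherwise let two equivalent edits produce different commits.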

All of this points to creating and upstreaming source repo commits that correspond to modifications of each version’s compatibility information, instead of just making the changes in a registry. Moreover, the new commit with updated config info is what you should check out as the source of the package.

What about evolving the compatibility metadata of a version in-place in its registry without changing what we call that version? We do this now – so what’s the problem?

  1. Pkg3 will support multiple registries.
  2. Different registries can provide additional versions of packages as long as they agree on the ones they have in common. This allows a private registry to make a tentative patch version (e.g. “v1.2.3+hotfix”) and use it before an official fix has been upstreamed to the main registry.
  3. If we just modify version metadata in-place in registries without changing version names, how is this supposed to work? What happens when two different registries have different metadata associated with a particular version? Which one do we use? How do we know which one is newer?
  4. When the package manager decides to use a particular version of a package and records that in an environment, we want to know which version of its metadata was in effect at the time, so that we can at least tell, after the fact, whether the choice was valid according to the metadata of all the versions chosen at the time. (There may be valid reasons to ignore compatibility constraints and use a version anyway.)
  5. There are various ways to distinguish, in an environment, which version of a version we’re using, but they are all equivalent to giving it a new name.
  6. I’ve proposed calling the revision of source version v1.2.3 with updated compatibility “v1.2.3+1”. Let’s go with that for the sake of argument.
  7. Remember that commit that’s implied by any update to a version’s compatibility information? Yeah, that one. The one that should probably exist in the source repo. If we’re calling that “v1.2.3+1” in the package manager and in environment files, and there’s a corresponding commit, then we should probably propagate that tag back to the source repo so that git also calls it “v1.2.3+1”.
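
Points 2 and 3 can be made concrete with a small sketch of combining registries. It assumes a hypothetical, simplified registry shape that maps (package UUID, version string) to that version's compatibility table. Because versions are treated as immutable, combining registries never has to decide which copy of a version's metadata is newer: identical entries merge, extra versions (such as a private `1.2.3+hotfix`) are simply added, and any disagreement is reported as an error.

```python
from typing import Dict, Tuple

# Hypothetical simplified registry: (package UUID, version) -> {dependency: version spec}
Registry = Dict[Tuple[str, str], Dict[str, str]]


class RegistryConflict(Exception):
    """Two registries make different claims about the same package version."""


def merge_registries(*registries: Registry) -> Registry:
    """Union of registries that must agree on every version they share."""
    merged: Registry = {}
    for registry in registries:
        for key, compat in registry.items():
            if key in merged and merged[key] != compat:
                uuid, version = key
                raise RegistryConflict(
                    f"{uuid} v{version}: registries disagree on compatibility"
                )
            merged[key] = dict(compat)
    return merged
```

Under this model a disagreement means one registry is wrong, not that one is newer; a compatibility fix has to arrive as a new version such as v1.2.3+1.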

Taken with the above points, this all leads us to one thing: immutable versions, with immutable compatibility, but with compatibility updates expressed as version revisions, e.g. v1.2.3+1, and these updates are upstreamed to the original source repositories as appropriately tagged commits, merged back into the relevant release branches.
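
The naming scheme in this conclusion can be sketched as follows. The sketch assumes versions of the form `MAJOR.MINOR.PATCH` with an optional numeric revision `+N` (the original release being revision 0) and picks the newest revision of a given source release. Note that SemVer 2.0.0 says build metadata must be ignored when determining version precedence, so ordering by `+N` is a convention of this proposal rather than of SemVer; non-numeric suffixes such as `+hotfix` are rejected here.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed form: MAJOR.MINOR.PATCH with an optional numeric compatibility revision "+N".
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\+(\d+))?$")


def parse(version: str) -> Tuple[Tuple[int, int, int], int]:
    """Split "1.2.3+1" into ((1, 2, 3), 1); a bare "1.2.3" is revision 0."""
    m = _VERSION.match(version)
    if m is None:
        raise ValueError(f"not a version with a numeric revision: {version!r}")
    major, minor, patch, revision = m.groups()
    return (int(major), int(minor), int(patch)), int(revision or 0)


def latest_revision(base: str, versions: Iterable[str]) -> Optional[str]:
    """Newest compatibility revision of the source release `base`, if any."""
    target, _ = parse(base)
    same_source = [v for v in versions if parse(v)[0] == target]
    return max(same_source, key=lambda v: parse(v)[1], default=None)
```

Comparing revisions as integers matters: a plain string comparison would put `1.2.3+10` before `1.2.3+2`.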

Aside on tagging. Yes, version tagging in Pkg2 is a nightmare. It was a design mistake to tag versions before they are accepted into registries. We won’t repeat that design mistake in Pkg3 – version tagging will flow from the registry to the source repo, not the other way around. Any arguments about the annoyingness of tagging versions stem from this, not from some fundamental problem with the concept of having git tags that correspond to versions, which is handy once they’re correct since it means that git knows what we call these things.

StefanKarpinski commented 7 years ago

There also seems to be some semantic confusion in this thread that I'd like to address:

Compatibility info needs to be possible to amend in order to fix any mistakes, or update past source releases for new information that was not available when they were tagged

No one is arguing that we will declare version compatibility once and for all and it will be correct and perfect forever. That's totally impractical – there's a reason that I made metadata mutable in Pkg1/2. What I believe should be immutable is the association between a particular package version and the claims it made about compatibility at the time. Even if this is immutable, compatibility can still be updated, just not by rewriting history, but instead by adding new information that supersedes old information.
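
The distinction between immutable claims and updatable compatibility can be modelled as an append-only log. In this sketch, the `CompatHistory` class and its integer timestamps are hypothetical: earlier claims are never rewritten, an amendment supersedes them, and the claim that was in effect at any past moment (for example, when an environment was resolved) can still be recovered.

```python
import bisect
from typing import Dict, List, Optional


class CompatHistory:
    """Append-only compatibility claims for one source release (hypothetical)."""

    def __init__(self) -> None:
        self._times: List[int] = []              # when each claim was made
        self._claims: List[Dict[str, str]] = []  # claim i corresponds to revision i

    def amend(self, time: int, compat: Dict[str, str]) -> int:
        """Record a new claim that supersedes earlier ones; return its revision."""
        if self._times and time <= self._times[-1]:
            raise ValueError("claims must be added in increasing time order")
        self._times.append(time)
        self._claims.append(dict(compat))
        return len(self._claims) - 1

    def in_effect_at(self, time: int) -> Optional[Dict[str, str]]:
        """Claim that was current at `time`, or None if none existed yet."""
        i = bisect.bisect_right(self._times, time) - 1
        return dict(self._claims[i]) if i >= 0 else None

    def current(self) -> Optional[Dict[str, str]]:
        """Most recent claim; older claims remain available via in_effect_at."""
        return dict(self._claims[-1]) if self._claims else None
```

The revision numbers returned by `amend` line up with the `+N` naming discussed above: revision 0 is the original release and revision 1 would be v1.2.3+1.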

StefanKarpinski commented 7 years ago

@tbreloff: regarding snapshots of entire sets of related packages, in the Pkg3 design you can just configure your repos so that commits include sufficient environment information like specific package versions and/or source tree hashes. That way people following along can just check out those exact versions instead of depending on normal version resolution. As long as you make sure that tests pass and you've committed your latest environment, people should be able to easily reproduce the exact same working set of package versions. This approach will work much better for rapidly moving collections of closely related packages like your Plots stuff or the Kenoverse.
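
A minimal sketch of this workflow, under stated assumptions: `available` maps each package to its registered versions and their source tree hashes, a committed manifest pins some packages to exact tree hashes, and taking the newest version stands in for normal version resolution (a placeholder, not Pkg3's resolver). Pinned packages bypass resolution entirely, which is what makes the working set reproducible.

```python
from typing import Dict, Optional, Tuple


def _numeric(version: str) -> Tuple[int, ...]:
    """Order plain "MAJOR.MINOR.PATCH" strings numerically."""
    return tuple(int(part) for part in version.split("."))


def select_trees(
    available: Dict[str, Dict[str, str]],
    manifest: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Choose one source tree hash per package.

    available: package -> {version: tree hash}
    manifest:  package -> tree hash pinned in a committed environment
    """
    manifest = manifest or {}
    chosen: Dict[str, str] = {}
    for package, versions in available.items():
        if package in manifest:
            tree = manifest[package]
            if tree not in versions.values():
                raise KeyError(f"{package}: pinned tree {tree} is not available")
            chosen[package] = tree
        else:
            # Placeholder for real resolution: take the newest version.
            newest = max(versions, key=_numeric)
            chosen[package] = versions[newest]
    return chosen
```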

tkelman commented 7 years ago

I don't think your observations 1-5 are all that major. 6 is often the case but not guaranteed - sometimes a compatibility adjustment for a past source version isn't relevant to newer source versions.

It may be less confusing to give these compatibility-only modified releases new names, but anything that replaces a past version in dependency resolution should be enforced as having the same source, otherwise this mechanism is a shortcut around immutable source releases. If we're going to have this path for republishing replacement versions as entirely new entities, we'd need to enforce somehow that these versions are only originated from registry compatibility adjustments and only modify compatibility, rather than potentially allowing arbitrary source changes. If you allow arbitrary source changes in an update that replaces a past version in dependency resolution, then that's essentially equivalent to unpublishing the past release tag which isn't good for reproducibility.

You're right that the state of the registry information plays into reproducibility of the compatibility state, so maybe that should also be recorded rather than trying to find ways to avoid having to think about it. But I think it would be more predictable and invite fewer opportunities for subverting the system if compatibility versions were specified as purely virtual registry-generated entities, guaranteed to be derived from an existing base release. Corresponding modifications and upstreaming of the package config.toml in place can be optional, and it's not always necessary or appropriate for a package developer to incorporate such changes in all future versions. The content of a compatibility revision shouldn't be allowed to be any arbitrary thing submitted by a package developer, it should be constrained. Tracking the information separately is a way of accomplishing that by design, and I don't see it as all that confusing or problematic. After all, other package names and versions in any statement of compatibility are already implicitly with respect to some registry.

tkelman commented 7 years ago

Since other package names and versions are meaningless without taking into account the registry that tells you what those package names and versions correspond to, it's a bit "wrong" to store a package's compatibility info within its source. It's implicitly a representation of what the future registry entry is going to say about that package, and gets ignored for most other purposes. Maybe we can think a bit about whether this system really makes sense. Right now REQUIRE is used to save state and remove the need to type out its entire content every time you make a tag, but it's sort of being stored and tracked in the wrong place. Depending on how development for packages is supposed to work in Pkg3, maybe we could change where we keep this information for under-development versions of a package.

StefanKarpinski commented 7 years ago

@tkelman: I don't think your observations 1-5 are all that major.

This sort of response is not constructive. This is your opinion backed by zero argument or discussion. Your feedback on this particular issue so far largely amounts to "I want it to continue to work the way it does now." When I make a carefully broken down argument for why we can't / shouldn't do that, it's not for my sake, it's so that we can have a constructive debate and zero in on specific points where there are concrete problems to be avoided.

Enforcing compatibility-only updates isn't exactly rocket science: don't allow version v1.2.3+1 to be registered if it differs from v1.2.3 in terms of anything but compatibility. Your stated preference for purely virtual versions is unconvincing – I made a thorough, multi-point argument for non-virtual versions being less confusing, more usable and more practical for package development, and you did not refute any of it, just dismissed all the points as "not all that major". Let's play this one again... Whether they exist in version control or not, compatibility updates correspond to predictable commits that could be materialized. So should they be materialized or not?
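
The registration check described here can be sketched as a comparison between the base release and the proposed revision. The `SourceRelease` model below is hypothetical: files are represented by their blob hashes, and the config file is given already parsed, with compatibility under a `compat` key. Everything except that key must match exactly for the candidate to be accepted as a compatibility-only revision.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SourceRelease:
    files: Dict[str, str]    # path -> blob hash, excluding the config file
    config: Dict[str, Any]   # parsed config, with compatibility under "compat"


def _without_compat(config: Dict[str, Any]) -> Dict[str, Any]:
    """Config with the compatibility table removed."""
    return {key: value for key, value in config.items() if key != "compat"}


def is_compat_only_update(base: SourceRelease, candidate: SourceRelease) -> bool:
    """Accept `candidate` as a revision of `base` only if nothing but compat changed."""
    if base.files != candidate.files:
        return False  # any source change means this is not a compatibility revision
    return _without_compat(base.config) == _without_compat(candidate.config)
```

A registry applying this rule would refuse v1.2.3+1 whenever the function returns False, which keeps revisions from becoming a back door around immutable source releases.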

Why wouldn't we materialize them? The only reason is that they could change source – and we can easily verify upon registration that they don't. All of this points to compatibility updates being actual, not virtual.

Tracking the information separately is a way of accomplishing that by design, and I don't see it as all that confusing or problematic. After all, other package names and versions in any statement of compatibility are already implicitly with respect to some registry.

This is no longer true in Pkg3. By introducing multiple registries, some of which are private, the registration system necessarily becomes distributed, federated and not globally visible. Package UUIDs give packages identity independent from registries – even unregistered ones. Packages can and will move between registries (e.g. from uncurated to curated or from private to public), and it is possible to depend on packages in other registries. The registry cannot be the determiner of package or version identity anymore. Having each package's version history in its repository may be a good idea, although that information would be redundant with git tags, so maybe not.

simonbyrne commented 7 years ago

version tagging will flow from the registry to the source repo, not the other way around

How do you envision that this would work?