[RFC] Structured patches

matthewbauer commented 6 years ago

I had this idea for improvement to "patches" attribute. Basically, I think it could be useful to store some meta information on patches. For instance,

stdenv.mkDerivation {
  name = "hello-1.0";
  patches = [
    { file = ./cve-555.patch; # actual patch file
      vulnerability = "CVE-555"; # CVE identifier
      version = "1.0.1"; # version this patch was included in
      url = https://github.com/hello/hello/issues/555; # discussion of patch
    }
  ];
}

Does this seem useful? It would make life easier for maintainers I think.

jtojnar commented 6 years ago

What kind of patches are there?
- Critical fixes until a new version is released
  - CVEs
  - critical errors (data corruption, crashes making software unusable)
  - build fixes (library compatibility)
- Nix specific patches (adding Nix store to sandbox paths)
- Potentially upstreamable patches needed for Nix (introducing a new environment variable)
- Other patches (evince recentListSize patch) – we probably should not have these in nixpkgs
When do I come across patches and what do I expect?
- When updating a package
  - For Nix specific patches: a detailed description of why it is needed
  - For other patches: upstream bug where can I track its merge status
- When cleaning up nixpkgs
  - The same but the need for understanding is stronger, maybe in-tree explanations for all the patches
- When reviewing a PR
  - Info that will make future maintenance easier, same as above
Where do patches come from?
- Upstream VCS/patches in bug tracker marked as pushed – these are probably safe and can be removed in the next update
- Upstream bug tracker, patches not yet merged – patch might not be safe, might not get merged, might not apply cleanly after update
- Downstream (us) – full control, probably should not be used for other things than Nix specific patches
- Downstream (other distributions) – no control but probably will be kept up to date, should be used only when there is no upstream source
What do we need structure for?
- CVE tracking
  - We should ask vulnix people if they want it
- Tool for assisted reviews
  - Probably not worth it, comments are good enough
- Enforced expectations as above
  - The hardest part is making sure what the patch does and that cannot be automated
  - Attribute checking might be relatively easy to implement to ofborg
  - If we want to enforce something, we should come up with a policy

To sum up this quick brain dump, I do not think it makes much sense for anything other than CVEs tracking. The structure would be useful for automatically checking if patches have info maintainers will need but policy will need to be prepared first.

Regarding the syntax, the following is quite nice:

{
  patches = [
    (fetchurl {
      meta = {
        description = "darwin linker arguments";
        bug = https://bugzilla.gnome.org/show_bug.cgi?id=794326;
      };
      url = https://bugzilla.gnome.org/attachment.cgi?id=369680;
      sha256 = "11v8fhpsbapa04ifb2268cga398vfk1nq8i628441632zjz1diwg";
    })
  ];

The cve field could also be a list.

Example policy:

patches field can contain only:

critical fixes
- obtained via fetchurl/fetchpatch from upstream issue tracker/VCS
- having meta.bug attribute pointing to bug report tracking the patch
- having meta.description attribute describing why the patch is needed
Nix specific patches
- local files
- preceded by a comment describing why the patch is needed
- for larger patches, brief overview would be also nice
Potentially upstreamable patches needed for Nix
- will need to be send to a upstream bug tracker
- obtained same as the critical fixes
No other types of patches are allowed
- If you want your document viewer to show more items in the menu, use override and patch it locally
- If upstream wants us to patch other projects and they did not send the patches upstream, they can go to hell

Issues:

What if there is no upstream bug tracker
- site down
- patches only accepted by a private e-mail
What if the patch does not apply?
- needs to be backported
People are lazy, sending all their patches upstream is extra work (but it will save us work in the long run)

Anton-Latukha commented 6 years ago

Also please document does we use patch URLs:

[ ] To the rolling master branch. So hash would give a clear error right away, if patch changed in it's upstream.
[ ] To the git tag tree state of the project release, where patch stored. To do switches when current stable release of project happen. In theory switching through stable states of patch.
[ ] To the commit of origin of the patch. So it would be fully static in our system. But then patch can be updated upstream, or even removed, but we would not notice anything. Until some project fails to build due to consequences of the old patch with an error that going to need troubleshooting.

I see this can be relevant for overrides, and also for Nix-related patches.

_{P.S.

I know that no one ever tracks changes/updates to patches. And old patches probably cause some percentage of problems in packages.

I personally see, ideally, that we integrate a patch updates process (for some types of them, like upon stable upstream release of the project that stores patch) into the update bot.}

jtojnar commented 6 years ago

@Anton-Latukha

We want the older revisions of nixpkgs to work too so using patches from a branch is no go.
Tags should be fairly stable but a commit hash is probably the safest. Here, I attempted to compile a list of acceptable patch sources. For static you use fetchurl, for dynamic fetchpatch:
- Immutable attachments on bug trackers, mailing lists, etc. (static)
- Immutable patches generated by web services, e.g. commit diffs (dynamic)
  - https://github.com/Gnucash/gnucash/commit/42ac55e03a1a84739f4a5b7a247c31d91c0adc4a.patch
  - https://git.gnome.org/browse/glib-networking/patch/?id=f1c8feee014007cc913b71357acb609f8d1200df
- Files uploaded to a VCS (static, immutable as long as pinned to a certain commit)
- Mutable patches generated by web services, e.g. pull requests (dynamic, probably immutable after merged)
  - https://gitlab.gnome.org/GNOME/json-glib/merge_requests/2.diff
  - https://github.com/jemalloc/jemalloc/pull/1035.patch
People updating a package need to check validity of each source (including) patches either way. We can only make it easier for them, which this RFC attempts to do.

I know that no one ever tracks changes/updates to patches. And old patches probably cause some percentage of problems in packages.

I check the validity of patches for each package I update.

I personally see, ideally, that we integrate a patch updates process (for some types of them, like upon stable upstream release of the project that stores patch) into the update bot.

Ideally, most of the patches would be sent upstream, so on update, we would just remove the patches. I do not see an excuse for generally usable patches to remain downstream. The only place where patch updating might make sense are terrible downstreams like Handbrake.

dezgeg commented 6 years ago

The OpenEmbedded project has a fairly nice policy on documenting the reasons of patches in the commit message of the patch: https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines#Patch_Header_Recommendations:_Upstream-Status. I suppose we could follow that.

I believe Vulnix already does CVE tracking of patches by scanning for CVE-20xx-12345 strings in the patch filename. I don't think that is documented though.

ckauhaus commented 6 years ago

vulnix (which is used to prepare the weekly vulnerability roundups has a very simple approach to metadata: security patches should contain a CVE identifier in their name (either filename or name attribute when using fetchpatch).

ckauhaus commented 6 years ago

@dezgeg It's documented in the vulnix-whitelist manpage: https://github.com/flyingcircusio/vulnix/blob/master/doc/vulnix-whitelist.5.md#automated-patch-whitelisting

dezgeg commented 6 years ago

But it's not documented in Nixpkgs I mean, or is it?

ckauhaus commented 6 years ago

No, as it is specific to vulnix right now and not official policy. I'd be glad if it became such one, though.

ckauhaus commented 6 years ago

There has been a similar discussion in #15660 (but without getting to a conclusion; issue still open)

Anton-Latukha commented 6 years ago

I think that we should migrate patches, and standardize on special function that automates the logic.

The main idea:

Is to direct people to use special function that requires by itself to provide:

Functions:
- nixPatch - Nix specific;
- cvePatch - ones that should be removed after CVE fix sees upstream release;
- tmpPatch - ones that fix current problem and should be removed in the future;
- permanentPatch - for old stalled projects that can/should be patched forever.
description value of what is patch is for, and when to remove it || cve).
url to fetch patch.
sha256.

Optional:

from-thread to where discussion on patch happened ...

Patches for stalled projects

From my experience of cleaning-up patches for HandBrake, it had a bundle of patches for dependencies that are old stalled projects. And some of that patches where very legitimate, so it is sane to apply them to those old projects forever.

So, for permanentPatches, I want to say - it is a good idea to have local copy of these patches in the tree:

These patches come from variety of places, and disappear from internet easily.
Old projects can gather number of patches. Patches on same files can conflict. We modify them to work together, and store them locally.
So on fix tag function should use locally stored file, but also require to provide a link to the thread it came from.

So, result is something like this:


# For patches that should hit upstream/be resolved in the future
tmpPatch {
  description = "Fixes this";
  until = "Until this happens";
  from-thread = "https://patchnet.com/lets-fix-this";
  url = "https://patchnet.com/this_is_solved.patch";
  sha256 = "sha256";
}

# For patches that cover CVEs
cvePatch {
  cve = "CVE-555";
  until = "Until this happens"; <- optional
  url = "https://patchnet.com/cve-555.patch";
  sha256 = "sha256";
}

# If we patch old project that still used today (like `ffmpeg 0.10` used by `spotify`, `a52dec` used by `handbrake`...).
permanentPatch {
  description = "Fixes this";
  file = "./patch-good-old-project.patch";
  from-thread = "https://fidonet.ozzmosis.com/echomail.php/c_plusplus/25ff8d7f8ff9ff50.html";
}

Since specific patches have according specific functions, that automagically requires to provide according form and information. The review process and maintenance of patches become easy, and we can do drive-by maintenance of patches.

To migrate we can use placeholder values and set all patches as tmpPatch {}. And from that time people would populate information and maintain patches on the go.

ckauhaus commented 5 years ago

It would perhaps be a good idea to convert this one into a "real" RFC in https://github.com/NixOS/rfcs

Anton-Latukha commented 4 years ago

In discussions, I received an argument that patches do not need a description - "because what patch does is self-evident". This argument expects everyone to knows all programming languages and all subsystems and all patches and what upstream bug patch used to solve, so gave a logical refute proof of this argument.

For me that some people are not describing what patches are about - is very ignorant behavior.

Mandatory description of what patch is about/or upstream issue/PR (as I put as an argument in above comment
Commenting a patch is the strongest argument now, because krank util now allows to autotrack status of mentioned issue/PR, are they relevant (open). So - patch relevance can now be automatically tracked, if they have a description. That is also an important feature for automatically tracking security patches.

peti commented 4 years ago

I really like this idea and wholeheartedly support it. It would be great if someone could find the time and enthusiasm to create an RFC suggesting this change to the patches field.

stale[bot] commented 4 years ago

I marked this as stale due to inactivity. → More info

nrbray commented 1 year ago

Anton-Latukha

From...

Quote:

The ability of NixOS/Nix/Nixpkgs to be the best security distribution around depends on the existence of the structured patches.

Since CVEs are always fixed first in the form of a patch application, before upstream fixes it & makes a release with the fix.

If there is no automation to apply a patch, no way to in the standard way to supply all required information to demark it what it is for, no way to scan for what patches are to be applied & what CVE patches are no longer needed, since package got a release.

That is possible only with a structured patches approach.

https://github.com/NixOS/nixpkgs/issues/39392

IDK of recent processes in Nixpkgs, but if structured patches got implemented & are enforced - that would transform the situation, Nixpkgs would become one of the most secure distribution software stores by design.

NixOS / nixpkgs

[RFC] Structured patches #39392

The main idea:

Patches for stalled projects