NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.92k stars 13.95k forks source link

Get rid of md5 support for fixed-output derivations #4491

Closed domenkozar closed 7 years ago

domenkozar commented 10 years ago

We're in 2014 and even universities have a course where students forge md5 hashes of files.

Biggest usage of md5 hashes in nixpkgs is python, followed by libreoffice (scripted install).

$ git grep "md5 ="|grep -v libreoffice | grep -v python-|grep -v redhat|grep -v suse|wc -l                                                                                                        
141

$ git grep "md5 =" pkgs/top-level/python-packages.nix | wc -l
273

Observations:

nix-prefetch- should print out multiple hashes together with fetch functions supporting and verifying all of specified hashes

Q/A:

it is considered best practice to use it when that's what upstream provides

That's a very bad security practice. It trades user security for few seconds of maintainer time.

TODO

robberer commented 10 years ago

This is not as trivial as i thought because nix-prefetch-url does not check if the downloaded archive is a valid archive. e.g. my update for cups is invalid because nix-prefetch-url just downloaded a HTML file instead of an tar.bz2. i will prepare a fix. At least for cups. Or should we revert commit ?

domenkozar commented 10 years ago

Fixing cups is OK.

copumpkin commented 9 years ago

:+1:

copumpkin commented 9 years ago

Can we broaden this to also deprecate sha1 for similar reasons? It's not as broken as md5, but it's also not considered particularly good when used for security purposes.

copumpkin commented 9 years ago

Perhaps fetchurl and friends could even display a warning when passed md5 or sha1s that tells you that the practice is deprecated and give you a sha256 to replace it with in the expression.

7c6f434c commented 9 years ago

Can we broaden this to also deprecate sha1 for similar reasons? It's not as broken as md5, but it's also not considered particularly good when used for security purposes.

Unfortunately, some projects still publish signed lists of SHA1s and some Nix package maintainers seem to use these lists.

Another problem is that we don't seem to support SHA512: Firefox has official SHA1 list and SHA512 list.

copumpkin commented 9 years ago

Perhaps we could then include both, making the sha256 attribute mandatory? The sha1/md5 can be used to check against upstream lists, whereas the sha256 serves to give you certainty that what you're deploying is what you think it is.

7c6f434c commented 9 years ago

Perhaps we could then include both, making the sha256 attribute mandatory? The sha1/md5 can be used to check against upstream lists, whereas the sha256 serves to give you certainty that what you're deploying is what you think it is.

For firefox-bin the whole feasibility of including all the languages hinges on not downloading all the packages but using Mozilla-provided checksum lists.

vcunat commented 9 years ago

I think forbidding sha1 is a huge overkill ATM. In most packages there's even no way of securely retrieving the hash/tarball (no signature or https at least).

If I wanted to attack nixpkgs, I'd become a contributor, so I'd get commit access relatively easily, and then I could sneak in whatever forged hashes I needed (unlikely to be noticed if done well).

copumpkin commented 8 years ago

https://github.com/NixOS/nix/issues/802

vcunat commented 8 years ago

ATM the largest amount of md5 usage is from the new texlive, as its list of tarball hashes is taken from upstream and they only provide md5.

However, the transformed fixed-output derivations use sha1 already, so md5 isn't really used by regular users.

7c6f434c commented 8 years ago

ATM the largest amount of md5 usage is from the new texlive, as its list of tarball hashes is taken from upstream and they only provide md5.

However, the transformed fixed-output derivations use sha1 already, so md5 isn't really used by regular users.

We also have LibreOffice dependency list which is MD5-from-upstream.

copumpkin commented 8 years ago

It sounds like TeXLive is getting with the times: https://www.preining.info/blog/2016/01/tex-live-security-improvements/

copumpkin commented 8 years ago

@7c6f434c LibreOffice's main download page took me here: http://download.documentfoundation.org/libreoffice/stable/5.1.0/mac/x86_64/LibreOffice_5.1.0_MacOS_x86-64.dmg.mirrorlist which seems to provide a fine sha256.

vcunat commented 8 years ago

Yeah, LibreOffice provides those now; note that they also serve the hashes over https URL.

7c6f434c commented 8 years ago

@7c6f434c LibreOffice's main download page took me here: http://download.documentfoundation.org/libreoffice/stable/5.1.0/mac/x86_64/LibreOffice_5.1.0_MacOS_x86-64.dmg.mirrorlist which seems to provide a fine sha256.

It has around a metric ton of inner dependencies that are built in some special ways inside LO build. They are listed in downloads.lst — with md5 sums. And they are, ahem, quite numerous. Some of them have been factored out, but I remember returning some of them back into the LO build becuase of heavy breakage.

copumpkin commented 8 years ago

Has anyone raised the issue upstream to see if they can just switch that process over to a more sensible hash? On Mon, Feb 15, 2016 at 15:12 Michael Raskin notifications@github.com wrote:

@7c6f434c LibreOffice's main download page took me here: http://download.documentfoundation.org/libreoffice/stable/5.1.0/mac/x86_64/LibreOffice_5.1.0_MacOS_x86-64.dmg.mirrorlist which seems to provide a fine sha256.

It has around a metric ton of inner dependencies that are built in some special ways inside LO build. They are listed in downloads.lst — with md5 sums. And they are, ahem, quite numerous. Some of them have been factored out, but I remember returning some of them back into the LO build becuase of heavy breakage.

— Reply to this email directly or view it on GitHub https://github.com/NixOS/nixpkgs/issues/4491#issuecomment-184368970.

grahamc commented 8 years ago

FWIW: I submitted a (long-merged) PR reducing the # of md5s down from 273 to 13, and sha1s from ?? to 0. I implemented it by taking the current URLs, fetching them, comparing md5s, then updating to sha256 only if they updated.

pSub commented 8 years ago

The situation of non-python packages actually got worse (compared the the initial values):

$ git grep "md5 ="|grep -v libreoffice | grep -v python-|grep -v redhat|grep -v suse|wc -l
516

Edit: It seems that is mainly because of pkgs/games/steam/runtime-generated.nix.

vcunat commented 8 years ago

texlive 2016 replaces md5 by sha512 (!) WIP: https://github.com/NixOS/nixpkgs/pull/16391

domenkozar commented 8 years ago

The attack is so simple we even use it in nixpkgs(!): https://github.com/NixOS/nixpkgs/blame/5b61d977012f10f8a54d9f42124a833cbcd4fbde/pkgs/applications/networking/browsers/chromium/update.nix#L103-L106

md5 will have to go in 17.03

$ git grep "md5 ="|grep -v libreoffice | grep -v python-|grep -v steam|wc -l
114
domenkozar commented 8 years ago

Filed a bug to libreoffice to move away from md5: https://bugs.documentfoundation.org/show_bug.cgi?id=102163

domenkozar commented 8 years ago

cc @garbas for

pkgs/development/tools/pypi2nix/deps.nix:    md5 = pipWhlHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = setuptoolsWhlHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = pipHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = setuptoolsHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = zcbuildoutHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = zcrecipeeggHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = buildoutrequirementsHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = wheelHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = clickHash;
pkgs/development/tools/pypi2nix/deps.nix:  #   md5 = sixHash;
pkgs/development/tools/pypi2nix/deps.nix:  #   md5 = attrsHash;
pkgs/development/tools/pypi2nix/deps.nix:  #   md5 = effectHash;
pkgs/development/tools/pypi2nix/deps.nix:    md5 = requestsHash;
garbas commented 8 years ago

@domenkozar will fix this with next release of pypi2nix. should be out there shortly after this weekend.

7c6f434c commented 8 years ago

Thanks to @chris-martin for #18802, LibreOffice inner dependencies are no longer a problem.

domenkozar commented 8 years ago
$ git grep "md5 ="| wc -l
91
7c6f434c commented 8 years ago

@domenkozar Just in case: LibreOffice will keep having md5 = in the expression, just because LO build system assumes some tarballs include MD5 hashes in their names and some don't, so it is simpler just to create both symlinks for all of the tarballs. MD5 hashes are only used for verification during the package update process, though.

domenkozar commented 8 years ago

@7c6f434c roger.

$ git grep "md5 ="|grep -v libreoffice|wc -l
75
7c6f434c commented 8 years ago

I apply for more exemptions.

  1. I have fixed the only non-obsolete voluntary mention of md5 in documentation (beam uses MD5 in the output independently of our uses). Should we just ignore the mentions of md5 in doc/old/cross.txt and grep inside pkgs/?
  2. lispPackages has md5 package. The same happens with Chicken Scheme egg list. Also, Haskell has apache-md5.
  3. Is it a good idea to start printing deprecation warnings from fetchurl when md5 is used? Maybe fetchgit should just drop md5 support and gain sha512 support instead…
7c6f434c commented 8 years ago

Just a notice for everyone: #19334 is a PR to issue trace deprecation warnings whenever md5 is used in NixPkgs in fetch*. I will merge it myself next week unless there are objections or suggestions of large changes.

garbas commented 8 years ago

@domenkozar md5 -> sha256 fix for pypi2nix is already in master

domenkozar commented 8 years ago

Deprecation logic is also in master, now we just need changelog and fix for ~65 packages and this is done.

7c6f434c commented 8 years ago

And to have the reminder everywhere — when MD5 is gone, the exception in make-tarball.nix (MD5 warnings are currently allowed, any other trace output breaks Hydra evaluation) should be removed.

grahamc commented 7 years ago

Remaining:

pkgs/applications/office/libreoffice/still.nix:    third_party = [ (let md5 = "185d60944ea767075d27247c3162b3bc"; in fetchurl rec {
pkgs/build-support/vm/windows/cygwin-iso/mkclosure.py:        url, size, md5 = install_line.split(' ', 2)
pkgs/build-support/vm/windows/cygwin-iso/mkclosure.py:            '    md5 = "{0}";'.format(md5),
pkgs/development/haskell-modules/configuration-common.nix:  apache-md5 = dontCheck super.apache-md5;              # http://hydra.cryp.to/build/498709/nixlog/1/raw
pkgs/development/haskell-modules/configuration-hackage2nix.yaml:  - cryptohash-md5 ==0.11.100.1
pkgs/development/lisp-modules/lisp-packages.nix:  md5 = buildLispPackage rec {
pkgs/development/tools/egg2nix/chicken-eggs.nix:  md5 = eggDerivation {
7c6f434c commented 7 years ago

This is just a regular reminder that libreoffice-still uses md5 for a content addressed URL (there is a sha256 of the content in the expression, too, and MD5 hash is a part of the URL), and Haskell, Common Lisp and Chicken find it convenient to call packages implementing MD5 md5.

As for cygwin-iso, @aszlig is it supposed to be still usable?

I have done something about it, as the new setup.ini uses SHA512 anyway, but I cannot build it and I am not sure if I am calling it right. I am sure my changes are an improvement, though.

7c6f434c commented 7 years ago

This means that all remaining MD5 mentions in master are false positives.

domenkozar commented 7 years ago

We still need to implement deprecation (reverting https://github.com/NixOS/nixpkgs/commit/2ca883338389b7ab995924a0cab0211993bdf1da)

7c6f434c commented 7 years ago

Well, technically you want to revert the removal of warnings.

I think 2ca8833 has been effectively reverted in e3a873479eee4e19a852011d45b0fb653f6c9e89

Is the plan now to apply 717ff85b14074c1a01b6ce7275e6dc727094a05a once more?

@domenkozar do you have better ideas? @edolstra would you object to such a plan?

grahamc commented 7 years ago

What if we just deleted support for specifying md5 =?

7c6f434c commented 7 years ago

I think deprecation is supposed to annoy people with out-of-tree Nix packages to finally update with some time window, instead of abrupt disappearance of functionality?

(I do not have any personal interest in NixPkgs fetch* MD5 support, though)

copumpkin commented 7 years ago

We're going to need a new one of these for sha1...

vcunat commented 7 years ago

So signing git commits is useless now? EDIT: I just started doing so a couple months ago...

copumpkin commented 7 years ago

Not fully useless (it's collision not preimage), and the attack is still pretty expensive, but I expect cost to go down significantly soon given how much relies on SHA1 nowadays.

vcunat commented 7 years ago

Well, actually when we sign git commits, we certify rather just the changes and not the whole history. (Noone's verifying signatures on ancestors AFAIK.)

jpierre03 commented 7 years ago

On 3fdd726b16cf39da25dd97d30631dc129e33fb0f

Total count of "md5 =" in nixpkgs is huge.

$ git grep "md5 ="|wc -l
225

Libreoffice is top 1 project using md5.

$ git grep "md5 ="|grep libreoffice|wc -l
221

Projects without libreoffice

$ git grep "md5 =" |grep -v libreoffice|wc -l
4

$ git grep "md5 =" |grep -v libreoffice
pkgs/development/haskell-modules/configuration-common.nix:  apache-md5 = dontCheck super.apache-md5;              # http://hydra.cryp.to/build/498709/nixlog/1/raw
pkgs/development/haskell-modules/configuration-hackage2nix.yaml:  - cryptohash-md5 ==0.11.100.1
pkgs/development/lisp-modules/lisp-packages.nix:  md5 = buildLispPackage rec {
pkgs/development/tools/egg2nix/chicken-eggs.nix:  md5 = eggDerivation {

None of them is a md5 hash. Good !

jpierre03 commented 7 years ago

Should an issue for sha1 be opened ?

$ git grep "sha1 ="|wc -l
8839

$ git grep "sha256 ="|wc -l
39903

Node-packages are top 1 on sha1 ( https://github.com/svanderburg/node2nix/issues/39 )

$ git grep "sha1 ="|grep node-packages|wc -l
7562
globin commented 7 years ago

Libreoffice uses sha256 to check the download:

(map (x : ((fetchurl {inherit (x) url sha256 name;}) // {inherit (x) md5name md5;})) (import ./libreoffice-srcs.nix));

Example entry:

  {
    name = "libabw-0.1.1.tar.bz2";
    url = "http://dev-www.libreoffice.org/src/libabw-0.1.1.tar.bz2";
    sha256 = "7a3d3415cf82ab9894f601d1b3057c4615060304d5279efdec6275e01b96a199";
    md5 = "7a3815b506d064313ba309617b6f5a0b";
    md5name = "7a3815b506d064313ba309617b6f5a0b-libabw-0.1.1.tar.bz2";
  }

I don't see any remaining packages using md5 so I'd propose removing the support in the fetchers on master, fixing this in projects outside of nixpkgs is easy and shouldn't be used anyway.

Baughn commented 7 years ago

This bit me with Curseforge, where I'm generating nix code by scraping their download pages. They only provide MD5, and I don't want to download the files themselves; it'd take far too much bandwidth, and might get me banned.

Could you reconsider disabling md5, or maybe gate it behind a __YES_I_REALLY_WANT_THIS type flag? The only alternatives I can see are to fork fetchurl, or to disable hash checking entirely, and I'd prefer to do neither. I've tried asking them to provide a better hash, but it doesn't seem like that will happen anytime soon, especially as the service has no official API.

edolstra commented 7 years ago

@Baughn Nix's builtin fetchurl still supports md5, so you may be able to do:

import <nix/fetchurl.nix> { url = ...; md5 = "7f8b9cc5953a21f2e0fa58547db7bb6c"; }