NixOS / nixpkgs

Use Software Heritage as a fallback download location #53653

Open armijnhemel opened 5 years ago

armijnhemel commented 5 years ago

Issue description

In the past, download locations for software have disappeared for various reasons: source releases were deleted, domain name registrations expired, websites were redesigned, and so on. This has led to packages failing to build.

Although there are sometimes mirrors of source code, these might come with the same issue, for example when they are exact clones of the original site.

It might be good to use Software Heritage as a fallback download location, or at least as a source in case we want to provide our own mirror of the source code that disappeared from the original site. Software Heritage already indexes everything from Debian, GitHub, GitLab, PyPI, Google Code, GNU and several others, and makes it available under a unique hash.

Steps to reproduce

None

Technical details

None

armijnhemel commented 5 years ago

Adding @edolstra

edolstra commented 5 years ago

Do they have a content-addressable mirror?

armijnhemel commented 5 years ago

Yes, you can get content from SH using various checksums. There is rate limiting in place, though (for Hydra, for example, that could possibly be lifted if needed).
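
A minimal sketch of what a checksum-based lookup could look like, assuming the public Software Heritage API exposes raw content at `/api/1/content/<hash_type>:<hash>/raw/` on archive.softwareheritage.org. The helper names `swh_content_url` and `verify_sha256` are hypothetical, and nothing here performs a network request:

```python
import hashlib

# Assumed base URL of the public Software Heritage API.
SWH_API = "https://archive.softwareheritage.org/api/1"

# Hash types assumed to be accepted by the content endpoint,
# mapped to the length of their hex representation.
SUPPORTED_HASHES = {"sha1": 40, "sha1_git": 40, "sha256": 64, "blake2s256": 64}


def swh_content_url(hash_type: str, hex_digest: str) -> str:
    """Build the URL that should return the raw bytes of a content object."""
    expected_len = SUPPORTED_HASHES.get(hash_type)
    if expected_len is None:
        raise ValueError(f"unsupported hash type: {hash_type}")
    digest = hex_digest.lower()
    if len(digest) != expected_len or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("digest is not a hex string of the expected length")
    return f"{SWH_API}/content/{hash_type}:{digest}/raw/"


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against the hash the package definition expects."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

Because the lookup key is the hash itself, a fetcher could verify whatever the archive returns exactly as it verifies a download from the original URL.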

davidak commented 5 years ago

I like the idea! Do they support such usage of their system?

armijnhemel commented 5 years ago

I guess it depends on how we implement it, and we would need to talk to them about it. If we use it only as a backup (once the original is no longer available) and limit it to Hydra at first, I expect it would be no problem for Software Heritage to lift the rate limit.
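
The backup-only behaviour described here could be sketched roughly as follows. This is an illustration under stated assumptions, not how nixpkgs or Hydra fetch sources: the `Fetcher` type and `fetch_with_fallback` are hypothetical names, and each fetcher is assumed to either return the file bytes or raise `OSError`.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical fetcher type: returns the file bytes or raises OSError.
Fetcher = Callable[[], bytes]


def fetch_with_fallback(
    primaries: Iterable[Fetcher],
    fallback: Fetcher,
    expected_sha256: str,
) -> Tuple[bytes, str]:
    """Try the original locations first; use the archive only if all of them fail."""

    def try_one(fetch: Fetcher) -> Optional[bytes]:
        try:
            data = fetch()
        except OSError:
            return None  # unreachable, deleted, rate limited, ...
        # A location that returns different bytes counts as a failure too.
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            return None
        return data

    for fetch in primaries:
        data = try_one(fetch)
        if data is not None:
            return data, "primary"

    # Only reached when every original location failed, which keeps
    # the load on the archive (and its rate limit) as low as possible.
    data = try_one(fallback)
    if data is not None:
        return data, "fallback"
    raise RuntimeError("no location returned content matching the expected hash")
```

Putting the archive last means it is contacted only for sources whose original locations have already failed.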

armijnhemel commented 5 years ago

Guix implemented it recently: https://lwn.net/Articles/784401/

asymmetric commented 5 years ago

And a direct link to the blog post.

makefu commented 5 years ago

Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now. The biggest issue I see is mirrors of nonfree software from vendors such as Oracle, Nvidia, AMD or Dropbox. Sources like that are no issue for Guix because its repository only packages free software.

I really like the idea of mirroring all the sources we use but maybe we can get help from other projects or technologies (ipfs, archive.org) which do not have the restriction to only mirror free source code.

davidak commented 5 years ago

@makefu right. for me, steam, amd and adobe flash are the only issues of this kind. I think most of the time it would be a licensing issue to redistribute nonfree software.

But it would still be very elegant to use Software Heritage from a long-term perspective. It might save you some day. Imagine that, 50 years from now, you want to open an obscure file format that is popular today. If we implement it NOW, you will be able to use today's nixpkgs then.

That's two different problems to solve.

asymmetric commented 5 years ago

Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now.

@makefu They are not the problem right now, but they're guaranteed to be one in the future, especially the commercial companies in that list.

Given that all of those repositories will eventually go down, isn't it better to have as a fallback an institution whose explicit purpose is long-term preservation? OTOH I agree that when the problem presents itself, we can implement a SH backend, so maybe this doesn't need our attention now.

I agree that a decentralized storage layer like IPFS would also be very interesting. But maybe that's the topic for another issue?

makefu commented 5 years ago

I am totally for implementing Software Heritage as a backend, I just wanted to point out that the things currently mirrored are only a subset of the sources we have (and break) in nixpkgs.

The nice thing with the current setup in nixpkgs is that we can just add more options.

davidak commented 5 years ago

I agree that a decentralized storage layer like IPFS would also be very interesting. But maybe that's the topic for another issue?

It is. https://github.com/NixOS/nix/issues/859

seirl commented 5 years ago

@makefu :

I really like the idea of mirroring all the sources we use but maybe we can get help from other projects or technologies (ipfs, archive.org) which do not have the restriction to only mirror free source code.

We do not have that restriction in Software Heritage. You're free to deposit code with a non-free license. It's just easier for us to prioritize mirroring the main forges, because that's where most of the code is available.

makefu commented 5 years ago

@seirl thanks for the reply! That is great to hear. Is it also possible to mirror blobs (e.g. VirtualBox Extensions or Adobe Flash Player binary)?

seirl commented 5 years ago

@makefu It's technically possible, but it's not the intent of Software Heritage to mirror binaries. We won't filter them, but be aware that there are size restrictions that will apply. If it's the exception rather than the norm (e.g., one package that vendors a small proprietary .so) it seems totally fine, but please don't use Software Heritage as a binary cache :-)

Cheers.

seirl commented 5 years ago

Just to add, the size restriction is currently at 100 MiB, but it's not a hard guarantee and could change in the future. We expect most of the source files deposited to be way under that.
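
As a rough illustration of that limit, the sketch below separates files that fit under 100 MiB from those that do not. The constant reflects only the figure quoted above, which may change. The name `split_by_size` is hypothetical, and treating a file of exactly 100 MiB as acceptable is an assumption.

```python
from typing import Dict, List, Tuple

# 100 MiB = 100 * 1024 * 1024 bytes; quoted as the current limit, not a hard guarantee.
SWH_SIZE_LIMIT = 100 * 1024 * 1024


def split_by_size(
    sources: Dict[str, int], limit: int = SWH_SIZE_LIMIT
) -> Tuple[List[str], List[str]]:
    """Split {file name: size in bytes} into (fits, too_large) name lists."""
    fits: List[str] = []
    too_large: List[str] = []
    for name, size in sources.items():
        # Assumption: a file of exactly `limit` bytes is still accepted.
        (fits if size <= limit else too_large).append(name)
    return fits, too_large
```

A check like this could flag large proprietary blobs before anyone tries to deposit them.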

makefu commented 5 years ago

@seirl thanks for clearing that up, i am sure more people are interested in this response as well :+1:

Zimmi48 commented 5 years ago

Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now.

Just because GitHub is not expected to go down anytime soon doesn't prevent people from deleting their repositories. Oh, and GitHub itself is also regularly down or has trouble for a few hours.

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-fetch-lfs-enabled-repo-with-fetchfromgithub/5890/8

stale[bot] commented 4 years ago

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.

davidak commented 4 years ago

This is still not merged.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

asymmetric commented 3 years ago

AFAIK there's some ongoing work on this here, although I'm not sure of the current status. /cc @nlewo.

See also this discourse post.

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/reproductibility-with-nix-flake-and-r-langage/13346/1

rgrunbla commented 3 years ago

Regarding this, you can find the archives generated by Hydra every day (see https://nix-community.github.io/nixpkgs-swh/) encoded as "branches" on this repository of the Software Heritage website: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://nix-community.github.io/nixpkgs-swh/sources-unstable.json

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

nixinator commented 1 year ago

did the fallback ever get merged? maybe i missed a commit.

https://www.tweag.io/blog/2020-06-18-software-heritage/

nlewo commented 1 year ago

did the fallback ever get merged? maybe i missed a commit.

AFAIK, the fallback has never been implemented (only some partial prototypes).

progval commented 1 year ago

By the way, Nixpkgs currently can't be archived by SWH because of https://github.com/nix-community/nixpkgs-swh/issues/5

christoph-blessing commented 2 months ago

I stumbled across this when looking into combining the source code from SWH with the build instructions from nixpkgs for a work project, but it seems to be abandoned. Does anyone know what the current status is?