deu / palemoon-overlay

Unofficial Gentoo overlay for the Pale Moon (http://www.palemoon.org/) web browser.

SRC_URI instead of git fetch #53

Open Ud71p opened 6 years ago

Ud71p commented 6 years ago

Hi, and thanks a lot for the great Pale Moon ebuilds. I use them all the time and am very happy with them, except for one small detail: the use of git fetching to get the sources. I always have to modify the ebuild to switch to the standard Portage fetch.

Here are some reasons why standard fetch (using SRC_URI) is better than git-r3_fetch:

Most reasons stem from this super-reason:

Verifying ebuild manifests

Package manager (e.g. Portage) will check the hashes of all downloads. This does not happen for git fetches.

Hashing prevents corruption of data.

The biggest win of hashing is security. It's important all users and devs build the package from the same source. Otherwise some user under attack can fetch sources modified to contain malware, and this will never be detected.

Another win is quality assurance. It's imperative that everybody, e.g. the testers of the unstable version and, later, the users after stabilization, all use exactly the same sources. With version control, people can make changes and re-release under the same tag/version. This of course should be forbidden, but I have seen it happen. Hashing simply prevents this.
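A minimal sketch of the idea in plain shell, not Portage's actual implementation: a hash is recorded once when the ebuild is published, and every later download is compared against it. The file name and the `verify_distfile` helper are hypothetical, and the real Manifest format is more involved.

```bash
#!/usr/bin/env bash
# Sketch: record a hash once, then check every later copy against it.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the upstream tarball (hypothetical name and content).
distfile="$workdir/palemoon-demo.tar.gz"
printf 'pretend this is the release tarball\n' > "$distfile"

# Done once by the ebuild maintainer: record the hash, as a Manifest would.
recorded_hash=$(sha512sum "$distfile" | cut -d' ' -f1)

# Hypothetical helper: succeeds only if the file still matches the recorded hash.
verify_distfile() {
    local file=$1 expected=$2 actual
    actual=$(sha512sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# An untouched download passes the check.
if verify_distfile "$distfile" "$recorded_hash"; then
    clean_result=ok
else
    clean_result=mismatch
fi

# Simulate a modified or re-released source: the same check now fails.
printf 'unexpected change\n' >> "$distfile"
if verify_distfile "$distfile" "$recorded_hash"; then
    tampered_result=ok
else
    tampered_result=mismatch
fi

echo "clean download:    $clean_result"
echo "tampered download: $tampered_result"
```

The point is that the expected hash exists before the download starts, so any later change to the file is noticed regardless of where the change came from.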

I also believe the size of a tarball release is smaller than git clone.

Also git fetches screw up the whole mirroring system. Mirrors are great for many things, but only work with standard SRC_URI fetches.

Also SRC_URI is better because then emerge -a correctly calculates and informs the user of the download size.

SRC_URI also is great when multiple emerge is done for same source (such as when experimenting with what flags to enable, what optimizations, which gcc version, etc). I think git fetches refetch all data on each merge?

Using SRC_URI allows the user to resume an interrupted fetch, because the partial file resides in DISTDIR. I don't think the same happens for git fetches. It could in principle, but if the fetch goes into PORTAGE_TMPDIR, then this is usually cleaned after an unsuccessful merge.

SRC_URI = no need to emerge git. git is a dev tool, end users should not need it just to install packages.

SRC_URI works well together with tools, such as distclean, to clean disk space after package is uninstalled.

A tarball is also better due to legal woes - licensing is usually clearer, as the whole file can be easily regarded as one coherent release, while fetching a git repo is a bit more risky - some files can have a different license, but of course this happens rarely.

SRC_URI also works together with FETCHCOMMAND. People with special proxy/firewall/vpn needs can still get the source, not so for git fetches.

In general, Gentoo doesn't want cvs/svn/git-fetch sources in the tree:

https://devmanual.gentoo.org/ebuild-writing/functions/src_unpack/svn-sources/index.html

So I hope changing palemoon ebuilds to SRC_URI can facilitate their inclusion in the official Portage tree.

Here are the changes needed for SRC_URI instead of git fetch (lines marked with < are removed, unmarked lines are added):

< inherit palemoon-2 mozlinguas-palemoon git-r3 eutils flag-o-matic pax-utils

inherit palemoon-2 mozlinguas-palemoon eutils flag-o-matic pax-utils

SRC_URI="https://codeload.github.com/MoonchildProductions/Pale-Moon/tar.gz/${PV}_Release -> ${P}.tar.gz"

< EGIT_REPO_URI="https://github.com/MoonchildProductions/Pale-Moon.git"
< GIT_TAG="${PV}_Release"

< git-r3_fetch ${EGIT_REPO_URI} refs/tags/${GIT_TAG}
< git-r3_checkout

  unpack ${A}
  mv Pale-Moon-${PV}_Release ${P}
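To illustrate why the `mv` is there, here is a small sketch that can be run outside Portage. It fakes a tag tarball whose top-level directory is named `Pale-Moon-<tag>` (as the `mv` line above implies) and renames it to `${P}`, which is where Portage looks for the sources by default. The version number and the use of plain `tar` in place of Portage's `unpack` are assumptions made for the demo.

```bash
#!/usr/bin/env bash
# Sketch: simulate the unpack-and-rename step from the proposed change.

PN="palemoon"
PV="27.6.2"          # example version, assumed for this demo
P="${PN}-${PV}"

workdir=$(mktemp -d)
cd "$workdir"

# Fake a tag tarball: the top-level directory is <repo>-<tag>.
mkdir "Pale-Moon-${PV}_Release"
echo "source" > "Pale-Moon-${PV}_Release/README"
tar -czf "${P}.tar.gz" "Pale-Moon-${PV}_Release"
rm -r "Pale-Moon-${PV}_Release"

# Equivalent of `unpack ${A}` followed by the rename.
tar -xzf "${P}.tar.gz"
mv "Pale-Moon-${PV}_Release" "${P}"

unpacked_file="$workdir/${P}/README"
ls "$workdir/${P}"
```

Inside a real ebuild, setting `S` to the unpacked directory name would be an alternative to renaming it.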

This website is not so good, it didn't allow me to upload a file... :-(

Bfgeshka commented 6 years ago

But git also controls verification...

Ud71p commented 6 years ago

No. It does not. Proof: https://github.com/deuiore/palemoon-overlay/blob/master/www-client/palemoon/Manifest There is no hash of the sources there.

In general, imagine you are to retrieve sources from somewhere and want to make sure you will get what you expect. If you don't have any form of hash or signature before retrieval, then it is mathematically impossible to verify that what you get is what you expect. In other words if you have no hash beforehand, then you don't know what data you expect.

You are probably right that git does some verification, but not the kind we need. It probably just verifies that what you have on your disk after the fetch is the same as what resides in the remote repo at that time. Such a check is nice to have to prevent some data traffic corruption, but not what we need from security perspective.

We need a hash verification with some hash we know before we commence the fetch.

An example of an attack to illustrate some risks: Say user A emerges palemoon with git fetch, and it's all OK. The sources retrieved are the same as the ones in the repo. Git verification went OK. Now another user C wants to do the same, but this user is under attack. An attacker B (malware on a Pale Moon dev's machine, your-favourite-3-letter-agency, GitHub, etc.) modifies the sources in the repo before user C fetches them. Then C's fetch goes perfectly. She gets exactly the same sources as are in the repo (with malware). Git verification goes OK. Now attacker B removes the malware, so all other users get clean, malware-free sources.

The kind of verification we need is one that makes sure that all users always get the same sources for a given ebuild version, and this cannot happen unless there is a hash of the source in the Manifest file.

deu commented 6 years ago

The ebuilds used to use SRC_URI with the source package being directly taken from the Pale Moon archives. This was changed because of a couple of reasons.

At first, there was no GitHub release and the official source package had broken permissions (I think it still does). After some time they started using GitHub release packages and I switched to those (See this commit).

Unfortunately they weren't consistent. I don't know if it was the Pale Moon developers' fault or GitHub's, but the checksum failed quite often and the Manifest had to be updated every time. If you glance at the commit history around that time you can see a number of "Updated/Fixed Manifest" commits, and issues were opened because of that.

No complaints were made in that regard once I switched to directly pulling the version tags from git. It still sometimes occurs when changes are made to the language packs, but that doesn't happen nearly as often.

You raise legitimate concerns, but if going back to using GitHub release packages would mean going back to inconsistent packages to redownload and check every time, then it's probably not worth it in the face of the possibility that the Pale Moon GitHub repository could be compromised. Also, I think that that eventuality could be mitigated by starting to check commit hashes.
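As a rough sketch of what checking commit hashes could look like: the git-r3 eclass has an `EGIT_COMMIT` variable, so an ebuild can name the exact commit rather than a tag that could be moved. The fragment below is an assumption about how that might be wired up, not the overlay's actual ebuild. The commit hash is the one quoted later in this thread for the 27.6.2 example, assumed here to be the commit that the `27.6.2_Release` tag points at.

```bash
# Ebuild fragment (sketch, not a drop-in replacement for the overlay's ebuild).
EGIT_REPO_URI="https://github.com/MoonchildProductions/Pale-Moon.git"

# Pin the exact commit instead of a tag. A git commit hash covers the tree
# contents and history, so a silently modified tag would no longer match.
# Assumption: this is the commit behind the 27.6.2_Release tag.
EGIT_COMMIT="ce6529faeb2f0c11c832a34570c79d04707c3255"
```

Since the current ebuild calls `git-r3_fetch` with an explicit `refs/tags/...` argument, that call would need to be adapted for the pinned commit to take effect.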

Just a couple of things though:

> SRC_URI also is great when multiple emerge is done for same source (such as when experimenting with what flags to enable, what optimizations, which gcc version, etc). I think git fetches refetch all data on each merge?

> Using SRC_URI allows the user to resume an interrupted fetch, because the partial file resides in DISTDIR. I don't think the same happens for git fetches. It could in principle, but if the fetch goes into PORTAGE_TMPDIR, then this is usually cleaned after an unsuccessful merge.

> SRC_URI = no need to emerge git. git is a dev tool, end users should not need it just to install packages.

> SRC_URI works well together with tools, such as distclean, to clean disk space after package is uninstalled.

I think Portage should fetch into DISTDIR/git3-src by default, so no, you shouldn't have to refetch all data on each merge, and resuming an interrupted fetch should work fine. And do you mean emerge --depclean? Or make distclean? In any case I fail to see how using git-r3 or SRC_URI would make a difference.

All this said, I guess we could test the GitHub release packages once again and see how it goes, but if it starts causing broken Manifests all over again we'll come back to git-r3.

Oh by the way, when you want to paste code you should put it between `s (inline) or ```s (multi-line) to not have GitHub screw up the formatting.

nick87720z commented 6 years ago

Using a VCS clone/fetch directly, with proper eclass support, is good for live ebuilds (by the way, I don't see a palemoon-9999 there :) ). Release ebuilds, however, usually set SRC_URI to archived tarballs or, if the hosting allows it (as GitHub does), to a special URL that fetches a zip archive for a specific commit. When submodules are used, either the code owner could write custom ebuilds, or it is still possible to fetch the submodules via SRC_URI and then assemble the complete source tree in a custom src_unpack().

Files under git control are signed by definition (by git). As for a Gentoo repo under any VCS (git, hg, svn, it doesn't matter), such repos benefit from the thin-manifests flag, which makes the Manifests cover only SRC_URI files: https://wiki.gentoo.org/wiki/Repository_format/metadata/layout.conf#thin-manifests

deu commented 6 years ago

Actually I didn't realise I could fetch a zip archive for a specific commit with GitHub. Will look into that since it seems more ideal than the current solution.

deu commented 6 years ago

Unfortunately that doesn't seem to be a solution either. From some brief research, those archives are also inconsistent. This seems to be a common problem (see https://github.com/libgit2/libgit2/issues/4343 for example). Hopefully there's a solution on the horizon: https://github.com/Homebrew/homebrew-core/issues/18044#issuecomment-329301763

Having a thin Manifest really wouldn't help. From my understanding, that would only cause the files in this repository, so the ebuilds, not to be signed in the Manifest. The problems are really the files outside of it. Unless you were just making an aside suggestion unrelated to this issue.
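The thread does not establish why the archives are inconsistent. The sketch below only demonstrates one plausible mechanism, stated as an assumption: if an archive is regenerated with different compression settings, its checksum changes even though the contents are byte-for-byte identical. That alone would be enough to break a Manifest entry. File names are made up for the demo.

```bash
#!/usr/bin/env bash
# Sketch: identical contents, different compressed bytes, different checksums.

workdir=$(mktemp -d)
cd "$workdir"

# Some compressible stand-in content, packed into a tar archive once.
seq 1 20000 > payload.txt
tar -cf payload.tar payload.txt

# Compress the very same tar twice with different settings.
# -n keeps the file name and timestamp out of the gzip header.
gzip -n -1 -c payload.tar > fast.tar.gz
gzip -n -9 -c payload.tar > best.tar.gz

hash_of() { sha512sum "$1" | cut -d' ' -f1; }

fast_hash=$(hash_of fast.tar.gz)
best_hash=$(hash_of best.tar.gz)

# Hash the decompressed streams to compare the actual contents.
fast_inner=$(gzip -dc fast.tar.gz | sha512sum | cut -d' ' -f1)
best_inner=$(gzip -dc best.tar.gz | sha512sum | cut -d' ' -f1)

if [ "$fast_hash" != "$best_hash" ]; then echo "compressed files differ"; fi
if [ "$fast_inner" = "$best_inner" ]; then echo "contents are identical"; fi
```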

Ud71p commented 6 years ago

One more thing that works only with SRC_URI and is broken by git fetch is using Tor:

https://wiki.gentoo.org/wiki/Tor#Portage

People who value privacy or want to hide from their ISP/regime/hackers what OS/packages/versions they use can only install SRC_URI and not git-fetched packages.
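For illustration, a make.conf fragment along these lines routes SRC_URI downloads through a proxy. It is a sketch under stated assumptions: curl is installed and a SOCKS5 proxy (for example a local Tor client) listens on 127.0.0.1:9050. The `${DISTDIR}`, `${FILE}` and `${URI}` placeholders are escaped so that Portage, not the shell, fills them in. See the wiki page linked above for the recommended setup.

```bash
# /etc/portage/make.conf fragment (sketch).
# Assumption: a SOCKS5 proxy is reachable at 127.0.0.1:9050.
FETCHCOMMAND="curl --fail --location --proxy socks5h://127.0.0.1:9050 --output \"\${DISTDIR}/\${FILE}\" \"\${URI}\""

# Same command, but continuing a partial download left in DISTDIR.
RESUMECOMMAND="curl --fail --location --continue-at - --proxy socks5h://127.0.0.1:9050 --output \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
```

The `socks5h` scheme asks curl to resolve host names through the proxy as well, so DNS lookups do not bypass it.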

nick87720z commented 6 years ago

I looked at which archive links can be obtained in various ways, using version 27.6.2 as an example.

  1. On the releases tab there are two entries, which appear to be created automatically by GitHub:
     https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.tar.gz
     https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.zip

  2. Again from the releases page: in the sidebar you can see two links. One is the tag name, which links to the tree view for the commit associated with the tag. Following this link and then selecting "Clone or download" -> "Download ZIP" gives the same link as the first way:
     https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.zip

  3. Following the other link to the commit view, from there to the tree view, and again selecting "Clone or download" -> "Download ZIP". This time the archive link is based on the commit hash:
     https://github.com/MoonchildProductions/Pale-Moon/archive/ce6529faeb2f0c11c832a34570c79d04707c3255.zip
     Replacing the extension with tar.gz also works (though it is not obvious):
     https://github.com/MoonchildProductions/Pale-Moon/archive/ce6529faeb2f0c11c832a34570c79d04707c3255.tar.gz

As far as I can understand, in all these cases GitHub doesn't prepare the archives in advance but generates them on demand. For release tags, though, I don't know; it could prepare them in advance as well.

As far as I can understand, these commit/tag-based URLs should give the same content as git checkout.

I don't know how the releases page is actually maintained. Maybe they mark certain tags as releases, or there is some format for tag names.
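The URL pattern in the examples above can be captured in a small shell helper. `github_archive_url` is a hypothetical name, and the pattern is inferred only from the links listed in this comment.

```bash
#!/usr/bin/env bash
# Sketch: build an archive URL for a tag or a commit hash.

# Hypothetical helper. Arguments: owner/repo, tag or commit hash, extension.
github_archive_url() {
    local owner_repo=$1 ref=$2 ext=$3
    printf 'https://github.com/%s/archive/%s.%s\n' "$owner_repo" "$ref" "$ext"
}

repo="MoonchildProductions/Pale-Moon"

tag_url=$(github_archive_url "$repo" "27.6.2_Release" "tar.gz")
commit_url=$(github_archive_url "$repo" "ce6529faeb2f0c11c832a34570c79d04707c3255" "tar.gz")

echo "$tag_url"
echo "$commit_url"
```

In an ebuild, either form could be used in SRC_URI with a `-> ${P}.tar.gz` rename, as in the proposal at the top of this issue.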

sedimentation-fault commented 4 years ago

In my understanding, there is another problem with using Git fetches:

Suppose I realized that, for some version I installed, something did not go as expected (say, something like this here: https://github.com/deu/palemoon-overlay/issues/81). Also, suppose that this "something" has its root deeply buried into some incompatibility (or whatever else) introduced in the latest Git commits. I do see two directories with current dates in portage's DISTDIR/git3-src/:

MoonchildProductions_Pale-Moon.git
MoonchildProductions_UXP.git

I decide to revert to an older version, for which I know the problem did not occur. But since the ebuilds always fetch the current version of Pale-Moon.git and UXP.git (i.e. since the directories do not contain commit or version information in their names), I lose, don't I?

deu commented 4 years ago

@sedimentation-fault No, it doesn't work like that. It doesn't matter what those directories contain at any given time. When you emerge an ebuild for a specific version, it always checks out and builds the specific version the ebuild specifies, so you don't have to worry on that front.