flathub-infra / flatpak-external-data-checker

A tool for checking if the external data used in Flatpak manifests is still up to date
GNU General Public License v2.0
115 stars 36 forks source link

Sporadic update PRs with invalid checksums for SourceForge URLs #168

Closed Zlopez closed 3 years ago

Zlopez commented 3 years ago

Hi,

today the flathubbot opened a strange PR on org.gnome.Chess. PR is opened for dependency which doesn't have x-data-checker options set up and it just changed the hash without changing the URL. I looked at the URL and the file had last update in 2016.

I will close the PR for now, because the build is failing anyway.

gasinvein commented 3 years ago

It looks much like #142, but I don't know why it pops up again. Unlike requests, aiohttp does check content-length by default.

gasinvein commented 3 years ago

There are multiple other apps hit by this at the same time, and seems like all affected sources are hosted on SourceForge. https://github.com/flathub/org.octave.Octave/pull/142 https://github.com/flathub/org.gnome.Evince/pull/47 https://github.com/flathub/org.gnome.Chess/pull/28 https://github.com/flathub/org.gnome.Books/pull/26 https://github.com/flathub/org.freedesktop.Sdk.Extension.texlive/pull/45 https://github.com/flathub/org.flightgear.FlightGear/pull/39 https://github.com/flathub/net.gwyddion.Gwyddion/pull/23 https://github.com/flathub/me.kozec.syncthingtk/pull/34 https://github.com/flathub/io.github.peazip.PeaZip/pull/5 https://github.com/flathub/im.riot.Riot/pull/175 https://github.com/flathub/com.sweethome3d.Sweethome3d/pull/22 https://github.com/flathub/com.chez.GrafX2/pull/5 And the invalid new checksum is the same one everywhere: 251a85b3bac687974f360d3796048c20ded3bf0bd69e0d1cfd1db23d013f89ed

Vanadiae commented 3 years ago

Thanks @gasinvein for mentioning the affected MRs here :)

gasinvein commented 3 years ago

Any idea what has happened? Probably some SourceForge issue? Looking at flathub download cache, it seems like this has occurred before.

SISheogorath commented 3 years ago

Source forge seems to have no permalinks to source code and/or employing some sort of blocking which results in the external data checker getting confused.

gasinvein commented 3 years ago

Given that download-sources job with the incorrect checksum has completed successfully for Element, libsoup/libbcurl didn't detect any http error either. Which means that SourceForge did actually sent invalid response without any errors (no incomplete read or anything). How can we catch this case? Is it possible at all?

SISheogorath commented 3 years ago

Iirc they send a valid page that is just HTML. At least that's what I got last time. So it's more of a figuring out how to use source forge than external a data checker issue.

gasinvein commented 3 years ago

Looks like this has happened again. https://github.com/flathub/io.github.peazip.PeaZip/pull/6 https://github.com/flathub/me.kozec.syncthingtk/pull/40 https://github.com/flathub/net.gwyddion.Gwyddion/pull/25 https://github.com/flathub/org.ardour.Ardour/pull/25 https://github.com/flathub/org.freedesktop.Sdk.Extension.texlive/pull/48 https://github.com/flathub/org.gnome.Books/pull/31 https://github.com/flathub/org.gnome.Evince/pull/50 https://github.com/flathub/org.octave.Octave/pull/156 https://github.com/flathub/org.srb2.SRB2Kart/pull/23 https://github.com/flathub/uk.co.ibboard.cawbird/pull/9

wjt commented 3 years ago

This hit me for the first time today :) https://github.com/flathub/org.tuxpaint.Tuxpaint/pull/15 – I wish we had the contents of the offending page, given that we could add a hack…

wjt commented 3 years ago

And the invalid new checksum is the same one everywhere: 251a85b3bac687974f360d3796048c20ded3bf0bd69e0d1cfd1db23d013f89ed

Oh, I missed this point before.

I know it's an awful kludge but maybe we should denylist this checksum.

Another idea:

https://github.com/flathub/flatpak-external-data-checker/blob/master/src/lib/utils.py#L116-L129

There is no check that the response code is 2xx. I wonder if the response code would turn out to be 5xx in this failure case.

gasinvein commented 3 years ago

There is no check that the response code is 2xx. I wonder if the response code would turn out to be 5xx in this failure case.

aiohttp raises an exception by default when it receives status 4xx or 5xx. We ask aiohttp to raise an exception on receiving response with error status here: https://github.com/flathub/flatpak-external-data-checker/blob/435d2d5de3d72833d7300c0b0ccf2ff2bb637b64/src/lib/externaldata.py#L362-L363 I believe we have tests covering that.

gasinvein commented 3 years ago

I know it's an awful kludge but maybe we should denylist this checksum.

I'm considering this as the last resort. Maybe we could check Content-Type header? Clearly it shouldn't be text/html for an archive. Although would be good if we could check what's the Content-Type of the erroneous sourceforge response.

gasinvein commented 3 years ago

We've captured the erroneous page contents. As of time of whiting this post, it's available in flathub download cache. The contents are:

<html><head>
<title>SourceForge</title>
<!-- <script src="/js/jquery.com/jquery-1.11.0.min.js"></script> -->
<script src="https://code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="https://sourceforge.net/js/mirrors.js"></script>
<script src="/js/sf.js"></script>
<script>
var DR_loc = DR_parse_hash_url();
if (DR_loc) {
    DR_sf_main(DR_loc);
} else {
    window.location.href = 'https://sourceforge.net/home.html';
}
</script>
<meta name="description" content="Free, secure and fast downloads from the largest Open Source applications and software directory - SourceForge.net"/>
</head><body>
<noscript>
We're sorry -- the Sourceforge site is currently in Disaster Recovery mode.  Please check back later.
</noscript>
</body></html>