Test the instance before showing it public

Kcchouette commented 4 years ago

The directory lists the instance privatebin.johnnybegood.fr, but opening it produces these errors:

None of the “sha512” hashes in the integrity attribute match the content of the subresource. 8 privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. 2 privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. privatebin.johnnybegood.fr
None of the “sha512” hashes in the integrity attribute match the content of the subresource. privatebin.johnnybegood.fr
ReferenceError: jQuery is not defined privatebin.js:16:1

And when using it, you are stuck on the message "Loading… In case this message never disappears please have a look at this FAQ for information to troubleshoot."

elrido commented 4 years ago

I checked the instance after your report earlier today and got these errors as well, but at the time of this reply the issue seems to have gone away again. That host uses Cloudflare and so I suspect that this was the cause of the glitch and not something that the underlying host did wrong.

A test for the SRI-hashes can of course be implemented, but we should consider the following:

In the uptime check (currently every 15min) only a HEAD request gets performed, to reduce the impact of the test on the tested servers. So such an intermittent error in secondary resources wouldn't be detected during the day.
This leaves the full check, which is performed when the instance is added and once a day. Checking the SRI hashes requires the checker to download, in full, every resource that the HTML labels with a hash, in order to compute the sha512 sums (or whatever SRI hash algorithm is advertised). The HTML parsing would be a bit trickier, as the corresponding resource URL can come before or after the hash within the script or style tag (should we start to protect other things than just the scripts). All of them have to match or the application won't work, at least on the JavaScript side.
A server admin can choose to edit the JavaScript code and regenerate the hashes. They may choose a different hash algorithm that is supported by browsers. For that case we would need to support all three standardized SRI algorithms in the checker.
Side idea: We know the SRI hashes of the release versions and could therefore maintain a table of "official" hashes and in the table mark such unmodified instances. Some folks do run an instance on a development version (taken from the master branch in git) and those would not match the release hash, while otherwise unmodified. So it could be an additional checkmark for "unmodified release"?
If the SRI check fails when the instance gets added, an error message can be displayed and the instance not added to the list.
When the SRI check fails during the full check, once a day, should we mark the instance as unavailable, once (=99% instead of 100% uptime)? It will "recover" again, as the checks in between the full checks won't notice the SRI issue, but it would be a little bit lower in the list, but not removed. Or we could remove instances with failed SRI hashes immediately.
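As a rough illustration of the SRI check discussed in the points above, here is a minimal Python sketch using only the standard library. It is not the Directory's actual code; the names `IntegrityCollector` and `sri_matches` are hypothetical, and downloading the resources is left out. It assumes that only `<script src>` and `<link href>` tags carry `integrity` attributes. Using a real HTML parser means the order of the URL and the hash within a tag does not matter, and all three standardized algorithms are handled.

```python
import base64
import hashlib
from html.parser import HTMLParser

# The three hash algorithms standardized for SRI.
SRI_ALGORITHMS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


class IntegrityCollector(HTMLParser):
    """Collects (url, integrity) pairs from <script> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "link"):
            return
        # attrs is a list of (name, value) pairs; as a dict, the order of
        # src/href and integrity inside the tag becomes irrelevant.
        attributes = dict(attrs)
        url = attributes.get("src" if tag == "script" else "href")
        integrity = attributes.get("integrity")
        if url and integrity:
            self.resources.append((url, integrity))


def sri_matches(content: bytes, integrity: str) -> bool:
    """Compare content against the strongest algorithm listed in integrity."""
    candidates = {}
    for token in integrity.split():
        algorithm, _, expected = token.partition("-")
        if algorithm in SRI_ALGORITHMS:
            # Drop any "?options" suffix after the base64 value.
            candidates.setdefault(algorithm, []).append(expected.split("?", 1)[0])
    for algorithm in ("sha512", "sha384", "sha256"):  # strongest first
        if algorithm in candidates:
            digest = SRI_ALGORITHMS[algorithm](content).digest()
            actual = base64.b64encode(digest).decode("ascii")
            return actual in candidates[algorithm]
    # Checker policy (an assumption): no usable hash counts as a failure.
    return False
```

A checker would feed the instance's HTML to `IntegrityCollector`, fetch each collected URL in full, and treat the instance as broken if `sri_matches` returns False for any of them.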

Would you have any additional suggestions or further ideas beyond these points?

Kcchouette commented 4 years ago

but at the time of this reply the issue seems to have gone away again

For me it has not.

That host uses cloudflare and so I suspect that this was the cause of the glitch

You're right, paste.jacl.tech uses Cloudflare and has the problem! But some do not (privatebin.forgetyourname.com)

Some are listed as online (nopaste.xyz), use Cloudflare, and return "Internal Server Error" :(

When the SRI check fails during the full check, once a day, should we mark the instance as unavailable, once (=99% instead of 100% uptime)? It will "recover" again, as the checks in between the full checks won't notice the SRI issue, but it would be a little bit lower in the list, but not removed. Or we could remove instances with failed SRI hashes immediately.

It could be good to have this kind of process: "if SRI check fails during full check" => set a variable saying that, and when the "checks in between the full checks" run, they do not test the instance and mark it as offline because the variable says "SRI check failed".
What do you think?
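To make the proposal concrete, a minimal Python sketch of such a flag could look like the following. Everything here is hypothetical (`InstanceState`, `full_check`, `uptime_check`, and the `head_request` callback); it only illustrates the idea and is not how the Directory is implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InstanceState:
    """Hypothetical per-instance record kept by the checker."""
    sri_failed: bool = False


def full_check(state: InstanceState, sri_ok: bool) -> bool:
    """Daily full check: remember the SRI result and report up (True) or down."""
    state.sri_failed = not sri_ok
    return sri_ok


def uptime_check(state: InstanceState, head_request: Callable[[], bool]) -> bool:
    """15-minute check: skip the HEAD request while the SRI flag is set."""
    if state.sri_failed:
        return False  # not tested at all, recorded as offline
    return head_request()
```

With this logic an instance stays offline until the next full check clears the flag.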

elrido commented 4 years ago

paste.jacl.tech seems to have that Cloudflare issue with the SRI hashes, for me. That would be detected with the SRI check. What is weird, though, is that in the first requests the hashes are correct and then they fail in all subsequent requests, and disabling the cache in the browser doesn't work around it. I'm really annoyed with Cloudflare and the issues they cause for users of our software.

nopaste.xyz doesn't have an issue with SRI hashes, but regardless of what I paste, there is always an error storing the paste (error 500 in the POST request to the API). This is something server-side that the server admin has to analyse in their web server logs and fix. This would not be detected by the SRI check and would require a different kind of check.

Basically to be really sure that it works end to end I would almost have to run a full browser stack, saving and retrieving a paste so that all possible error conditions could be detected (JS parsing errors, SRI hash mismatches, storing pastes, retrieving pastes, checking that burn-after-reading ones really get deleted, etc.) - such a thing would be far beyond the scope of this project.

"if SRI check fails during full check" => put a var saying that, and when "checks in between the full checks" appears, it do not test it and put it as offline because he has the var saying "SRI check failed"

Well, since the SRI check would only run once a day, marking the instance as offline between these daily checks would guarantee that it gets deleted the next day, as it would be below the 10% threshold: https://github.com/PrivateBin/Directory/blob/6e162f641a3597d25147eb41db8faabff2e98704/src/main.rs#L38-L39
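A back-of-the-envelope sketch of that arithmetic, in Python. The 15-minute interval and the 10% threshold come from this thread; the window of 100 checks is an assumption, inferred from a single failure being described as "99% instead of 100% uptime", and the real code may count differently.

```python
CHECK_INTERVAL_MINUTES = 15     # uptime check interval mentioned in this thread
REMOVAL_THRESHOLD_PERCENT = 10  # removal threshold mentioned in this thread
WINDOW = 100                    # assumption: uptime covers the last 100 checks


def uptime_percent(results):
    """Percentage of successful checks among the most recent WINDOW results."""
    recent = results[-WINDOW:]
    return 100 * sum(recent) // len(recent)


checks_per_day = 24 * 60 // CHECK_INTERVAL_MINUTES  # 96 checks per day

# A single failed check after a perfect history gives 99%.
one_failure = [True] * (WINDOW - 1) + [False]

# One full day marked offline after a perfect history leaves 4 of 100 up.
day_offline = [True] * WINDOW + [False] * checks_per_day

print(uptime_percent(one_failure))  # 99
print(uptime_percent(day_offline))  # 4
print(uptime_percent(day_offline) < REMOVAL_THRESHOLD_PERCENT)  # True
```

Under these assumptions a single day of being flagged offline drops the instance from 100% to 4%, well below the removal threshold.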

Thank you for your input and pointing me at these not-quite-online cases - I'll think about how I could better detect such semi-working instances (=HTML loads ok, but app non-functional for one reason or other).

PrivateBin / Directory

Test the instance before showing it public #13