freedomofpress / securethenews

An automated scanner and web dashboard for tracking TLS deployment across news organizations
https://securethe.news
GNU Affero General Public License v3.0
102 stars 25 forks source link

Some apparently failed scans marked with live == true #151

Closed thisisparker closed 4 years ago

thisisparker commented 6 years ago

Some scans appear to be failing and returning a grade of zero, while stile returning the live value as true. Here's an example of one recent such scan of The Intercept:

            "grade": "F",
            "timestamp": "2018-03-02T16:15:53.857707Z",
            "live": true,
            "valid_https": false,
            "downgrades_https": false,
            "defaults_to_https": true,
            "hsts": false,
            "hsts_max_age": null,
            "hsts_entire_domain": null,
            "hsts_preload_ready": false,
            "hsts_preloaded": true,
            "score": 0

(There have been numerous examples on other sites, including The New Yorker and El Periódico, which I can dig up if it is helpful.)

This is a problem because filtering on the live value has been a pretty good way of dropping failed scans from results, which is necessary for doing data analysis and also for the Twitter bot to report changes. I haven't compiled stats on whether there has been an uptick in failed scans recently, but there has been an uptick in misfired Tweets that are the result of a successful scan following a failed scan, which points to a possible failure rate increase.

harrislapiroff commented 6 years ago

I have some fear that this might be a bug in pshtt somehow. This is my first look around this codebase, so I may be missing something.

I'm looking over the code here and it looks like it pretty straightforwardly transcribes the pshtt output to the database. Looking here, the REST API pretty straightforwardly outputs straight from the database to the response, without any particularly complex processing. I don't see anywhere, immediately, in that process for an error to get introduced.

@conorsch Can you see if you can track down some relevant logs/database rows? I want to double-check the stdout and stderr for scans that are coming up like this and any other logs that might be associated with pshtt if those exist.

harrislapiroff commented 6 years ago

Wait—scratch that. The error is that the score is 0. All the other information is correct? That would have to be in this code somewhere.

harrislapiroff commented 6 years ago

Scratch that, again. I see that the other information isn't correct either. I was deceived by "defaults_to_https": true, being correct. Okay, I'm done thinking out loud on the ticket now. :p

thisisparker commented 6 years ago

The defaults_to_https thing was why I included the Intercept. I don't at all understand why it would get that right if it just can't contact the site at all.

redshiftzero commented 6 years ago

Since a scan result can (apparently? 🤷‍♀️ ) have "valid_https" = false, but also have "defaults_to_https" = true (and even be HSTS preloaded!), then due to this if condition the score in the "valid_https" = false situation will always be 0. Perhaps changing that if condition to also check defaults_to_https is the ticket

conorsch commented 6 years ago

I have some fear that this might be a bug in pshtt somehow.

Completely reasonable. We've been surprised before (https://github.com/dhs-ncats/pshtt/issues/49) about false-versus-null in pshtt returns. Even in 0.3.0 (the version we use), there's still reason to be careful about assumptions with regard to false-versus-null: https://github.com/dhs-ncats/pshtt/issues/149

Will work on log access for you, @harrislapiroff.

chigby commented 4 years ago

Doing some looking into this

chigby commented 4 years ago

Adding some more information to this thread. Sorry if people already are aware of this, but I do think it's good to get as much into the the issue as possible.

So, to potentially explain why we're seeing these results, and also why live is potentially problematic as a heuristic for "failed scan." When pshtt inspects a domain, it checks four different endpoints: all combinations of the domain with/without "www" and with/without "https". So for example, if we're scanning domain.com, then pshtt connects to:

  1. http://domain.com
  2. http://www.domain.com
  3. https://domain.com
  4. https://www.domain.com

Then, if any of these endpoints returns a successful connection, the entire result is marked as live being True. This means if we want to flag certain scans as being okay to throw out due to transient connection problems (or some other reason) then we might also want to consider another indicator, maybe scanning multiple times or checking just the HTTPS endpoints for if they're live.

eloquence commented 4 years ago

@eloquence will do some follow-up investigation before we do additional dev work.

eloquence commented 4 years ago

@thisisparker What's your experience been over the last year with this issue? I'm not seeing any problematic tweets on the Twitter account; have you had to make a significant number of deletions, or have you already made other changes to the bot to remediate the issue?

If this is still an issue, instead of making changes to STN itself, we might want to consider making changes to the Twitter bot to increase the threshold of failed scans before it fires a tweet that degrades a news site's score.

thisisparker commented 4 years ago

I think I was able to work around this issue satisfactorily when it first popped up. I'm not 100% sure, but I think this was mostly a problem when trying to present trends in that stats blog post I put together, and not the Twitter bot. (Although I also made some changes to the Twitter bot, maybe about that time, to report only new maximums, not any positive change, which might have been in response to this; I don't remember if it was the same or a neighboring issue.) It has never been the behavior to report downgrades with the Twitter bot, so that is not a problem.

The fact that some scores throughout are erroneously zero is a problem for the integrity of the data, but we don't actively have any use cases in mind for that data and it is probably not overwhelmingly difficult to filter out failed reads if it ever comes to it. (In fact, I probably did that two years ago for the blog post! But I don't remember exactly how.)

eloquence commented 4 years ago

Thanks, Parker, for the additional detail. I spot-checked recent reports (by looking for an "F" grade in results like https://securethe.news/api/v1/sites/ap.org/scans/ ) and these failures seem to be pretty rare in practice, and may reflect partial outages during maintenance windows. If we see this problem again more frequently, or if we see it as a result of failures that are clearly on our end, it may be worth investigating further, but I would recommend that we close this issue for now.

thisisparker commented 4 years ago

No objection here. (I may live to regret these words but) even if we want to do something with a big cross section of the data in the future, I suspect filtering out the bad results will be less effort than ensuring they don't get recorded now

eloquence commented 4 years ago

Yeah -- if these reflect partial failures e.g. during a maintenance window, even immediate retries may not cause them to succeed. Per Cameron's comment above, I think the next step would be to examine the logic in pshtt to better distinguish partial failures, but that doesn't seem to be worth the effort given the relative rarity of this issue. Closing for now.