I have some fear that this might be a bug in `pshtt` somehow. This is my first look around this codebase, so I may be missing something.

I'm looking over the code here, and it looks like it pretty straightforwardly transcribes the `pshtt` output to the database. Looking here, the REST API outputs straight from the database to the response, without any particularly complex processing. I don't immediately see anywhere in that process where an error could be introduced.
@conorsch Can you see if you can track down some relevant logs/database rows? I want to double-check the `stdout` and `stderr` for scans that are coming up like this, and any other logs that might be associated with `pshtt`, if those exist.
Wait, scratch that. The error is that the `score` is `0`. All the other information is correct? That would have to be in this code somewhere.

Scratch that, again. I see that the other information isn't correct either. I was deceived by `"defaults_to_https": true` being correct. Okay, I'm done thinking out loud on the ticket now. :p
The `defaults_to_https` result was why I included the Intercept. I don't understand at all why it would get that right if it just can't contact the site at all.
Since a scan result can (apparently? 🤷‍♀️) have `"valid_https" = false` but also have `"defaults_to_https" = true` (and even be HSTS preloaded!), then due to this if condition the score in the `"valid_https" = false` situation will always be 0. Perhaps changing that if condition to also check `defaults_to_https` is the ticket.
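For illustration only, here's a minimal sketch of the kind of change being proposed. The function shape, field names, and point values are assumptions for the example, not the actual STN scoring code:

```python
def compute_score(pshtt_results):
    """Hypothetical scoring sketch (not the actual STN code): instead of
    zeroing the score whenever valid_https is False, also consider
    defaults_to_https before short-circuiting to 0."""
    # Behavior described above: valid_https == False means the score is 0.
    # Proposed: a site that defaults to HTTPS shouldn't automatically score 0.
    if not pshtt_results.get("valid_https") and not pshtt_results.get("defaults_to_https"):
        return 0

    # Illustrative point values only.
    score = 0
    if pshtt_results.get("defaults_to_https"):
        score += 50
    if pshtt_results.get("hsts_preloaded"):
        score += 50
    return score
```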
> I have some fear that this might be a bug in `pshtt` somehow.
Completely reasonable. We've been surprised before (https://github.com/dhs-ncats/pshtt/issues/49) about false-versus-null in pshtt returns. Even in 0.3.0 (the version we use), there's still reason to be careful about assumptions with regard to false-versus-null: https://github.com/dhs-ncats/pshtt/issues/149
Will work on log access for you, @harrislapiroff.
Doing some looking into this
Adding some more information to this thread. Sorry if people are already aware of this, but I do think it's good to get as much into the issue as possible.
This may explain why we're seeing these results, and also why `live` is potentially problematic as a heuristic for "failed scan": when `pshtt` inspects a domain, it checks four different endpoints, all combinations of the domain with/without "www" and with/without "https". So for example, if we're scanning `domain.com`, then `pshtt` connects to:

- http://domain.com
- http://www.domain.com
- https://domain.com
- https://www.domain.com

If any of these endpoints returns a successful connection, the entire result is marked with `live` as `True`. This means that if we want to flag certain scans as okay to throw out due to transient connection problems (or some other reason), we might also want to consider another indicator, such as scanning multiple times or checking whether just the HTTPS endpoints are live.
@eloquence will do some follow-up investigation before we do additional dev work.
@thisisparker What's your experience been over the last year with this issue? I'm not seeing any problematic tweets on the Twitter account; have you had to make a significant number of deletions, or have you already made other changes to the bot to remediate the issue?
If this is still an issue, then instead of making changes to STN itself, we might want to consider changing the Twitter bot to require a higher threshold of failed scans before it fires a tweet about a degraded score for a news site.
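As a rough sketch of that threshold idea (all names here are hypothetical, not the bot's actual code), the bot could ignore a short run of failed scans before reporting any change:

```python
def should_report_change(score_history, new_score, failed_threshold=3):
    """Hypothetical guard for the Twitter bot: ignore a short run of failed
    scans (score 0) so that a recovery after a transient failure isn't
    reported as a real score change."""
    recent_failures = 0
    for score in reversed(score_history):
        if score == 0:
            recent_failures += 1
        else:
            break
    # A streak of zeros shorter than the threshold is treated as noise.
    if 0 < recent_failures < failed_threshold:
        return False
    return bool(score_history) and new_score != score_history[-1]
```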
I think I was able to work around this issue satisfactorily when it first popped up. I'm not 100% sure, but I think this was mostly a problem when trying to present trends in that stats blog post I put together, and not the Twitter bot. (Although I also made some changes to the Twitter bot, maybe about that time, to report only new maximums, not any positive change, which might have been in response to this; I don't remember if it was the same or a neighboring issue.) It has never been the behavior to report downgrades with the Twitter bot, so that is not a problem.
The fact that some scores throughout are erroneously zero is a problem for the integrity of the data, but we don't actively have any use cases in mind for that data and it is probably not overwhelmingly difficult to filter out failed reads if it ever comes to it. (In fact, I probably did that two years ago for the blog post! But I don't remember exactly how.)
Thanks, Parker, for the additional detail. I spot-checked recent reports (by looking for an "F" grade in results like https://securethe.news/api/v1/sites/ap.org/scans/ ) and these failures seem to be pretty rare in practice, and may reflect partial outages during maintenance windows. If we see this problem again more frequently, or if we see it as a result of failures that are clearly on our end, it may be worth investigating further, but I would recommend that we close this issue for now.
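For anyone repeating that spot check, here's a minimal sketch against the public API. The response field names (`results`, `grade`, `live`) are assumptions based on this thread, not confirmed API details:

```python
import requests

def failed_scans(site, base_url="https://securethe.news/api/v1"):
    """Hypothetical spot check: fetch a site's scans and return those that
    look like failures (marked live but graded "F")."""
    response = requests.get(f"{base_url}/sites/{site}/scans/", timeout=30)
    response.raise_for_status()
    scans = response.json().get("results", [])  # assumed pagination shape
    return [s for s in scans if s.get("live") and s.get("grade") == "F"]

# Example: failed_scans("ap.org")
```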
No objection here. (I may live to regret these words, but) even if we want to do something with a big cross-section of the data in the future, I suspect filtering out the bad results will be less effort than ensuring they don't get recorded now.
Yeah -- if these reflect partial failures, e.g., during a maintenance window, even immediate retries may not cause them to succeed. Per Cameron's comment above, I think the next step would be to examine the logic in `pshtt` to better distinguish partial failures, but that doesn't seem to be worth the effort given the relative rarity of this issue. Closing for now.
Some scans appear to be failing and returning a grade of zero, while still returning the `live` value as `true`. Here's an example of one recent such scan of The Intercept:

(There have been numerous examples on other sites, including The New Yorker and El Periódico, which I can dig up if it is helpful.)

This is a problem because filtering on the `live` value has been a pretty good way of dropping failed scans from results, which is necessary for doing data analysis and also for the Twitter bot to report changes. I haven't compiled stats on whether there has been an uptick in failed scans recently, but there has been an uptick in misfired Tweets that are the result of a successful scan following a failed scan, which points to a possible increase in the failure rate.
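For context on how that filtering works for analysis, here's a minimal sketch; the `live` and `score` field names mirror the scan results discussed in this issue, and anything else is an assumption:

```python
def usable_scans(scans):
    """Hypothetical analysis filter: keep only scans that look trustworthy.
    Filtering on `live` alone misses the failure mode in this issue,
    where `live` is true but the score is erroneously zero."""
    return [
        scan for scan in scans
        if scan.get("live") and scan.get("score", 0) > 0  # drop failed reads
    ]
```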