Mr0grog closed this 6 years ago.
Looks like the page in question is woefully malformed. Here’s the beginning of the source:
```html
<html>
<body>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head><title>
FERC: Calendar of Events
</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><meta name="date" content="September 13, 2017 08:00:00 GMT">
...
```
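So between that and the exception, it looks like there is simply no `<head>` element for Beautiful Soup to find here: the `<head>` tag only shows up after `<body>` has already opened (and after a stray `<!DOCTYPE>`). I thought I’d tested that scenario, but clearly not!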
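This kind of malformation can be spotted mechanically. Below is a minimal sketch that uses only Python's standard-library `html.parser.HTMLParser`, which reports tags in source order without building or repairing a tree. That matters here, because a tree-building parser may already have moved or dropped a misplaced `<head>` by the time you inspect the result. `TagOrderChecker` and `structural_problems` are hypothetical names for illustration; they are not part of web-monitoring-processing or Beautiful Soup. The sketch only flags the problems visible in the snippet above: a `<!DOCTYPE>` that is not first, and a `<head>` that is missing or appears after `<body>`.

```python
from html.parser import HTMLParser


class TagOrderChecker(HTMLParser):
    """Record, in source order, the structural tags and any <!DOCTYPE>."""

    STRUCTURAL = {"html", "head", "body"}

    def __init__(self):
        super().__init__()
        self.order = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.STRUCTURAL:
            self.order.append(tag)

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE ...>, wherever they occur.
        if decl.lower().startswith("doctype"):
            self.order.append("doctype")


def structural_problems(markup):
    """Return a list of top-level structure problems found in `markup`."""
    checker = TagOrderChecker()
    checker.feed(markup)
    checker.close()
    order = checker.order
    problems = []
    if "doctype" in order and order.index("doctype") != 0:
        problems.append("<!DOCTYPE> is not the first thing in the document")
    if "head" not in order:
        problems.append("no <head> tag at all")
    elif "body" in order and order.index("head") > order.index("body"):
        problems.append("<head> only appears after <body> has opened")
    return problems


# The start of the FERC page quoted above (trimmed).
ferc_start = (
    '<html>\n<body>\n'
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    '<head><title>\nFERC: Calendar of Events\n</title>'
)
for problem in structural_problems(ferc_start):
    print(problem)
```

For the FERC snippet this reports both problems, while a conventionally ordered page (`<!DOCTYPE html>`, then `<html>`, `<head>`, `<body>`) yields an empty list. A check like this is only a diagnostic; the renderer itself still has to cope gracefully when the parsed tree has no `<head>`.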
Ha! I totally did test it, but only before we made it possible to split the diff into separate insertion/deletion views: https://github.com/edgi-govdata-archiving/web-monitoring-processing/blob/d647c53957fde542a3a4fdabc3335c2b5bd19051/web_monitoring/html_diff_render.py#L207-L210
The diff raises the following exception: