HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Investigate 404 errors #452

Closed rviscomi closed 4 years ago

rviscomi commented 4 years ago

In the production server logs I'm seeing lots of ambiguous error messages like this:

werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
at match (/env/lib/python3.7/site-packages/werkzeug/routing.py:1799)
at match_request (/env/lib/python3.7/site-packages/flask/ctx.py:336)
at raise_routing_exception (/env/lib/python3.7/site-packages/flask/app.py:1774)
at dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1791)
at full_dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1813)

At times the server is spiking at 200 404s per minute. (This is suspiciously high)

Sometimes this happens when a site doesn't have a favicon or something innocuous, but I can't imagine why we'd be having this many 404s unless there's a broken link somewhere.

Two things:

rviscomi commented 4 years ago

This error may be related:

AttributeError: 'NoneType' object has no attribute 'get'
at render_template (/srv/main.py:17)
at page_not_found (/srv/main.py:136)
at handle_http_exception (/env/lib/python3.7/site-packages/flask/app.py:1644)
at handle_user_exception (/env/lib/python3.7/site-packages/flask/app.py:1713)
at full_dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1815)
at wsgi_app (/env/lib/python3.7/site-packages/flask/app.py:2292)

But it's similarly ambiguous.

tunetheweb commented 4 years ago

I added the favicon with https://github.com/HTTPArchive/almanac.httparchive.org/pull/438 btw.

I also see this with <link rel="apple-touch-icon">, so it could be that. We should probably add one anyway, even if it's not the cause this time.

tunetheweb commented 4 years ago

PWA has a link to ./mobile instead of ./mobile-web but doubt that's the cause.

AymenLoukil commented 4 years ago

I ran a crawl and here are the links generating errors:

https://almanac.httparchive.org/en/2019/%5D(https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) from https://almanac.httparchive.org/en/2019/resource-hints

https://almanac.httparchive.org/static/images/2019/05_Third_Parties/fig7.png from https://almanac.httparchive.org/en/2019/third-parties

https://almanac.httparchive.org/static/images/2019/08_Security/fig1.png from https://almanac.httparchive.org/en/2019/security

https://www.ssllabs.com/ssl-pulse/) from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig8.png from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig3.png from https://almanac.httparchive.org/en/2019/security

https://almanac.httparchive.org/static/images/2019/08_Security/fig2.png from https://almanac.httparchive.org/en/2019/security

https://fonts.gstatic.com/ from https://almanac.httparchive.org/en/2019/fonts

https://rainy-periwinkle.glitch.me/permalink/bc8f154a95dfe06a6d0fdb099b6c8df61727b2289141a0ef16dc17b2b57d3068.html from https://almanac.httparchive.org/en/2019/markup

https://rainy-periwinkle.glitch.me/permalink/3214f840b6ae3ef1074291f60fa1be4b9d9df401fe0190bfaff4bb078c8614a5.html from https://almanac.httparchive.org/en/2019/markup

These links should be changed to HTTPS:

http://speedcurve.com/ from https://almanac.httparchive.org/en/2019/contributors

http://paulcalvano.com/ from https://almanac.httparchive.org/en/2019/contributors

http://www.filamentgroup.com/ from https://almanac.httparchive.org/en/2019/fonts
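The crawl results above fall into a few recognizable patterns: `%5D(` is the URL-encoded form of `](`, which suggests a Markdown link that was not converted properly; a trailing `)` with no opening parenthesis suggests the same kind of leftover; and `http://` links are the ones to upgrade. A rough sketch of how such links could be flagged automatically, using only the standard library (the function name `classify_link` and the problem labels are made up for illustration):

```python
from urllib.parse import unquote, urlsplit


def classify_link(url):
    """Return a list of likely problems with a crawled link (heuristic only)."""
    problems = []
    decoded = unquote(url)  # turns %5D back into "]"
    if "](" in decoded:
        # Looks like "[text](target)" Markdown that leaked into the href.
        problems.append("markdown-residue")
    elif decoded.endswith(")") and "(" not in decoded:
        # A closing parenthesis with no opener, e.g. ".../ssl-pulse/)".
        problems.append("stray-parenthesis")
    if urlsplit(url).scheme == "http":
        problems.append("insecure-scheme")
    return problems


if __name__ == "__main__":
    for link in (
        "https://almanac.httparchive.org/en/2019/%5D(https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)",
        "https://www.ssllabs.com/ssl-pulse/)",
        "http://speedcurve.com/",
    ):
        print(classify_link(link), link)
```

This only catches the patterns seen in this crawl; missing images and dead third-party pages still need an actual HTTP request to detect.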

tunetheweb commented 4 years ago

Fixed all the ones I could as part of https://github.com/HTTPArchive/almanac.httparchive.org/pull/455

Remaining are:

rviscomi commented 4 years ago

Images should be fixed now.

We'll need @bkardell's help to resolve the Glitch URLs.

tunetheweb commented 4 years ago

How's it looking now @rviscomi ? Any reduction in errors? Any more detail as to what pages are missing?

rviscomi commented 4 years ago

Still seeing the errors, and it doesn't look like the logging change helped with debugging. The error is lower level than our messaging.

image

tunetheweb commented 4 years ago

OK I got it.

We don't have a working 404 page - except for the routes we have defined (i.e. /static/XXX or /lang/year/XXX).

This reproduces the error: http://127.0.0.1:8080/en/ for example, as does https://127.0.0.1:8080/anythingrandom, because we have no routes matching those patterns.

It shows an error page instead of the 404 page and returns a 500 to the browser, though it did start life as a 404:

ERROR:root:An error occurred during a request due to page not found: /en/
Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1791, in dispatch_request
    self.raise_routing_exception(req)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1774, in raise_routing_exception
    raise request.routing_exception
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/ctx.py", line 336, in match_request
    self.url_adapter.match(return_rule=True)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/werkzeug/routing.py", line 1799, in match
    raise NotFound()
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
INFO:werkzeug:127.0.0.1 - - [11/Nov/2019 20:04:47] "GET /en/ HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1791, in dispatch_request
    self.raise_routing_exception(req)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1774, in raise_routing_exception
    raise request.routing_exception
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/ctx.py", line 336, in match_request
    self.url_adapter.match(return_rule=True)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/werkzeug/routing.py", line 1799, in match
    raise NotFound()
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1713, in handle_user_exception
    return self.handle_http_exception(e)
  File "/Users/barry/sources/almanac.httparchive.org/src/env/lib/python3.8/site-packages/flask/app.py", line 1644, in handle_http_exception
    return handler(e)
  File "/Users/barry/almanac.httparchive.org/src/main.py", line 145, in page_not_found
    return render_template('error/404.html', error=e), 404
  File "/Users/barry/almanac.httparchive.org/src/main.py", line 18, in render_template
    year = request.view_args.get('year', DEFAULT_YEAR)
AttributeError: 'NoneType' object has no attribute 'get'

Adding a default route like this fixes it:

@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
    abort(404, 'barry was here')

I know this fixes it because it returns our correct 404 page with that exact error message on it (barry was here), so the request is definitely making it to this route.

Other posts seem to suggest that this is how it should work. I've tested it, and the other routes still work (home page, chapters, methodology, etc.), as do static pages, sitemap.xml, etc.
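For anyone wanting to reproduce this outside the repo, here is a minimal sketch, assuming Flask is installed. `make_app`, the `home` route, the plain-text 404 body and the `DEFAULT_YEAR` value are hypothetical stand-ins for the real main.py; only the `request.view_args.get('year', DEFAULT_YEAR)` lookup and the catch-all route are taken from the traceback and snippet above.

```python
from flask import Flask, abort, request

DEFAULT_YEAR = "2019"  # assumed value, for illustration only


def make_app(with_catch_all):
    """Build a tiny app that mimics the 404 handler from the traceback."""
    app = Flask(__name__)

    @app.route("/<lang>/<year>/")
    def home(lang, year):
        return f"home {lang} {year}"

    @app.errorhandler(404)
    def page_not_found(e):
        # request.view_args is None when no route matched the URL,
        # so this line raises AttributeError and the 404 becomes a 500.
        year = request.view_args.get("year", DEFAULT_YEAR)
        return f"404 page ({year}): {e.description}", 404

    if with_catch_all:
        # With a catch-all, every URL matches some route, so view_args
        # is always a dict by the time the 404 handler runs.
        @app.route("/", defaults={"path": ""})
        @app.route("/<path:path>")
        def catch_all(path):
            abort(404, "Page not found")

    return app


if __name__ == "__main__":
    for flag in (False, True):
        client = make_app(with_catch_all=flag).test_client()
        print(flag, client.get("/anythingrandom").status_code)
```

Another option, not the one used here, would be to make the handler itself tolerant of a missing match, for example by reading from `(request.view_args or {})` instead.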

Will submit a PR, though suppose I should change the 404 error message 😀

However, I'm also going to add a route to handle the /en/ case and redirect to the default year:

@app.route('/<lang>/')
@validate
def lang_only(lang):
    return redirect(url_for('home', lang=lang, year=DEFAULT_YEAR))

mikegeyser commented 4 years ago

Good find!

rviscomi commented 4 years ago

Here are some weird findings from the production server logs:

404: /static/images/favicon.ico/static/images/favicon.ico (Firefox)

404: /static/123 (Safari)

404: /static/images/home-hero-bg.pnghttps://almanac.httparchive.org/en/2019/ (bitlybot)

404: /static/images/apple-touch-icon.png/static/images/apple-touch-icon.png (Chrome 77)
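Two of these paths are a path repeated back to back, and one has a full URL glued onto the end. A small sketch of how log entries like these could be bucketed when triaging (the function name `describe_odd_path` and the labels are invented for this example):

```python
def describe_odd_path(path):
    """Label a requested path as 'doubled', 'url-appended' or 'plain'."""
    half, remainder = divmod(len(path), 2)
    # A doubled path is exactly the same string twice in a row.
    if half > 0 and remainder == 0 and path[:half] == path[half:]:
        return "doubled"
    # A scheme separator inside a path means a URL was concatenated onto it.
    if "://" in path:
        return "url-appended"
    return "plain"


if __name__ == "__main__":
    for p in (
        "/static/images/favicon.ico/static/images/favicon.ico",
        "/static/123",
        "/static/images/home-hero-bg.pnghttps://almanac.httparchive.org/en/2019/",
    ):
        print(describe_odd_path(p), p)
```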

tunetheweb commented 4 years ago

404: /static/123 (Safari)

Sure this was Safari? Think I tested that one on production from Chrome 😀

tunetheweb commented 4 years ago

BTW, since we already had a route for /static/, all 4 of those examples error in the same way with or without my fix.

Some people just ask for weird stuff!

rviscomi commented 4 years ago

image

rviscomi commented 4 years ago

I'm still seeing vague 404 error messages in Stackdriver:

image

However, the actual App Engine server logs are no longer showing any meaningful errors on things like broken images or bad requests, so I'm comfortable closing this issue.