arachnys / cabot

Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty
MIT License
5.6k stars 592 forks source link

Fix HTTP check for websites serving non-UTF-8 content #610

Closed hartwork closed 6 years ago

hartwork commented 6 years ago

Fixes issue #607.

Response.content is of type str and needs decoding. In contrast, Response.text is of type unicode, already. This way, we no longer need to mis-assume UTF-8.

Docs: http://docs.python-requests.org/en/master/user/quickstart/#response-content

codecov[bot] commented 6 years ago

Codecov Report

Merging #610 into master will not change coverage. The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #610   +/-   ##
=======================================
  Coverage   80.99%   80.99%           
=======================================
  Files          45       45           
  Lines        2920     2920           
  Branches      177      177           
=======================================
  Hits         2365     2365           
  Misses        497      497           
  Partials       58       58
Impacted Files Coverage Δ
cabot/cabotapp/models/base.py 79.52% <100%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a9dc61c...0b92e49. Read the comment docs.

hartwork commented 6 years ago

PS: I don't see how replacing a single line could decrease line-based coverage. Is CodeCov off here or am I missing something?

dbuxton commented 6 years ago

lgtm.

dbuxton commented 6 years ago

Re code coverage: it looks like with the changes, the branch may no longer be being explored. Is the text property of response available in the version of requests we are using? Is there some other subtle difference? It sounds like we might not want to merge this without better tests.

hartwork commented 6 years ago

An earlier version of this pull request replaced self._check_content_pattern by re.search altogether which made a test fail. The way I read that test there are unicode and str values used so both sides of the ternary conditional should be covered (and why I didn't remove the UTF-8 decoding altogether). Not?

hartwork commented 6 years ago

I can confirm that unicode decoding is covered. If I comment out line

content = content if isinstance(content, unicode) else unicode(content, "UTF-8")

tests fail (as expected). Like this:

# docker-compose run --rm web ./manage.py test cabot.cabotapp.tests.tests_basic.TestHttpStatusCheck
Starting cabot-git_redis_1 ... done
Starting cabot-git_db_1    ... done
Operations to perform:
  Apply all migrations: admin, auth, cabot_alert_email, cabot_alert_hipchat, cabot_alert_slack, cabot_alert_twilio, cabotapp, contenttypes, sessions, sites
Running migrations:
  No migrations to apply.
Found another file with the destination path 'admin/js/jquery.init.js'. It will be ignored since only the first encountered file is collected. If this is not what you want, make sure every static file has a unique path.

0 static files copied to '/code/cabot/.collectstatic', 401 unmodified.
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
F
======================================================================
FAIL: test_check_content_pattern (cabot.cabotapp.tests.tests_basic.TestHttpStatusCheck)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/code/cabot/cabotapp/tests/tests_basic.py", line 1313, in test_check_content_pattern
    self.assertFalse(item["result"])
AssertionError: True is not false

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (failures=1)
Destroying test database for alias 'default'...

That make sense. Are there other places where coverage could have been reduced?

dbuxton commented 6 years ago

https://codecov.io/gh/arachnys/cabot/compare/b5844aa446cc25e1b68c7d4e8a1613116819cb89...3bfb196e061c19ea256afab831f9469e44400903/diff shows that the branch is not being executed after the change, which shows that somewhere there's a logic difference between text and content. (This is weird because this branch should be explored by https://github.com/arachnys/cabot/blob/a9dc61cd39df9b7c7ac55ae13cd1a2a38c3d47df/cabot/cabotapp/tests/tests_basic.py#L456)

In fact it's because the fixtures (see fake_http_200_response) monkeypatch content but not text. So in the tests the check fails but for the wrong reason.

I've pushed a branch that fixes this, can you merge in those changes: https://github.com/arachnys/cabot/tree/add-failure-test

More generally it seems like we should be using resp.text consistently in place of resp.content (e.g. for setting CheckResult.raw_data)

hartwork commented 6 years ago

Thanks for your explanation. I was unsure about touching .raw_data previously, by now I agree that it's a good idea to switch it over, as well. Your changes are included now.