HTTPArchive / custom-metrics

Custom metrics to use with WebPageTest agents
Apache License 2.0

Security 2024 Updated security.txt Metric #125

Closed JannisBush closed 3 months ago

JannisBush commented 3 months ago

Updated custom metric for https://github.com/HTTPArchive/almanac.httparchive.org/issues/3604

Description of the changes: Update the parsing of .well-known/security.txt to take all newly defined fields into account, save undefined/future/custom fields, and add a basic check of whether the file is valid (all required fields exist, and no field that may occur only once occurs more than once).
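The validity check described above could look roughly like the following sketch. It is not the code from this PR: the function name `parseSecurityTxt`, the field lists, and the `signed` heuristic are assumptions, based on RFC 9116 (Contact and Expires are required; Expires and Preferred-Languages may appear at most once) and on the key names visible in the test output in this thread.

```javascript
// Hypothetical sketch of a security.txt parser with a basic validity check.
// Field lists follow RFC 9116; this is not the metric's actual implementation.
const KNOWN_FIELDS = [
  "acknowledgments", "canonical", "contact", "encryption",
  "expires", "hiring", "policy", "preferred-languages",
];
const REQUIRED = ["contact", "expires"];
const ONLY_ONCE = ["expires", "preferred-languages"];

// "preferred-languages" -> "preferred_languages"
const toKey = (name) => name.replace("-", "_");

function parseSecurityTxt(text) {
  // Assumption: a cleartext PGP header marks the file as signed.
  const data = { signed: text.includes("-----BEGIN PGP SIGNED MESSAGE-----") };
  for (const name of KNOWN_FIELDS) data[toKey(name)] = [];
  data.other = [];

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // not a "Name: value" line
    const name = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    const key = name.toLowerCase();
    if (KNOWN_FIELDS.includes(key)) {
      data[toKey(key)].push(value);
    } else {
      data.other.push([name, value]); // undefined/future/custom field
    }
  }

  data.all_required_exist = REQUIRED.every((f) => data[toKey(f)].length > 0);
  data.only_one_requirement_broken = ONLY_ONCE.some((f) => data[toKey(f)].length > 1);
  data.valid = data.all_required_exist && !data.only_one_requirement_broken;
  return data;
}

const example = [
  "Contact: https://hackerone.com/slack/",
  "Policy: https://hackerone.com/slack/",
  "Acknowledgements: https://hackerone.com/slack/thanks",
].join("\n");
console.log(parseSecurityTxt(example).valid); // false (no Expires field)
```

Splitting on the first colon keeps URL values intact, and any unrecognized name (such as the misspelled "Acknowledgements") lands in `other` instead of being dropped.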


Test websites:

JannisBush commented 3 months ago

> This feels quite verbose when so many fields are empty:
>
> Can we only include the fields when they are present to reduce the storage and query size?

I don't know how empty arrays affect storage and query size. The keys are always the same, so some optimization might be possible.

However, I adapted the query to keep only non-empty fields. I hope that does not make the query more complex.
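Keeping only non-empty fields can be done in a single pass over the parsed object. The sketch below is an illustration, not the change made in this PR; the helper name `dropEmptyFields` is made up, and it assumes that "empty" means an empty array while booleans such as `signed` are always kept.

```javascript
// Hypothetical helper: remove keys whose value is an empty array.
// Booleans and non-empty arrays are kept as-is.
function dropEmptyFields(data) {
  return Object.fromEntries(
    Object.entries(data).filter(
      ([, value]) => !(Array.isArray(value) && value.length === 0)
    )
  );
}

const parsed = {
  signed: false,
  contact: ["https://hackerone.com/ed"],
  encryption: [],
  hiring: [],
  other: [],
};
console.log(dropEmptyFields(parsed));
```

With this input only `signed` and `contact` remain, so a site without a security.txt field no longer carries an empty array for it.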

JannisBush commented 3 months ago
        "/.well-known/security.txt": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://example.com/.well-known/security.txt",
                "signed": false,
                "other": [
                    [
                        "background-color",
                        "#f0f0f2;"
                    ],
                    [
                        "margin",
                        "0;"
                    ],
                    [
                        "padding",
                        "0;"
                    ],

That was not great, as the inline CSS was detected as "other" directives. I now also save the content type (it MUST be text/plain according to the spec, but it is unclear whether all sites follow the spec; there are probably quite a few sites that do not set any Content-Type header 🤔). Additionally, I only save the parsed fields if the response status is OK (r.ok has to be true). This fixes the case of example.com, which returns an HTML document with status 404. However, sites that return their landing page or similar at /.well-known/security.txt with a 200 status code would still be parsed. I am unsure how best to handle such cases without introducing false negatives. Ideas:

github-actions[bot] commented 3 months ago
Custom metrics for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_HZ_7
Custom metrics for https://example.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_J4_8

Changed custom metrics values:

```json
{
  "_well-known": {
    "/.well-known/assetlinks.json": { "found": false },
    "/.well-known/apple-app-site-association": { "found": false },
    "/.well-known/gpc.json": { "found": false },
    "/robots.txt": { "found": false },
    "/.well-known/security.txt": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://example.com/.well-known/security.txt",
        "content_type": "text/html; charset=UTF-8"
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://example.com/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 500,
        "redirected": false,
        "url": "https://example.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}
```
Custom metrics for https://securitytxt.org/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_8D_9

Changed custom metrics values:

```json
{
  "_well-known": {
    "/.well-known/assetlinks.json": { "found": false },
    "/.well-known/apple-app-site-association": { "found": false },
    "/.well-known/gpc.json": { "found": false },
    "/robots.txt": {
      "found": true,
      "data": { "matched_disallows": {} }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": false,
        "url": "https://securitytxt.org/.well-known/security.txt",
        "content_type": "text/plain; charset=utf-8",
        "signed": false,
        "contact": [ "https://hackerone.com/ed" ],
        "expires": [ "2025-03-14T00:00:00.000Z" ],
        "acknowledgments": [ "https://hackerone.com/ed/thanks" ],
        "preferred_languages": [ "en, fr, de" ],
        "canonical": [ "https://securitytxt.org/.well-known/security.txt" ],
        "policy": [ "https://hackerone.com/ed?type=team&view_policy=true" ],
        "all_required_exist": true,
        "only_one_requirement_broken": false,
        "valid": true
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://securitytxt.org/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://securitytxt.org/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}
```
Custom metrics for https://facebook.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_BP_A

Changed custom metrics values:

```json
{
  "_well-known": {
    "/.well-known/assetlinks.json": { "found": true },
    "/.well-known/apple-app-site-association": { "found": true },
    "/.well-known/gpc.json": { "found": false },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {
          "Applebot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "baiduspider": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Bingbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Discordbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "DuckDuckBot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "facebookexternalhit": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Googlebot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Google-Extended": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Googlebot-Image": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "GPTBot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "ia_archiver": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "LinkedInBot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "msnbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Naverbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Pinterestbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Screaming Frog SEO Spider": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "seznambot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Slurp": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "teoma": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "TelegramBot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Twitterbot": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Yandex": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ],
          "Yeti": [ "/login.php*&next=", "/login.php/?next=", "/login.php?next=", "/login/*&next=", "/login/?next=", "/login/device-based/regular/login/*&next=", "/login/device-based/regular/login/?next=", "/x/oauth/" ]
        }
      }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": false,
        "url": "https://www.facebook.com/.well-known/security.txt",
        "content_type": "text/plain;charset=utf-8",
        "signed": false,
        "contact": [ "https://www.facebook.com/whitehat/report/" ],
        "expires": [ "Thu, 04 Jul 2024 23:55:25 -0700" ],
        "acknowledgments": [ "https://www.facebook.com/whitehat/thanks/" ],
        "policy": [
          "https://www.facebook.com/whitehat/info/",
          "https://about.meta.com/security/vulnerability-disclosure-policy"
        ],
        "hiring": [ "https://www.metacareers.com/areas-of-work/security/" ],
        "all_required_exist": true,
        "only_one_requirement_broken": false,
        "valid": true
      }
    },
    "/.well-known/change-password": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": true,
        "url": "https://www.facebook.com/login.php?next=https%3A%2F%2Fwww.facebook.com%2F.well-known%2Fchange-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://www.facebook.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}
```
Custom metrics for https://slack.com

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_21_B

Changed custom metrics values:

```json
{
  "_well-known": {
    "/.well-known/assetlinks.json": { "found": true },
    "/.well-known/apple-app-site-association": { "found": true },
    "/.well-known/gpc.json": { "found": false },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": { "*": [ "/oauth" ] }
      }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": false,
        "url": "https://slack.com/.well-known/security.txt",
        "content_type": "text/plain;charset=utf-8",
        "signed": false,
        "contact": [ "https://hackerone.com/slack/" ],
        "policy": [ "https://hackerone.com/slack/" ],
        "other": [
          [ "Acknowledgements", "https://hackerone.com/slack/thanks" ]
        ],
        "all_required_exist": false,
        "only_one_requirement_broken": false,
        "valid": false
      }
    },
    "/.well-known/change-password": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": true,
        "url": "https://slack.com/signin?redir=%2Faccount%2Fsettings"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": true,
        "url": "https://slack.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200"
      }
    }
  }
}
```
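In the example.com result above, the security.txt entry carries only `status`, `redirected`, `url` and `content_type`, which is consistent with the guard described earlier in the thread: the body is parsed only when `r.ok` is true. A minimal sketch of such a guard follows. The names `summarizeFetch`, `parseBody`, `fakeResponse` and `naiveParser` are made up for illustration and are not taken from the metric's code; the fake response object stands in for a fetch `Response`.

```javascript
// Hypothetical sketch: always record response metadata, but only parse the
// body when the status is in the 2xx range (Response.ok).
function summarizeFetch(response, bodyText, parseBody) {
  const data = {
    status: response.status,
    redirected: response.redirected,
    url: response.url,
    content_type: response.headers.get("content-type"),
  };
  if (response.ok) {
    Object.assign(data, parseBody(bodyText));
  }
  return data;
}

// Stand-in for a fetch Response, mirroring the example.com case (HTML with 404).
const fakeResponse = {
  ok: false,
  status: 404,
  redirected: false,
  url: "https://example.com/.well-known/security.txt",
  headers: new Map([["content-type", "text/html; charset=UTF-8"]]),
};

// Deliberately naive parser that would mistake inline CSS for directives.
const naiveParser = (text) => ({
  other: text.split("\n").map((line) => line.split(": ")),
});

console.log(summarizeFetch(fakeResponse, "margin: 0;", naiveParser));
```

Because the fake response is not OK, the CSS line never reaches the parser and no `other` key is produced. A landing page served with status 200 would still pass this guard, which is the remaining gap mentioned in the thread.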
JannisBush commented 3 months ago

@tunetheweb Can this be merged before the crawl starts tomorrow?

As written above, there might still be a very small number of sites with incorrect "other" values. However, I think this does not pose a major problem: