DigitalTrustCenter / sectxt

security.txt parser and validator
European Union Public License 1.2

Invalid responses are not handled properly #72

Closed: Sandr0x00 closed this issue 4 months ago

Sandr0x00 commented 5 months ago

Take the example domain https://aua24ag.de. This domain lacks a security.txt, but requests for it are redirected to /de/startseite (at least for me, since I'm German).

According to https://www.rfc-editor.org/rfc/rfc9116#name-location-of-the-securitytxt, redirects are allowed. Yet the tool (in my opinion) fails to recognize that the response is NOT a security.txt and therefore tries to parse the HTML. IMHO this is wrong, since the RFC states:

The file MUST be accessed via HTTP 1.0 or a higher version, and the file access MUST use the "https" scheme [...]. It MUST have a Content-Type of "text/plain" with the default charset parameter set to "utf-8" [...].

To me, these look like early-exit conditions: if these mandatory "MUST" conditions are not fulfilled, parsing the retrieved file makes no sense.
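As a rough illustration of what such early-exit checks could look like, here is a minimal Python sketch. It assumes the caller already has the final URL (after any redirects) and the raw Content-Type header value. The function names and error strings are hypothetical and are not taken from the sectxt library's API.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


def parse_content_type(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type header into a lowercase media type and its parameters."""
    media_type, *params = header.split(";")
    parsed: Dict[str, str] = {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:
            # Parameter values may be quoted; charset names are case-insensitive.
            parsed[key.strip().lower()] = value.strip().strip('"').lower()
    return media_type.strip().lower(), parsed


def precondition_errors(final_url: str, content_type: Optional[str]) -> List[str]:
    """Return MUST-level RFC 9116 violations that make parsing the body pointless.

    Hypothetical helper: names and error strings are illustrative only.
    An empty list means the body may be handed to the parser.
    """
    errors: List[str] = []
    if urlsplit(final_url).scheme != "https":
        errors.append("scheme is not https")
    if content_type is None:
        errors.append("missing Content-Type header")
        return errors
    media_type, params = parse_content_type(content_type)
    if media_type != "text/plain":
        # An HTML landing page ends up here, so the body is never parsed.
        errors.append("Content-Type is not text/plain")
    elif params.get("charset") != "utf-8":
        errors.append("charset is not utf-8")
    return errors
```

For the redirect described above, `precondition_errors("https://aua24ag.de/de/startseite", "text/html; charset=utf-8")` returns `["Content-Type is not text/plain"]`, so a caller could stop before parsing the HTML at all.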

Another security.txt validator, written in Rust (https://github.com/eikendev/sectxt), stops immediately when these conditions fail.

DigitalTrustCenter commented 4 months ago

Thank you for reporting this issue.

The issue is interesting. The redirect does indeed cause the library to parse the main HTML page, since that is what it received when requesting /.well-known/security.txt. But, as you stated, the path has changed along the way. If a valid security.txt file were served on the main page, the library would report it as correct, when it should instead log an error that the security.txt could not be found at the correct path.

We will check whether a redirect occurred on the request. If so, we will make sure that the path remained unchanged (that is, the final URL still holds the path that was being checked). If it did not, we will stop processing the response, which will log the "security.txt not located" error.
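The redirect check described above could be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: the helper name and message are hypothetical, and it assumes the caller knows both the URL it requested and the final URL after redirects. A redirect that keeps the path (for example a scheme or host change) is tolerated; any other redirect stops processing.

```python
from typing import Optional
from urllib.parse import urlsplit


def redirect_location_error(requested_url: str, final_url: str) -> Optional[str]:
    """Return an error message if redirects moved the request to a different path.

    Hypothetical helper: returns None when the body may be processed further,
    otherwise a "not located" style message and the caller stops processing.
    """
    requested_path = urlsplit(requested_url).path
    final_path = urlsplit(final_url).path
    if requested_path == final_path:
        # No redirect, or a redirect that kept the path (e.g. http -> https).
        return None
    return (
        f"security.txt not located: {requested_path} was redirected to {final_path}"
    )
```

With the `requests` library, `response.url` holds the URL after any redirects and `response.history` lists the intermediate responses, so `response.url` could be passed as `final_url`. For the example in this issue, a request for https://aua24ag.de/.well-known/security.txt that ends at /de/startseite yields an error instead of a parse attempt.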