Closed elarlang closed 1 month ago
We require Content-Type to be set, but Content-Type should also match the actual content. For example, a classical problem: Content-Type: text/html for JSON content from an API is OK by this requirement, but in reality it's not OK.
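The mismatch described above can be sketched as a heuristic check. This is only an illustration: the function names are hypothetical, and treating "body starts with { or [ and parses as JSON" as proof of JSON content is an assumption, not something the requirement prescribes.

```python
import json
from email.message import Message


def declared_media_type(content_type_header: str) -> str:
    """Return the lowercased type/subtype, ignoring parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def looks_like_json(body: bytes) -> bool:
    """Heuristic: the body starts like a JSON object/array and parses as JSON."""
    stripped = body.lstrip()
    if not stripped.startswith((b"{", b"[")):
        return False
    try:
        json.loads(stripped.decode("utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True


def json_served_with_wrong_type(content_type_header: str, body: bytes) -> bool:
    """True when the body looks like JSON but the declared type is not a JSON type."""
    media_type = declared_media_type(content_type_header)
    is_json_type = media_type == "application/json" or media_type.endswith("+json")
    return looks_like_json(body) and not is_json_type
```

For example, `json_served_with_wrong_type("text/html", b'{"a": 1}')` flags the classical case, while the same body served as `application/json` passes.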
Agree, we need an additional requirement for this, would you like to open a PR against the master branch?
I tried to find some information but didn't find any reason why we should not also ask for a charset on application/json (if we ask for it on application/xml).
When @jason-invision opened the original issue, he referenced the RFC https://tools.ietf.org/html/rfc2046#section-4.1.2 Does this provide the answer you were looking for?
First, if you are thinking about a new, separate requirement that the response Content-Type must match the actual content, then it's better to set the milestone to v4.1 and deal with it once v4.0.2 is done.
The second part (application/json and charset) got solved: after more research, I found the answer.
RFC 2046 is updated by multiple other RFCs; one of them is RFC 6657 "Update to MIME regarding "charset" Parameter Handling in Textual Media Types".
(5) application -- some other kind of data, typically
either uninterpreted binary data or information to be
processed by an application. The subtype "octet-
stream" is to be used in the case of uninterpreted
binary data, in which case the simplest recommended
action is to offer to write the information into a file
for the user. The "PostScript" subtype is also defined
for the transport of PostScript material. Other
expected uses for "application" include spreadsheets,
data for mail-based scheduling systems, and languages
for "active" (computational) messaging, and word
processing formats that are not directly readable.
Note that security considerations may exist for some
types of application data, most notably
"application/PostScript" and any form of active
messaging. These issues are discussed later in this
document.
Additionally, RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" says clearly that no charset parameter is defined for application/json:
Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients.
PR time, good discussion above.
@tghosth
Agree, we need an additional requirement for this, would you like to open a PR against the master branch?
Do we actually need a separate requirement? Or can we just add something extra to the current 14.4.1?
Points to cover / for discussion:
Current 14.4.1
Verify that every HTTP response contains a Content-Type header. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1).
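As a rough sketch of how the charset part of this requirement could be checked, the snippet below matches a media type against wildcard patterns and reports a missing charset parameter. The pattern tuple uses the broader "*/xml" form that appears later in this thread rather than only application/xml, and the function names are hypothetical.

```python
from email.message import Message
from fnmatch import fnmatchcase

# Assumption: patterns taken from the requirement wording discussed in this thread.
CHARSET_PATTERNS = ("text/*", "*/*+xml", "*/xml")


def parse_content_type(header_value: str):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("charset")


def needs_charset(media_type: str) -> bool:
    """True if the media type falls under one of the charset patterns."""
    return any(fnmatchcase(media_type, pattern) for pattern in CHARSET_PATTERNS)


def missing_charset(header_value: str) -> bool:
    """True if a charset parameter is expected for this type but not present."""
    media_type, charset = parse_content_type(header_value)
    return needs_charset(media_type) and charset is None
```

With these patterns, `application/json` is never flagged, which is consistent with RFC 8259 defining no charset parameter for it.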
Problems (as reminder):
Idea (not wording) for proposal:
Verify that every HTTP response contains a Content-Type header that matches the content. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1). application/octet-stream should be used as the default value.
or a bit longer:
Verify that every HTTP response contains a Content-Type header. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1). Content must match the provided Content-Type header, and application/octet-stream should be used as the default value.
Mozilla Developer Network "Common MIME types"
application/octet-stream is the default value for all other cases. An unknown file type should use this type. Browsers pay a particular care when manipulating these files, attempting to safeguard the user to prevent dangerous behaviors.
RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.
RFC 2046 - Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
The "octet-stream" subtype is used to indicate that a body contains arbitrary binary data.
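A minimal sketch of the "default value" idea from the references above, assuming the server picks a type from the file name: it uses Python's standard `mimetypes` module and falls back to application/octet-stream when nothing is known. The helper name `content_type_for` is hypothetical.

```python
import mimetypes

DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename: str) -> str:
    """Guess a media type from the file name, falling back to the binary default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or DEFAULT_TYPE
```

Guessing from the file extension does not prove that the content matches, so this only covers the fallback, not the "content must match" part of the proposal.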
I'd prefer to leave 14.4.1 alone and add one more requirement for JSON since it's so dangerous. Dangerous character encodings like UTF-7 are no longer supported by browsers.
@elarlang Happy for you to extend existing requirement if you prefer.
After thinking about this more, I would suggest we flat out delete 14.4.1 and add one new requirement to force the content type of JSON responses to application/json.
Also, here is a reference regarding dangerous character encodings no longer being supported in the browser since the beginning of the HTML5 era. https://security.stackexchange.com/questions/47489/utf-7-xss-attacks-in-modern-browsers
I have modified 14.4.1 as follows:
14.4.1 | [MODIFIED] Verify that every HTTP response contains a Content-Type header. Also specify a safe character set (e.g., UTF-8, ISO-8859-1) if the content types are text/*, */*+xml and application/xml. Content must match with the provided Content-Type header. | ✓ | ✓ | ✓ | 173 |
https://github.com/OWASP/ASVS/commit/9c6d8a22b28b00222e5cd1414b816823e68e4330
Some minor comments/questions:
It is really a "character encoding" not a "character set".
Or should we use "character encoding scheme" based on RFC 6365?
I had no idea what a safe character set was, so maybe something like "a safe character set (e.g., UTF-8, ISO-8859-1, in contrast to UTF-7)" would be nice?
As those (in theory) may change over time, I prefer not to be over-descriptive in the requirement text.
Does this conflict with text/dns (RFC4027) or text/vtt which do not define a charset parameter?
Can you give more arguments for this question to understand the reason for it without guessing it?
Sorry, I thought that was clear enough :smile:.
Some text/* MIME types do not define a charset parameter. For example:
Subject: Registration of MIME media type text/dns
MIME media type name: text
MIME subtype name: dns
Required parameters: None.
Optional parameters: None.
Is it conformant to pass a charset parameter anyway?
I don't know the answer. Also, when I read the spec (https://datatracker.ietf.org/doc/html/rfc6657) and think about reality (https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411), they may not match.
@randomstuff - any ideas how to fix this?
https://www.iana.org/assignments/media-types/text/vtt
Encoding considerations: 8bit (always UTF-8)
https://www.iana.org/assignments/media-types/text/dns
The master file format permits encoding arbitrary octet values by using the "\DDD" encoding. The use of "\DDD" encoding can be more reliable than transporting non-ASCII through MIME transports, if data passes through a gateway that re-encodes the character data.
So those mentioned ones have "their own language" or definition. Probably we have more of them and we need to make the requirement flexible/dynamic. Something like:
Verify that if a response specifies a Content-Type of "text/*", "*/*+xml" and "*/xml", it also specifies a safe character encoding (e.g., UTF-8, ISO-8859-1) with the charset parameter according to IANA Media Types.
I am wondering if it is necessary to enforce the usage of the charset media type parameter when other solutions are possible (eg. for HTML). Could something like this be OK?
Verify that responses which specify a Content-Type of "text/*", "*/*+xml" and "*/xml" are either encoded in ASCII or explicitly specify a content-type when this is required by the associated content-type definition.
The specification supports your concerns.
The specification, released in 2012, says (https://datatracker.ietf.org/doc/html/rfc6657):
Many complex text subtypes such as "text/html" [RFC2854] and "text/xml" [RFC3023] have internal (to their format) means of describing the charset. Many existing User Agents ignore the default of "US-ASCII" rule for at least "text/html" and "text/xml".
My comment in https://github.com/OWASP/ASVS/issues/788#issuecomment-2333458487
I don't know the answer. Also, when I read the spec (https://datatracker.ietf.org/doc/html/rfc6657) and think about reality (#900 (comment)), they may not match.
The reason behind that was my test in 2021: https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411
Web servers often set their own default for Content-Type, and that default is often ISO-8859-1, which causes a mess when the content is actually UTF-8 (for example).
So here I would say, that the content-type header is required, it must specify the charset and it must match with the served content - it means also the charset definition inside the HTML or XML document.
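The "header charset must match the charset inside the document" point can be illustrated with a small comparison for HTML. This is a simplified sketch: the regular expression only looks for a `<meta ... charset=...>` declaration in the first 1024 bytes and is not a full implementation of the HTML encoding sniffing rules, and the function names are hypothetical.

```python
import codecs
import re
from email.message import Message

# Simplified: matches <meta charset="..."> and the http-equiv content="...; charset=..." form.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def header_charset(header_value: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_param("charset")


def meta_charset(html: bytes):
    """Return the charset declared in an HTML meta element, or None."""
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii") if match else None


def charsets_agree(header_value: str, html: bytes):
    """True/False if both declarations exist; None if there is nothing to compare."""
    declared, in_doc = header_charset(header_value), meta_charset(html)
    if declared is None or in_doc is None:
        return None
    try:
        # codecs.lookup normalizes aliases, e.g. "utf8" and "UTF-8".
        return codecs.lookup(declared).name == codecs.lookup(in_doc).name
    except LookupError:  # unknown encoding label on either side
        return False
```

A server default of `charset=ISO-8859-1` in front of a document that declares `<meta charset="utf-8">` would come back as `False` here.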
Note that the requirement is moved to 13.1.8.
@elarlang does this need to be retagged as V13?
ping @randomstuff
If you read my last comment, do you think we should move on with my proposal (https://github.com/OWASP/ASVS/issues/788#issuecomment-2345310163) or with your proposal (https://github.com/OWASP/ASVS/issues/788#issuecomment-2345427933)?
@elarlang, I like my version better because I do not think the lack of a charset= parameter in Content-Type is (or should be) an issue in practice if the encoding is already specified in the file (HTML <meta charset='...'>) (?), but maybe I am wrong on this.
Did you take a look at this: https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411?
@elarlang, I did not look at this carefully enough, but I understand that
Content-Type header overrides charset from HTML document
which does not really contradict my claim:
the lack of a charset= parameter in Content-Type is not (or should not be) an issue in practice if the encoding is already specified in the file
But then for example, portswigger agrees with you (at least for HTML):
For every response containing HTML content, the application should include within the Content-type header a directive specifying a standard recognized character set, for example charset=ISO-8859-1.
So let's settle for your version?
Should we explicitly state that the intent is to avoid character encoding confusion vulnerabilities? Otherwise, this requirement could be easily overlooked.
It is kind of re-opening the issue #1459, but I keep the conversation here.
At the moment we have:
# | Description | L1 | L2 | L3 | CWE |
---|---|---|---|---|---|
13.1.7 | [MODIFIED, MOVED FROM 14.4.1, SPLIT TO 13.1.8] Verify that every HTTP response contains a Content-Type header which matches the actual content of the response. | ✓ | ✓ | ✓ | 173 |
13.1.8 | [ADDED, SPLIT FROM 13.1.7] Verify that if a response specifies a Content-Type of "text/*", "*/*+xml" and "*/xml", it also specifies a safe character set (e.g., UTF-8, ISO-8859-1) with the charset parameter. | ✓ | ✓ | ✓ | 173 |
Previously those were in one requirement and got split up via https://github.com/OWASP/ASVS/issues/1459#issuecomment-1359315226 although I preferred to keep it as one requirement.
Should we explicitly state that the intent is to avoid character encoding confusion vulnerabilities? Otherwise, this requirement could be easily overlooked.
I'm not sure how realistic the "character encoding confusion" problem is nowadays. For me it's more about integrity: that the data is displayed correctly in the output. From that perspective, the charset is just part of the Content-Type header, which means that the header and content must match. For example, the charset in the Content-Type header and the charset in the HTML document must match. In new requirements (for example in the OAuth section) we try to avoid creating a testing guide, so I think the previous split should be reviewed, or the goal of the requirement should be made clearly understandable.
As the previous split was made by @tghosth, I am assigning the issue to him.
One proposal for direction:
Verify that every HTTP response contains a Content-Type header that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".
One note: a Content-Type header is not necessary if the content length is zero. This is one of the ModSecurity Core Rule Set rules as well. I just saw this debate through this cheat sheet series.
Update based on Jim's note:
Verify that every HTTP response having the message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".
(I think the second part of the requirement needs some better wording)
Minor cleanup:
Verify that every HTTP response having a message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".
I am ok with this. I prefer them separate as I think it makes it easier but I can live with them together.
I made a minor change to the proposal:
Verify that every HTTP response with a message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".
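Two parts of this wording, "with a message body" and "safe character encoding", can be sketched as a small check. The allowlist below is an assumption built only from the two examples named in the requirement text, the function name is hypothetical, and the headers are assumed to arrive as a plain dict keyed exactly by "Content-Type".

```python
from email.message import Message

# Assumption: only the examples named in the requirement text are treated as safe.
SAFE_CHARSETS = {"utf-8", "iso-8859-1"}


def check_response(headers: dict, body: bytes) -> list:
    """Return a list of findings for one response; an empty list means no finding."""
    if not body:
        return []  # no message body, so no Content-Type is required
    value = headers.get("Content-Type")
    if value is None:
        return ["missing Content-Type"]
    msg = Message()
    msg["Content-Type"] = value
    charset = msg.get_param("charset")
    findings = []
    if charset is not None and charset.lower() not in SAFE_CHARSETS:
        findings.append(f"unsafe charset: {charset.lower()}")
    return findings
```

Whether the declared type matches the actual content still has to be verified separately; this sketch only covers header presence and the charset value.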
@elarlang you want to PR it in?
ping @randomstuff - are your concerns covered with the latest proposal?
OK for me.
Requirement: V14.4.1
From #710 comments, by @tghosth:
Actually I have 2 more questions/ideas/problems with this requirement:
We require Content-Type to be set, but Content-Type should also match the actual content. For example, a classical problem: Content-Type: text/html for JSON content from an API is OK by this requirement, but in reality it's not OK... or we need to address those problems with a separate requirement in subcategory V13.2 RESTful Web Service Verification Requirements.