V13.1.8 - Content-Type header should match with actual response content

elarlang commented 4 years ago

Requirement: V14.4.1

V14.4.1 Verify that every HTTP response contains a Content-Type header. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1).

From #710 comments, by @tghosth:

@elarlang please open a separate issue for your most recent comments.

Actually I have 2 more questions/ideas/problems with this requirement:

- We require Content-Type to be set, but Content-Type should also match the actual content. For example, a classic problem: Content-Type: text/html for JSON content from an API passes this requirement, but in reality it is not OK.
- I tried to find some information but didn't find any reason why we should not also require a charset for application/json (if we require it for application/xml).
  - One JSON problem should be covered by this requirement: JSON content should not use the ISO-8859-1 charset.

... or we need to address those problems with a separate requirement in subcategory V13.2 RESTful Web Service Verification Requirements
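To make the two problems above concrete, here is a minimal sketch (not part of the original comment). It assumes a response is available as a header value plus raw body bytes; the helper names `is_json_media_type`, `body_is_json_document` and `json_mislabelled` are hypothetical. It only treats JSON objects and arrays as "JSON content", which is a simplification.

```python
import json


def is_json_media_type(content_type: str) -> bool:
    """True for application/json and "+json" types such as application/problem+json."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


def body_is_json_document(body: bytes) -> bool:
    """True if the body parses as a JSON object or array.

    json.loads() accepts bytes and detects UTF-8/16/32 itself; anything
    else (for example ISO-8859-1 bytes) fails to decode.
    """
    try:
        value = json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError are both ValueErrors
        return False
    return isinstance(value, (dict, list))


def json_mislabelled(content_type: str, body: bytes) -> bool:
    """Flag the classic mismatch: a JSON body declared as some other type."""
    return body_is_json_document(body) and not is_json_media_type(content_type)


# JSON served as text/html is flagged, the correct label is not.
print(json_mislabelled("text/html", b'{"ok": true}'))
print(json_mislabelled("application/json", b'{"ok": true}'))

# JSON encoded as ISO-8859-1 is not valid for a UTF-8 based parser.
latin1_body = json.dumps({"name": "café"}, ensure_ascii=False).encode("iso-8859-1")
print(body_is_json_document(latin1_body))
```

A check like this can only catch obvious mismatches; it does not prove that an arbitrary body matches its declared type.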

tghosth commented 4 years ago

We require Content-Type to be set, but Content-Type should also match the actual content. For example, a classic problem: Content-Type: text/html for JSON content from an API passes this requirement, but in reality it is not OK.

Agree, we need an additional requirement for this, would you like to open a PR against the master branch?

I tried to find some information but didn't find any reason why we should not also require a charset for application/json (if we require it for application/xml).

When @jason-invision opened the original issue, he referenced the RFC https://tools.ietf.org/html/rfc2046#section-4.1.2 Does this provide the answer you were looking for?

elarlang commented 4 years ago

First, if you are thinking about a new, separate requirement that the response "content-type" must match the actual content, then it's better to set the milestone to v4.1 and deal with it when v4.0.2 is done.

Second part got solved - application/json and charset - more research, found the answer.

RFC 2046 is updated by multiple other RFCs, one of them is RFC 6657 "Update to MIME regarding "charset" Parameter Handling in Textual Media Types".

    (5)   application -- some other kind of data, typically
          either uninterpreted binary data or information to be
          processed by an application.  The subtype "octet-
          stream" is to be used in the case of uninterpreted
          binary data, in which case the simplest recommended
          action is to offer to write the information into a file
          for the user.  The "PostScript" subtype is also defined
          for the transport of PostScript material.  Other
          expected uses for "application" include spreadsheets,
          data for mail-based scheduling systems, and languages
          for "active" (computational) messaging, and word
          processing formats that are not directly readable.
          Note that security considerations may exist for some
          types of application data, most notably
          "application/PostScript" and any form of active
          messaging.  These issues are discussed later in this
          document.

Additionally, RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" says clearly that no charset parameter is defined for application/json:

Note: No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients.
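As an illustration of the RFC 8259 note (a sketch, not from the original thread): the standard-library `email.message.Message` class can parse a Content-Type value, and a charset parameter on application/json can then be reported as redundant. The function names `parse_content_type` and `redundant_json_charset` are hypothetical.

```python
import json
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, charset parameter or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_param("charset")
    # get_param() may return a tuple for RFC 2231 encoded values; ignore those here.
    return msg.get_content_type(), charset if isinstance(charset, str) else None


def redundant_json_charset(header_value: str) -> bool:
    """True if application/json carries a charset parameter that is not defined for it."""
    media_type, charset = parse_content_type(header_value)
    return media_type == "application/json" and charset is not None


print(redundant_json_charset("application/json; charset=utf-8"))
print(redundant_json_charset("application/json"))

# A recipient does not need the parameter: json.loads() detects the
# encoding of a bytes payload on its own.
print(json.loads('{"city": "Tallinn"}'.encode("utf-8")))
```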

jmanico commented 3 years ago

PR time, good discussion above.

elarlang commented 3 years ago

@tghosth

Agree, we need an additional requirement for this, would you like to open a PR against the master branch?

Do we actually need a separate requirement? Or can we just add something extra to the current 14.4.1?

Points to cover / for discussion:

- Verify that the Content-Type header value matches the served content
- If the presented content is of unknown type, use "application/octet-stream" as the content-type value

elarlang commented 3 years ago

Current 14.4.1

Verify that every HTTP response contains a Content-Type header. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1).

Problems (as reminder):

content must match with Content-Type header
some default value for unknown content must be used

Idea (not wording) for proposal:

Verify that every HTTP response contains a Content-Type header that matches the content. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1). application/octet-stream should be used as default value.

or a bit longer:

Verify that every HTTP response contains a Content-Type header. text/*, */*+xml and application/xml content types should also specify a safe character set (e.g., UTF-8, ISO-8859-1). Content must match with provided Content-Type header and application/octet-stream should be used as default value.

Mozilla Developer Network "Common MIME types"

application/octet-stream is the default value for all other cases. An unknown file type should use this type. Browsers pay a particular care when manipulating these files, attempting to safeguard the user to prevent dangerous behaviors.

RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content

A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender.  If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

RFC 2046 - Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

The "octet-stream" subtype is used to indicate that a body contains arbitrary binary data.
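The "application/octet-stream as default" idea can be sketched with Python's standard `mimetypes` module (an illustration only; the helper name `content_type_for` is hypothetical). Note that this guesses from the file name, so it does not by itself verify that the content matches.

```python
import mimetypes

DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename: str) -> str:
    """Guess a media type from the file name, falling back to the generic binary type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or DEFAULT_TYPE


print(content_type_for("index.html"))
print(content_type_for("upload.zzz-unknown-ext"))
```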

jmanico commented 3 years ago

I'd prefer to leave 14.4.1 alone and add one more requirement for JSON since it's so dangerous. Dangerous character encodings like UTF-7 are no longer supported by browsers.

tghosth commented 3 years ago

@elarlang Happy for you to extend existing requirement if you prefer.

jmanico commented 3 years ago

After thinking about this more, I would suggest we flat out delete 14.4.1 and add one new requirement to force JSON responses to use the application/json content type.

Also, here is a reference regarding dangerous content types no longer being supported in the browser since the beginning of the HTML5 era. https://security.stackexchange.com/questions/47489/utf-7-xss-attacks-in-modern-browsers
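For readers unfamiliar with why UTF-7 was considered dangerous, here is a small sketch (not from the original comment; the payload is a generic textbook example). The same bytes contain no angle brackets when read as ASCII, but decode to markup when a recipient interprets them as UTF-7.

```python
# Bytes that look harmless to a filter searching for "<" or ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

as_ascii = payload.decode("ascii")
as_utf7 = payload.decode("utf-7")

print("<" in as_ascii)  # no angle brackets in the ASCII reading
print(as_utf7)          # the UTF-7 reading is a script element
```

This only matters for recipients that still honour UTF-7, which, per the comment above, modern browsers do not.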

danielcuthbert commented 3 years ago

have modified 14.4.1 with the following:

14.4.1 | [MODIFIED] Verify that every HTTP response contains a Content-Type header. Also specify a safe character set (e.g., UTF-8, ISO-8859-1) if the content types are text/*, */*+xml and application/xml. Content must match with the provided Content-Type header. | ✓ | ✓ | ✓ | 173 |

https://github.com/OWASP/ASVS/commit/9c6d8a22b28b00222e5cd1414b816823e68e4330

randomstuff commented 2 months ago

Some minor comments/questions:

It is really a "character encoding" not a "character set".
I had no idea what a safe character set was, so maybe something like "a safe character set (e.g., UTF-8, ISO-8859-1, in contrast to UTF-7)" would be nice?
Does this conflict with text/dns (RFC4027) or text/vtt which do not define a charset parameter?

elarlang commented 2 months ago

It is really a "character encoding" not a "character set".

Or should we use "character encoding scheme" based on RFC 6365?

I had no idea what a safe character set was, so maybe something like "a safe character set (e.g., UTF-8, ISO-8859-1, in contrast to UTF-7)" would be nice?

As those (in theory) may change over time, I prefer not to be over-descriptive in the requirement text.

Does this conflict with text/dns (RFC4027) or text/vtt which do not define a charset parameter?

Can you explain the reasoning behind this question, so that I can understand it without guessing?

randomstuff commented 2 months ago

Does this conflict with text/dns (RFC4027) or text/vtt which do not define a charset parameter?

Can you explain the reasoning behind this question, so that I can understand it without guessing?

Sorry, I thought that was clear enough :smile:.

Some text/* MIME types do not define a charset parameter. For example:

Subject: Registration of MIME media type text/dns

MIME media type name: text

MIME subtype name: dns

Required parameters: None.

Optional parameters: None.

Is it conformant to pass a charset parameter anyway?

elarlang commented 2 months ago

I don't know the answer. Also, when I read the spec (https://datatracker.ietf.org/doc/html/rfc6657) and think about reality (https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411), they may not match.

elarlang commented 2 months ago

@randomstuff - any ideas how to fix this?

https://www.iana.org/assignments/media-types/text/vtt

Encoding considerations: 8bit (always UTF-8)

https://www.iana.org/assignments/media-types/text/dns

The master file format permits encoding arbitrary octet values by using the "\DDD" encoding. The use of "\DDD" encoding can be more reliable than transporting non-ASCII through MIME transports, if data passes through a gateway that re-encodes the character data.

So those mentioned ones have "their own language" or definition. Probably we have more of them and we need to make the requirement flexible/dynamic. Something like:

Verify that if a response specifies a Content-Type of "text/*", "*/*+xml" and "*/xml", it also specifies a safe character encoding (e.g., UTF-8, ISO-8859-1) with the charset parameter according to IANA Media Types.
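A rough sketch of how such a flexible check could look (an illustration under assumptions, not an agreed implementation). The patterns are read as the globs text/*, */*+xml and */xml. The exemption set holds only the two types mentioned in this thread and stands in for a real lookup against the IANA registry; all names are hypothetical.

```python
from email.message import Message
from fnmatch import fnmatchcase
from typing import Optional

CHARSET_PATTERNS = ("text/*", "*/*+xml", "*/xml")
SAFE_CHARSETS = {"utf-8", "iso-8859-1"}  # the examples named in the requirement
# Assumption: types whose registration defines no charset parameter.
NO_CHARSET_PARAM = {"text/vtt", "text/dns"}


def check_charset(header_value: str) -> Optional[str]:
    """Return a short problem description, or None if nothing is flagged."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()  # lower-cased type/subtype

    if media_type in NO_CHARSET_PARAM:
        return None
    if not any(fnmatchcase(media_type, pattern) for pattern in CHARSET_PATTERNS):
        return None  # the charset rule does not apply to this type

    charset = msg.get_param("charset")
    if charset is None:
        return "missing charset"
    if str(charset).lower() not in SAFE_CHARSETS:
        return f"unexpected charset {charset!r}"
    return None


for value in ("text/html; charset=UTF-8", "text/html", "application/xhtml+xml",
              "application/json", "text/vtt", "text/html; charset=utf-7"):
    print(value, "->", check_charset(value))
```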

randomstuff commented 2 months ago

@randomstuff - any ideas how to fix this?

I am wondering if it is necessary to enforce the usage of the charset media type parameter when other solutions are possible (eg. for HTML). Could something like this be OK?

Verify that responses which specify a Content-Type of "text/*", "*/*+xml" and "*/xml" are either encoded in ASCII or explicitly specify a content-type when this is required by the associated content-type definition.

elarlang commented 1 month ago

I am wondering if it is necessary to enforce the usage of the charset media type parameter when other solutions are possible (eg. for HTML). Could something like this be OK?

The specification supports your concerns.

The specification, released in 2012, says (https://datatracker.ietf.org/doc/html/rfc6657):

Many complex text subtypes such as "text/html" [RFC2854] and "text/xml" [RFC3023] have internal (to their format) means of describing the charset. Many existing User Agents ignore the default of "US-ASCII" rule for at least "text/html" and "text/xml".

My comment in https://github.com/OWASP/ASVS/issues/788#issuecomment-2333458487

I don't know the answer. Also, when I read the spec (https://datatracker.ietf.org/doc/html/rfc6657) and think about reality (#900 (comment)), they may not match.

The reason behind that was my test in 2021: https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411

Web servers often set their own defaults for content-type, and the default charset is often iso-8859-1, which causes some mess when the content is actually in utf-8 (for example).

So here I would say, that the content-type header is required, it must specify the charset and it must match with the served content - it means also the charset definition inside the HTML or XML document.
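A minimal sketch of that position (illustrative only; the regex is a simplification and not a full HTML parser, and the helper names are hypothetical): the header must carry a charset, and if the HTML also declares one in a meta element, both must agree. The last lines show the kind of "mess" a mismatch produces.

```python
import re
from email.message import Message
from typing import Optional

# Simplified: matches <meta charset="..."> and the http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def header_charset(header_value: str) -> Optional[str]:
    """Charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_param("charset")
    return charset.lower() if isinstance(charset, str) else None


def meta_charset(body: bytes) -> Optional[str]:
    """Charset declared inside the first 1024 bytes of an HTML body, or None."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def charsets_agree(header_value: str, body: bytes) -> bool:
    """Header charset is required; an embedded declaration must match it."""
    declared = header_charset(header_value)
    if declared is None:
        return False
    embedded = meta_charset(body)
    return embedded is None or embedded == declared


html = b'<html><head><meta charset="utf-8"></head><body></body></html>'
print(charsets_agree("text/html; charset=utf-8", html))
print(charsets_agree("text/html; charset=ISO-8859-1", html))

# UTF-8 bytes read as ISO-8859-1 turn a single character into two.
print("é".encode("utf-8").decode("iso-8859-1"))
```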

Note that the requirement is moved to 13.1.8.

tghosth commented 1 month ago

@elarlang does this need to be retagged as V13?

elarlang commented 1 month ago

ping @randomstuff

If you read my last comment, do you think we should move on with my proposal (https://github.com/OWASP/ASVS/issues/788#issuecomment-2345310163) or with your proposal (https://github.com/OWASP/ASVS/issues/788#issuecomment-2345427933)?

randomstuff commented 1 month ago

@elarlang, I like my version better because I do not think the lack of a charset= parameter in Content-Type is (or should be) an issue in practice if the encoding is already specified in the file (HTML <meta charset='...'>), but maybe I am wrong on this.

elarlang commented 1 month ago

Did you take a look at this: https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411?

randomstuff commented 1 month ago

Did you take a look at this: https://github.com/OWASP/ASVS/issues/900#issuecomment-778346411?

@elarlang, I did not look at this carefully enough but I understand that

Content-Type header overrides charset from HTML document

which does not really contradict my claim:

the lack of a charset= parameter in Content-Type is not (or should not be) an issue in practice if the encoding is already specified in the file

But then for example, portswigger agrees with you (at least for HTML):

For every response containing HTML content, the application should include within the Content-type header a directive specifying a standard recognized character set, for example charset=ISO-8859-1.

So let's settle for your version?

randomstuff commented 1 month ago

Do we need to explicitly state that the intent is to avoid character encoding confusion vulnerabilities? Otherwise, this requirement could be easily overlooked.

elarlang commented 1 month ago

It is kind of re-opening the issue #1459, but I keep the conversation here.

At the moment we have:

| # | Description | L1 | L2 | L3 | CWE |
| --- | --- | --- | --- | --- | --- |
| 13.1.7 | [MODIFIED, MOVED FROM 14.4.1, SPLIT TO 13.1.8] Verify that every HTTP response contains a Content-Type header which matches the actual content of the response. | ✓ | ✓ | ✓ | 173 |
| 13.1.8 | [ADDED, SPLIT FROM 13.1.7] Verify that if a response specifies a Content-Type of "text/*", "*/*+xml" and "*/xml", it also specifies a safe character set (e.g., UTF-8, ISO-8859-1) with the charset parameter. | ✓ | ✓ | ✓ | 173 |

Previously those were in one requirement and got split up via https://github.com/OWASP/ASVS/issues/1459#issuecomment-1359315226 although I preferred to keep it as one requirement.

Do we need to explicitly state that the intent is to avoid character encoding confusion vulnerabilities? Otherwise, this requirement could be easily overlooked.

I'm not sure how realistic the "character encoding confusion" problem is nowadays. For me it's more about integrity: that the data is displayed correctly in the output. From that perspective, the charset is just part of the content-type header, which means that the header and content must match. For example, the charset in the content-type header and the charset in the HTML document must match. In new requirements (for example in the OAuth section) we try to avoid creating a testing guide. I think the previous split should be reviewed, or the goal of the requirement should be made clearly understandable.

As the previous split was made by @tghosth, I am assigning the issue to him.

elarlang commented 1 month ago

One proposal for direction:

Verify that every HTTP response contains a Content-Type header that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".

jmanico commented 1 month ago

One note: a Content-Type header is not necessary if the content length is zero. This is one of the ModSecurity Core Rule Set rules as well. I just saw this debate come up in the cheat sheet series.
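The exemption can be sketched as follows (an illustration only, not how the Core Rule Set implements it; `missing_content_type` is a hypothetical helper that takes response headers as a mapping plus the raw body):

```python
from typing import Mapping


def missing_content_type(headers: Mapping[str, str], body: bytes) -> bool:
    """True only when there is a message body but no Content-Type header."""
    has_header = any(
        name.lower() == "content-type" and value.strip()
        for name, value in headers.items()
    )
    return len(body) > 0 and not has_header


print(missing_content_type({"Content-Length": "0"}, b""))   # empty body: nothing to flag
print(missing_content_type({}, b"hello"))                   # body without a type: flagged
print(missing_content_type({"content-type": "text/plain; charset=utf-8"}, b"hello"))
```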

elarlang commented 1 month ago

Update based on Jim's note:

Verify that every HTTP response having the message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".

(I think the second part of the requirement needs some better wording)

jmanico commented 1 month ago

Minor cleanup:

Verify that every HTTP response having a message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".

tghosth commented 1 month ago

I am ok with this. I prefer them separate as I think it makes it easier but I can live with them together.

I made a minor change to the proposal:

Verify that every HTTP response with a message body contains a Content-Type header field that matches the actual content of the response, including the charset parameter to specify safe character encoding (e.g., UTF-8, ISO-8859-1) according to IANA Media Types, such as "text/*", "*/*+xml" and "*/xml".

@elarlang you want to PR it in?

elarlang commented 1 month ago

ping @randomstuff - are your concerns covered with the latest proposal?

randomstuff commented 1 month ago

OK for me.
