Open JoachimTuchel opened 1 year ago
We discussed this with @JoachimTuchel and I proposed than instead of falling back to a default codec if no charset is specified in the content type, we could fall back to the request context codec, which will likely be UTF-8.
Something like:
WARequest>>#bodyDecoded
"Comment TBD."
| contentType charSet |
contentType := self contentType.
charSet := contentType isNil ifFalse: [contentType charSet].
^ charSet isNil
ifTrue: [self requestContext codec
ifNil: [WAIllegalStateException signal: 'no character set of request body can not be determined']
ifNotNil: [:requestCodec | body isNil ifFalse: [requestCodec decode: body]]]
ifFalse: [self bodyDecodeUsing: charSet]
Getting back to this old issue/question...
First off:
bodyDecodeUsing:
returns nil
when there is no body. As such, the method bodyDecoded
will returnnil
as well.About using a default fallback then. From the spec (https://www.rfc-editor.org/rfc/rfc9110.html#field.content-type):
A sender that generates a message containing content SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type. So, while it is not required for a request to have the 'Content-Type' header, it may be omitted and then the spec says we may assume it's arbitrary binary data.
In the situation that no 'Content-Type' header was present, we would also default to the fallback, but I think that should not be allowed to happen. I suggest to only fallback to the encoding of the adaptor (i.e. the requestContext codec) if a 'Content-Type' header was present with a non-binary media type (text/plain, text/html,...). So, only use a default charSet in the case of non-binary mediatypes. I think we will want to error if the specified mimetype is "non-decodable" anyway, rather than silently trying to decode something that will not make sense.
What do you think @eMaringolo ?
Suggestion:
bodyDecoded
"Answer the body decoded using the character set in the request header. Answer nil if no body is present. Signal an error if no character set is present in the request header."
| contentType charSet |
contentType := self contentType.
contentType ifNil: [ WAIllegalStateException signal: 'Content type of request body cannot be determined.' ].
charSet := contentType charSet.
^ charSet
ifNil: [
self requestContext codec
ifNil: [ WAIllegalStateException signal: 'Character set of request body cannot be determined.' ]
ifNotNil: [ :requestCodec | body ifNotNil: [ requestCodec decode: body ] ] ]
ifNotNil: [ self bodyDecodeUsing: charSet ]
Just assuming a character set / encoding when one was not provided and the contents is not binary will just open a back door to introduce the kind of encoding problems we have been trying to fix for year.
I think there are 3 Problems with
WARequest>>bodyDecoded
"**_no_** character set of request body can **_not_** be determined"