SeasideSt / Seaside

The framework for developing sophisticated web applications in Smalltalk.
MIT License
519 stars 71 forks source link

3 Problems in WARequest>>#bodyDecoded #1357

Open JoachimTuchel opened 1 year ago

JoachimTuchel commented 1 year ago

I think there are 3 Problems with WARequest>>bodyDecoded

  1. The comment says the method returns nil if no body is present. That's not true, it doesn't check for a body at all, and that may even be unnecessary
  2. I think it is reasonable to assume the body is encoded in utf-8 if no other charset ist stated the contentType parameters. In my case the contentType id text/plain without any parameters. I don't think this should throw an error, but assume it is utf-8
  3. If the throwing of an error is intended, either the word "no" or "not" should be removed from the error message, because it is now doubly negated: "**_no_** character set of request body can **_not_** be determined"
eMaringolo commented 1 year ago

We discussed this with @JoachimTuchel and I proposed than instead of falling back to a default codec if no charset is specified in the content type, we could fall back to the request context codec, which will likely be UTF-8.

Something like:

WARequest>>#bodyDecoded
  "Comment TBD."

  | contentType charSet |
  contentType := self contentType.
  charSet := contentType isNil ifFalse: [contentType charSet].
  ^ charSet isNil
      ifTrue: [self requestContext codec
        ifNil: [WAIllegalStateException signal: 'no character set of request body can not be determined']
        ifNotNil: [:requestCodec | body isNil ifFalse: [requestCodec decode: body]]]
      ifFalse: [self bodyDecodeUsing: charSet]
jbrichau commented 4 months ago

Getting back to this old issue/question...

First off:

About using a default fallback then. From the spec (https://www.rfc-editor.org/rfc/rfc9110.html#field.content-type):

A sender that generates a message containing content SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type. So, while it is not required for a request to have the 'Content-Type' header, it may be omitted and then the spec says we may assume it's arbitrary binary data.

In the situation that no 'Content-Type' header was present, we would also default to the fallback, but I think that should not be allowed to happen. I suggest to only fallback to the encoding of the adaptor (i.e. the requestContext codec) if a 'Content-Type' header was present with a non-binary media type (text/plain, text/html,...). So, only use a default charSet in the case of non-binary mediatypes. I think we will want to error if the specified mimetype is "non-decodable" anyway, rather than silently trying to decode something that will not make sense.

What do you think @eMaringolo ?

jbrichau commented 4 months ago

Suggestion:

bodyDecoded
    "Answer the body decoded using the character set in the request header. Answer nil if no body is present. Signal an error if no character set is present in the request header."

    | contentType charSet |
    contentType := self contentType.
    contentType ifNil: [ WAIllegalStateException signal: 'Content type of request body cannot be determined.' ].
    charSet := contentType charSet.
    ^ charSet 
        ifNil: [ 
            self requestContext codec
                ifNil: [ WAIllegalStateException signal: 'Character set of request body cannot be determined.' ]
                ifNotNil: [ :requestCodec | body ifNotNil: [ requestCodec decode: body ] ] ]
        ifNotNil: [ self bodyDecodeUsing: charSet ]
marschall commented 4 months ago

Just assuming a character set / encoding when one was not provided and the contents is not binary will just open a back door to introduce the kind of encoding problems we have been trying to fix for year.