Open eMaringolo opened 1 year ago
I am not a Seaside specialist, let alone on any port, but I don't think it is Seaside's job to fix encoding issues or other trouble.
If you have a String that is actually UTF-8 encoded bytes, then the problem is how you got there, it should be fixed there and then.
Obviously Seaside is capable of serving any binary file correctly (as it does in its file libraries in lots of variations), so technically anything should be possible.
> I am not a Seaside specialist, let alone on any port, but I don't think it is Seaside's job to fix encoding issues or other trouble.
This is exactly the rationale behind the issue, you should be able to send a WADocument with whatever content and encoding you want and Seaside should not alter it. By default it encodes it, but if you want to send it as is you should have an option.
> If you have a String that is actually UTF-8 encoded bytes, then the problem is how you got there, it should be fixed there and then.
I agree with this as well... but if WAFileLibrary compiles files whose extensions it maps to non-binary MIME types as methods returning a String, then you're in Seaside's hands. Compiling everything read from disk as ByteArray should have been the initial choice, but here we are...
> This is exactly the rationale behind the issue, you should be able to send a WADocument with whatever content and encoding you want and Seaside should not alter it. By default it encodes it, but if you want to send it as is you should have an option.
I am pretty sure that already works, after all that is what the file library and handler already do (i.e. taking bytes, as for an image and serving them unaltered with any mime type). At least as far as I can see in Pharo / Seaside 3.
Are you sure this is not related to the VAST Seaside port/implementation? Did you try anywhere else?
I had not tried, but now I have, and I noticed that GRPharoPlatform>>readFileStreamOn:do:binary: forces the input stream to be valid UTF-8, so anything other than valid UTF-8 cannot be read.
So if I want to read an ISO-8859-1 or Windows 949 (Korean) encoded file in Pharo, it doesn't work (I tried).
I guess it is because it forces a "MIME type" to be compiled as a String (or WideString), instead of being a ByteArray. So in Pharo, for UTF-8 encoded files, each character in the literal compiled string will be a Character with a valid Unicode codepoint (without any clustering), which when reencoded in the output will produce the same UTF-8 bytes.
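The round trip described above can be sketched in a Pharo playground. This is only an illustration, not Seaside's actual code path: 'héllo' is an arbitrary sample value, and `utf8Encoded` / `utf8Decoded` are the Zinc convenience methods, used here as stand-ins for whatever the file library and the codec do internally.

```smalltalk
"Illustrative sketch: stand-in for 'read a UTF-8 file, serve it again'."
| fileBytes text reencoded |
fileBytes := 'héllo' utf8Encoded.   "ByteArray with 6 bytes; é takes two bytes"
text := fileBytes utf8Decoded.      "String with 5 Characters, one code point each"
reencoded := text utf8Encoded.      "what an encoding step on output would produce"
reencoded = fileBytes               "true: the original bytes survive the round trip"
```

Because the String holds decoded code points rather than raw bytes, encoding it once on output is exactly the right amount of conversion.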
I'll think about how to work around this. Maybe the changes must be applied ONLY in the VAST adaptor layer, as I don't foresee anything changing GRPharoPlatform>>readFileStreamOn:do:binary: to read non-UTF-8 encoded files.
Thanks for the input.
> using the String only as a container of the UTF-8 bytes
Sorry to be that guy but this should be avoided. If you're just sending bytes then String is the wrong abstraction.
> and it is expected to be sent back as it was uploaded, regardless of the codec used
This is very error prone. You're relying on the downloader magically getting the same encoding as the uploader.
As a workaround you may try something like:
    WAResponse new
        binary;
        document: (WAMimeDocument
            on: aByteArray
            mimeType: (WAMimeType fromString: 'text/csv'))
If it works it's only because we do not yet have an explicit #text mode.
> Sorry to be that guy but this should be avoided. If you're just sending bytes then String is the wrong abstraction.
I agree, but there were historical reasons for it. If you look at GRCodec>>encodedStringClass you'll find it returns String.
As a note, in the next version of VAST we changed that to be ByteArray.
As for your workaround, I think it could work perfectly, and it's similar to our recommendation to one of our customers who reported this. Additionally, I believe it would be beneficial and non-disruptive to have an explicit option to send content 'as is' regardless of the MIME type.
> I agree, but there were historical reasons for it
And we have spent years trying to slowly move away from it. It's not behaviour we want to encourage.
> I believe it would be beneficial and non-disruptive to have an explicit option to send content 'as is' regardless of the MIME type.
Yeah, but not by pumping it through the image.
In some cases it is necessary to send the content of a WADocument without encoding it, even if the MIME type is text based (as in, not binary), or if the MIME type specifies a character encoding which matches that of the active codec.
This is needed in some cases where the content of a file library is created from a UTF-8 encoded file (e.g. a CSS file), but the content is saved as a String literal, using the String only as a container of the UTF-8 bytes, which causes a double conversion when serving that file.
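A minimal sketch of that double conversion, written for a Pharo playground purely as an illustration. The container String is built by hand to mimic a String whose Characters are really UTF-8 bytes, and `utf8Encoded` stands in for the encoding the active codec applies on output; neither is taken from the Seaside or VAST sources.

```smalltalk
"Illustrative sketch: a String used as a byte container gets encoded a second time."
| fileBytes container served |
fileBytes := 'é' utf8Encoded.   "#[195 169], the bytes as stored on disk"
container := String streamContents: [ :out |
	fileBytes do: [ :byte | out nextPut: (Character value: byte) ] ].
	"two Characters with code points 195 and 169, not one é"
served := container utf8Encoded.   "#[195 131 194 169]"
served = fileBytes                 "false: the client receives different bytes"
```

Each original byte is treated as a separate character and encoded again, so a browser decoding the response as UTF-8 would display 'Ã©' instead of 'é'.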
Also, it might be the case that a user uploaded a text file (text/csv) whose contents were saved to disk using the raw contents (it is not possible to know which encoding the uploaded file has), and it is expected to be sent back as it was uploaded, regardless of the codec used (and to also avoid a possible double or mis conversion).
Maybe there is a way to refactor WAResponse>>#document: to use the new method.
On some platforms it might require that GRCodecStream>>#nextPutAll: checks whether the argument is a ByteArray and, if so, does not encode it. E.g.
We tested this in VAST and it works without breaking anything else; we will likely create a pull request to integrate this.
As a side note, I think that GRCodecStream should always produce ByteArrays as output, but I know that'd be a major change.