WebAssembly / wasi-io

I/O Types proposal for WASI

Support character-based reading and writing? #65

Closed: oovm closed this issue 8 months ago

oovm commented 11 months ago

Is it possible to add read_char and write_char to the stream? The encoding of the external environment can be complicated; for example, niche code pages may be in use on Windows.

I don’t want to add complex encoding and decoding functions to the bytecode, because that usually requires embedding huge tables.

sunfishcode commented 11 months ago

For read_char and write_char to work, implementations would need to know the encoding of the data they're reading from and writing to. This may be possible in some situations, such as stdio streams, but in others, such as sockets, the data stream is just bytes, and the encoding may be negotiated at the application level. It's unclear how an implementation could implement read_char or write_char in those situations.
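As an illustration of why the encoding has to be known, here is a minimal Python sketch. It is not part of wasi-io, and the two code pages are chosen only as examples. It shows the same byte decoding to different characters depending on the assumed encoding, which is the ambiguity a hypothetical read_char would face on a raw byte stream.

```python
# The same byte means different characters under different code pages,
# so a hypothetical read_char cannot pick one without knowing the encoding.
raw = b"\xe9"

as_cp1252 = raw.decode("cp1252")  # Western European code page
as_cp1251 = raw.decode("cp1251")  # Cyrillic code page

# Interpreted as UTF-8, a lone 0xE9 is an incomplete multi-byte sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_cp1252), repr(as_cp1251), valid_utf8)
```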

A possible alternative would be to say that whenever an implementation does know the encoding, it should transcode the data to UTF-8. That would fit within the existing u8-oriented API, and allow guest code to avoid embedding all the code page tables in most cases, while not requiring implementations to do anything if they don't know the encoding. Does that sound feasible?
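The idea could look roughly like the following host-side sketch in Python. Everything here is an assumption for illustration: the helper name `to_utf8_chunks` is hypothetical, and nothing in wasi-io defines such a function. When the host knows the source encoding it transcodes to UTF-8, and when it does not, the bytes pass through unchanged. An incremental decoder is used so that a multi-byte character split across two chunks is still decoded correctly.

```python
import codecs
from typing import Iterable, Iterator, Optional


def to_utf8_chunks(chunks: Iterable[bytes],
                   encoding: Optional[str]) -> Iterator[bytes]:
    """Hypothetical host helper: yield UTF-8 if the encoding is known."""
    if encoding is None:
        # Encoding unknown (e.g. a socket): leave the byte stream untouched.
        yield from chunks
        return

    # The incremental decoder buffers incomplete multi-byte sequences
    # between calls; undecodable input becomes U+FFFD.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")

    # Flush whatever is still buffered at end of stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")
```

With this shape, the guest only ever sees bytes through the existing u8-oriented API, and the code page tables live in the host rather than in the guest's bytecode.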

oovm commented 11 months ago

Selecting UTF-8 as the standard encoding LGTM; it helps reduce or avoid garbled-text problems.

sunfishcode commented 11 months ago

I submitted https://github.com/WebAssembly/wasi-io/pull/66 to propose some text to specify this.