HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org

[unicode] zero byte and strings #8201

Open RealyUniqueName opened 5 years ago

RealyUniqueName commented 5 years ago

haxe.io.Bytes.getString() treats a zero byte as a string terminator on some targets:
https://github.com/HaxeFoundation/haxe/blob/4ce4f917671c11041a728c6be073ef02a4b778df/std/js/_std/haxe/io/Bytes.hx#L143-L145
I think this is wrong because a zero byte is a valid UTF-8 char code.
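A minimal sketch of how the difference could be observed, assuming a three-byte buffer with a zero in the middle. The class name and the byte values are illustrative only, and the result depends on the target, so no single output is claimed here.

```haxe
import haxe.io.Bytes;

class Main {
    static function main() {
        // Three bytes: 'a', 0x00, 'b'.
        var bytes = Bytes.alloc(3);
        bytes.set(0, 0x61);
        bytes.set(1, 0x00);
        bytes.set(2, 0x62);

        var s = bytes.getString(0, 3);
        // A target that treats 0 as a terminator yields a 1-character string;
        // a target that keeps the zero byte yields a 3-character string.
        trace(s.length);
    }
}
```

Compiling the same file for several targets and comparing the traced length would show which ones truncate.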

ncannasse commented 5 years ago

I propose we push this 0 byte handling to 4.1 as I think this might require some extra work on some targets.

ncannasse commented 5 years ago

PS: it's not only about Unicode, it's about the 0 byte being a valid character in Strings, working with split and all other APIs, with tests etc.

kevinresol commented 5 years ago

This is a silent breaking change. I think I do have code that relies on this zero-terminating behaviour.

I am not against breaking it, but it would be better done in 4.0 than in 4.1.

ncannasse commented 5 years ago

@kevinresol I think targets already don't agree on this, so it will not be much of a breaking change, more of an actual clarification of the behavior. How do you rely on it atm exactly?

kevinresol commented 5 years ago

I am on js and use bytes.getString to extract a string from fixed-length byte data that is zero-padded on the right. I am aware of it and it is a simple fix. I have only worked on js and I thought it was the expected behaviour of getString. I am afraid that some people may be relying on it without knowing.

Simn commented 5 years ago

#7904 is related.

kevinresol commented 5 years ago

So, assuming this is fixed, what is the correct way to extract a null-terminated string from a Bytes object?

Aurel300 commented 5 years ago

@kevinresol (new haxe.io.BytesInput(bytes, position)).readUntil(0)
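A rough sketch of how that could look for the zero-padded fixed-length case mentioned earlier in the thread. One caveat worth checking: readUntil keeps reading until it sees the end byte, so a field that is completely filled (no zero byte at all) ends in an Eof exception. The readCString helper below is hypothetical, not part of the standard library; it scans for the first zero byte itself and falls back to the full field length.

```haxe
import haxe.io.Bytes;
import haxe.io.BytesInput;
import haxe.io.Eof;

class CStringExample {
    // Hypothetical helper: returns the bytes from `pos` up to the first zero
    // byte, or up to `maxLen` bytes if the field contains no terminator.
    static function readCString(bytes:Bytes, pos:Int, maxLen:Int):String {
        var len = 0;
        while (len < maxLen && bytes.get(pos + len) != 0)
            len++;
        return bytes.getString(pos, len);
    }

    static function main() {
        // An 8-byte field holding "abc" followed by zero padding.
        var field = Bytes.alloc(8);
        field.fill(0, 8, 0); // make the padding explicit
        field.blit(0, Bytes.ofString("abc"), 0, 3);

        // readUntil stops at the terminator and does not include it,
        // so this yields "abc".
        var padded = (new BytesInput(field, 0)).readUntil(0);
        trace(padded);

        // A completely filled field has no zero byte, so readUntil
        // runs off the end of the input and throws Eof.
        var full = Bytes.ofString("abcdefgh");
        try {
            (new BytesInput(full, 0)).readUntil(0);
        } catch (e:Eof) {
            trace("no terminator found");
        }

        // The manual scan handles both cases and yields "abcdefgh" here.
        trace(readCString(full, 0, full.length));
    }
}
```

The manual scan never searches for a terminator through the string decoder, so it should behave the same whether or not getString stops at a zero byte on a given target.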