brenhinkeller / StaticTools.jl

Enabling StaticCompiler.jl-based compilation of (some) Julia code to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!
MIT License
167 stars 12 forks source link

Null termination issues in StaticString #2

Closed tshort closed 1 year ago

tshort commented 2 years ago

Here are some notes:

Could the null termination be removed? I don't think Julia strings need it.

brenhinkeller commented 2 years ago

Ah, these are good questions. Fixing display would be easy enough and we should definitely do that one way or another.

The big question is whether we should try to be more consistent with including null termination or if we should try to eliminate it altogether.

In principle, dropping it would be great -- because it'll be hard to have getindex return maximally efficient views / substrings if we have to append. In practice, it may be tricky, since even LLVM is in practice expecting strings to be null-terminated (i.e., can't use puts or printf, even via direct llvmcall, without null termination).

We could maybe get around that by only ever using putc / putchar, but that'd be a nontrivial undertaking at least.

And, while Julia strings are officially "not" null-terminated, in practice they actually are every time I've checked, i.e.:

julia> str = "Hello there, I am a string"
"Hello there, I am a string"

julia> Base.unsafe_load(pointer(str), length(str)+1)
0x00

julia> substr = str[1:5]
"Hello"

julia> Base.unsafe_load(pointer(substr), length(substr)+1)
0x00
brenhinkeller commented 2 years ago

My initial reasoning for including null-termination in codeunits and in length and etc. is basically just that this is a low-level package where implementation details really matter, so my general instinct is to expose the nitty-gritty details that are often otherwise hidden -- but I could probably be convinced to go the other way on this

brenhinkeller commented 2 years ago

Also: hooray, first issue :tada:

brenhinkeller commented 2 years ago

One option may be to have getindex on ranges of a StaticString return a StaticSubString or StaticStringView view type which is explicitly not null-terminated and dispatches to methods which are aware of that

tshort commented 2 years ago

Null termination for output is a good consideration. But, we could add the null termination prior to passing to output functions.

I don't think getindex should be a view. That doesn't mesh well with other defaults in Base.

Regarding codeunits and length, I don't think we want the null termination to show up in these. They don't for normal Strings. We also want c"abc" == "abc". That's currently broken.

Also, I see that MallocString has these same issues. Maybe we could do what Base does and allocate the null termination but keep it hidden.

Semi-relevant Julia issues: https://github.com/JuliaLang/julia/pull/10994, https://github.com/JuliaLang/julia/pull/16731, https://github.com/JuliaLang/julia/issues/7036

brenhinkeller commented 2 years ago

Yeah, MallocString and StaticString have more or less identical interfaces, just different implementations of backing memory so the former can be on the heap and the latter on the stack.

WIP here: https://github.com/brenhinkeller/StaticTools.jl/tree/refactor-strings

That'll make c"abc" == "abc" which we definitely want, and make length equal to that of base strings, which we probably also want -- I do have some mixed feelings though.

brenhinkeller commented 2 years ago

So just an update on the current status of this:

julia> length(c"asdf") == length(m"asdf") == length("asdf") == 4
true

julia> c"asdf" == m"asdf" == "asdf"
true

julia> s = c"this is a longer string"
c"this is a longer string"

julia> s[9:23]
15-byte StringView:
 "a longer string"

We still differ from Base, however, in including the null termination in codeunits, ncodeunits, and sizeof.

What do people think, should we change those to hide the null termination as well?

brenhinkeller commented 1 year ago

Given no objections, I'm going to go ahead and close this as completed, but feel free to restart the discussion if it seems like we should go farther!