Closed mkitti closed 2 years ago
Hmmmm, I think I'd prefer lowercase? inline"this is a string"31
? Thoughts @nickrobinson251/@bkamins?
I do not think these test failures are related to this change.
Thank you for contributing! Glad to have another person interested in the package 😄
I'm just pretty much against macros where avoidable so I should probably defer to others on whether we add this or not (i'd tentatively be against, but it's not much code to maintain and perhaps some users like macros so 🤷)
What's the benefit of the string macro in this case?
I tend to think of macros as mostly providing nicer syntax.
e.g. r"\w+"
at least saves letters over Regex("\w+")
, likewise tz"UTC"
over TimeZone("UTC")
.
But we already have e.g. String7("foobar")
which is shorter (and seems much clearer / less magic to me) than inline"foobar"7
. I suppose in the case where we pick the size for you, inline"x"
is a bit shorter than InlineString("x")
...
If we do add this, i agree with Jacob on preferring the lowercase spelling :)
I prefer lowercase spelling.
However, the major question is in what context do you expect to use these macros in practice. My experience is that these strings are useful if one works with collections of them (usually large collections). In such cases the key thing for performance is having uniform type of strings in the whole collection (I raise this point as this is not guaranteed by this macro - and cannot be).
In summary - could you give some practical example when you think that having such macro would be handy?
I tend to think of macros as mostly providing nicer syntax.
The other benefit of macros is that you can do work at macro-expansion-time instead of at runtime, which is another important benefit of the Regex example you provided. i.e. we compile the regex expression at macro-expansion-time once instead of every time we encounter the regex at runtime.
In a similar vein, but as far as I can tell not implemented as such here, you could do the InlineString construction at macro-expansion-time instead of doing it at runtime. To do that, we'd want to do a full InlineString(str)
and then inject the result of that into the quote
block. But, you have to be careful with interpolated strings and such because those rely on runtime parameters and such.
I updated the macros so they are now lowercase and more is done while building the expression.
@mkitti - my question about use-case was important (at least for me). Whether the design you propose is OK will depend on the intended usage scenarios. What are the situations where you assume you would use these macros?
What are the situations where you assume you would use these macros?
The way I view this is a way to create "inline string" literals that are interpreted as raw strings. Currently, one has to first construct a String
and then pass it to an InlineStrings constructor. These macros allow for the direct construction of inline strings without String
intermediates.
julia> using InlineStrings
julia> A = String15[]
String15[]
julia> push!(A, inline"Hello world"15)
1-element Vector{String15}:
"Hello world"
julia> push!(A, inline"Bye"15)
2-element Vector{String15}:
"Hello world"
"Bye"
julia> push!(A, inline"Hasta Luego"15)
3-element Vector{String15}:
"Hello world"
"Bye"
"Hasta Luego"
One practical benefit of this is avoiding unnecessary allocation during construction.
julia> @allocated inline"This is a string macro"
0
julia> @allocated InlineString("This is not a string macro")
48
Consider how the two forms differ in compilation:
julia> f() = inline"This is a string. Hello!"
f (generic function with 1 method)
julia> g() = String31("This is a string. Hello!")
g (generic function with 1 method)
julia> @code_llvm f()
; @ REPL[30]:1 within `f`
; Function Attrs: uwtable
define void @julia_f_731(i256* noalias nocapture sret(i256) %0) #0 {
top:
store i256 38178759162904981154304547648621834361872905465358515510743786238036441825304, i256* %0, align 8
ret void
}
julia> @code_llvm g()
; @ REPL[31]:1 within `g`
; Function Attrs: uwtable
define void @julia_g_733(i256* noalias nocapture sret(i256) %0) #0 {
top:
%1 = alloca i256, align 8
call void @j_String31_735(i256* noalias nocapture nonnull sret(i256) %1, {}* inttoptr (i64 7549763600 to {}*)) #0
%2 = load i256, i256* %1, align 8
store i256 %2, i256* %0, align 8
ret void
}
This is not merely superficial cosmetics. There are real practical and tangible implications of using these macros.
The second consideration, not offered in this pull request, is changing how InlineStrings are displayed (as opposed to printed).
At the moment inline strings are displayed exactly like their String
counterparts. One cannot tell the type of string from the display below.
julia> String15("Hello world")
"Hello world"
It would be clearer that the result is an inline string if this were displayed as if it was a literal.
julia> String15("Hello world")
inline"Hello world"15
This would make the display of the inline string types homoiconic.
Regular expression strings provide a precedence for this.
julia> r"Hello"
r"Hello"
On the display change idea, i wouldn't be in favour of changing the default "pretty printing" which shows inline strings the same as regular strings, but i'd support repr(s)
showing how an inline string s
would actually be constructed, like https://github.com/JuliaStrings/InlineStrings.jl/pull/52
Perhaps we could discuss that idea further in an issue or on #52 to not distract from this PR :)
There are real practical and tangible implications of using these macros.
This is true. Sorry for not being precise. My question was in what cases these tangible benefits matter. I.e. do we have use cases where these savings matter? It would have to be cases where there are millions of such literals (most likely such a literal is defined in a function that is called millions of times). Do they happen in practice?
Thinking of this for example using a pattern:
f1(x) = x == String7("abcdefg")
f2(x) = x == inline"abcdefg"
And here f2
will be indeed faster than f1
, so it makes sense to have it.
changing how InlineStrings are displayed
This is something that indeed I sometimes users want. However, if we went for this then I would feel that inline"Hello world"15
should mean 15 exactly
not 15 or more
. I understand why in your macro you went for this widening, but the question - again - is if it is needed. In other words in what use cases 15 or more
is better than 15 exactly
. For example if the use-case is pushing to a type stable container then 15 exactly
is I think a better design.
Do they happen in practice?
A practical use case is in situations where allocations are not allowed at all. For example, in the use of GPUCompiler.jl
on GPUs or StaticCompiler.jl
/ StaticTools.jl
on embedded hardware we might to strictly avoid allocations due to the lack of a garbage collector. Despite inline strings themselves not requiring allocations, the current constructors require allocation.
Here we're trying to enable applications which may be quite difficult with the current constructors.
This is something that indeed I sometimes users want. However, if we went for this then I would feel that
inline"Hello world"15
should mean15 exactly
not15 or more
. I understand why in your macro you went for this widening, but the question - again - is if it is needed. In other words in what use cases15 or more
is better than15 exactly
. For example if the use-case is pushing to a type stable container then15 exactly
is I think a better design.
In the case of display, it does mean 15 exactly. The limitation in this package is that only values of 2^N-1
where N
a nonnegative integer are valid constructions. For other values, we cannot guarantee it will be exactly that value. However, it should consistently round up to the same number, the nearest 2^N-1
value greater than the given value. That logic is implemented by InlineStringType
.
julia> using InlineStrings
julia> inline"abcdefg"11
"abcdefg"
julia> typeof(ans)
String15
julia> inline"hello"11
"hello"
julia> typeof(ans)
String15
Ah - now I have looked at implementation. Sorry for not checking this earlier. So you guarantee type stability here when you allow passing a number. So I am in favor of adding these macros.
What I would discuss though if it were not better to just define macros: inline1
, inline3
, inline7
, inline15
, inline31
, inline63
, inline127
, and inline255
instead of allowing any number - which can be confusing. (+ inline
macro that would be flexible)
See even at our discussion - when I was saying "exactly" I meant that for a single number passed it is always the same InlineStringX
type would be picked (which is what you propose to do and I was not precise enough) while you understood I mean "exactly that number of codeunits".
In that case, perhaps inline"<str>"N
should error if N
is not 2^n-1
? Then we only need one macro.
We could. Let us wait for others to comment as this is a matter of style. I felt that having a suffix after a string (especially if the string were long) is harder to read and having several more macros would not that big problem.
Thank you for submitting this PR.
yeah, I agree on preferring the inline3
/inline5
/etc names instead of the trailing number args. As mentioned, these numbers are fixed, so it makes sense to me to just bake them into the macro names. I'm also fine adding an inline"..."
form that would figure out the size automatically so you don't have to be counting characters all the time.
I made individual macros for 2^N-1
julia> inline1"a"|> typeof
String1
julia> inline3"a"|> typeof
String3
julia> inline7"a"|> typeof
String7
julia> inline15"a"|> typeof
String15
julia> inline31"a"|> typeof
String31
julia> inline63"a"|> typeof
String63
julia> inline127"a"|> typeof
String127
julia> inline255"a"|> typeof
String255
can we drop the inline"str"N
macros (with the N
suffix)?
Is it weird that the canonical type is called String15
but the macro is inline15
?
can we drop the
inline"str"N
macros (with theN
suffix)?
Hmmm, yeah, it doesn't seem like this form is as useful now that we have inline15"..."
. Technically, the N form would allow you to do inline"abc"10
, but I'm not sure that's useful at all? i.e. you'd still end up with an InlineString15
, but you have to statically specify 10
to get it to pick InlineString15
. So I'm in favor of removing.
Is it weird that the canonical type is called String15 but the macro is inline15?
Yeah, we could consider naming these string15"..."
, but then we have the inconsistency of having inline"..."
, which I personally believe will be the most commonly used of all these. So for that reason, I think I prefer staying with inline15"..."
.
i also prefer inline"
as the name, and even tend to use the more vebose InlineString
names, as it allows grepping for "inline" in a codebase which i've found useful more than once.
Thanks again for the PR -- now that things are done at macro-expansion time and can help avoid allocations i think this is a really nice addition!
The N
suffix makes it a lot easier to build a second layer macros or metaprogramming on top of this. Consider the following:
julia> x = 15
15
julia> quote
@inline_str "Hello World" $x
end
quote
#= REPL[35]:2 =#
inline"Hello World"15
end
julia> eval(ans)
"Hello World"
julia> typeof(ans)
String15
The only alternative I know of is to build the macro call expression:
julia> x = 15
15
julia> e = Expr(:macrocall, Symbol("@inline$(x)_str"), LineNumberNode(0), "Hello world")
:(inline15"Hello world")
julia> eval(e)
"Hello world"
julia> typeof(ans)
String15
Yes, but the question is when would one need them in practice (as I have commented - these macros can be added later if they are needed). However, if you indeed need then now then I think they can be added.
I suppose we should export all of the new macro forms. edit: Already done.
This creates a string macro to aid in inline string construction.