JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

`escape_string` doesn't appear to be able to keep double-quotes. #54511

Open ssfrr opened 1 month ago

ssfrr commented 1 month ago

I may be misunderstanding how this is supposed to work, but I can't seem to prevent escape_string from escaping double-quote characters.

julia> s = "\"" # create a string with just a literal `"` character
"\""

julia> escape_string(s) # works as expected
"\\\""

julia> escape_string(s; keep=('"',)) # I would have expected my original string to be unchanged here
"\\\""

Note it works as I'd expect for \n:

julia> s2 = "\n"
"\n"

julia> escape_string(s2)
"\\n"

julia> escape_string(s2; keep=('\n',))
"\n"

From nightly as of May 17:

Julia Version 1.12.0-DEV.553
Commit cf940722b65 (2024-05-17 18:24 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

omus commented 1 month ago

This is because escape_string(s; keep=('"',)) is actually escape_string(s, ('"',); keep=('"',)) and the esc argument takes precedence over keep. You can work around this via:

julia> escape_string(s, (); keep=('"',))
"\""

This does seem to be something that should be at a minimum fixed in the documentation.
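For anyone landing here later, the three cases can be put side by side. This is a small sketch that uses only `escape_string` with the `keep` keyword (which, as far as I know, needs Julia 1.7 or later). The variable names are just for illustration.

```julia
s = "\""   # a one-character string holding a literal double quote

# The string-returning method defaults `esc` to ('"',), so these two calls agree.
default_form  = escape_string(s)
explicit_form = escape_string(s, ('"',))

# `esc` is consulted before `keep`, so naming '"' in both still escapes it.
both = escape_string(s, ('"',); keep=('"',))

# An empty `esc` drops the default, so '"' passes through unchanged.
no_esc = escape_string(s, ())

println((default_form, explicit_form, both, no_esc))
```

The last call has no `keep` at all, which suggests the default `esc` alone is what escapes the quote.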

ssfrr commented 1 month ago

Ah, I see, thanks. That was not what I expected, as I figured the 2nd positional arg represented the set of additional characters that would be escaped, on top of the default ones.
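To make the difference between the two readings concrete, here is a short sketch. The sample string and variable names are made up for illustration. It shows that passing `esc` replaces the default `('"',)` rather than adding to it.

```julia
s = "say \"hi\" for \$5"   # contains two double quotes and one dollar sign

# Passing `esc` replaces the default ('"',) instead of extending it:
# '$' gets a backslash, but '"' is now left alone.
only_dollar = escape_string(s, ('$',))

# To escape both characters, both have to be listed explicitly.
quote_and_dollar = escape_string(s, ('"', '$'))

println(only_dollar)
println(quote_and_dollar)
```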

Seelengrab commented 1 month ago

> This does seem to be something that should be at a minimum fixed in the documentation.

It is mentioned in the docstring:

> The optional esc argument specifies any additional characters that should also be escaped by a prepending backslash (" is also escaped by default in the first form).
>
> The argument keep specifies a collection of characters which are to be kept as they are. Notice that esc has precedence here.

Do you have suggestions on how this could be made more explicit?
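One way to picture the precedence the docstring describes is a toy model of the order of checks. The function below, `sketch_escape`, is hypothetical and heavily simplified: it handles only backslash, newline, and tab as built-in escapes. It is not the Base implementation, just a sketch of "esc first, then keep, then the defaults".

```julia
# Hypothetical, simplified model of the order of checks; not the Base implementation.
function sketch_escape(s::AbstractString, esc=('"',); keep=())
    io = IOBuffer()
    for c in s
        if c in esc            # 1. characters in `esc` always get a backslash
            print(io, '\\', c)
        elseif c in keep       # 2. otherwise, characters in `keep` are copied as-is
            print(io, c)
        elseif c == '\\'       # 3. otherwise, the built-in escapes apply
            print(io, "\\\\")
        elseif c == '\n'
            print(io, "\\n")
        elseif c == '\t'
            print(io, "\\t")
        else                   # anything else is copied unchanged in this sketch
            print(io, c)
        end
    end
    return String(take!(io))
end

# In this model '"' never reaches the `keep` check while it sits in the default `esc`.
quote_kept   = sketch_escape("\""; keep=('"',))
newline_kept = sketch_escape("\n"; keep=('\n',))
quote_no_esc = sketch_escape("\"", ())
```

For the inputs from this thread, the model gives the same results as `escape_string`.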

ssfrr commented 1 month ago

Ah, thanks, I guess I missed or misunderstood that part of the docstring.

I think the part that's not super clear in the docstring is that \n and " are both escaped by default, but are handled differently.

\n and friends are sort of "implicit" defaults, and you can prevent them from being escaped by adding them to keep. " is different in that:

  1. it's only escaped if you're returning a string instead of writing to an IO stream
  2. if you want to prevent it from being escaped you need to remove it from esc rather than adding it to keep (@omus it turns out you don't actually need to add it to keep at all).
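Both points can be checked with a short sketch. It assumes `sprint(escape_string, s)` is an acceptable stand-in for "writing to an IO stream", since `sprint` calls the function with an IO buffer as its first argument. The variable names are illustrative.

```julia
s = "\"\n"    # a double quote followed by a newline

# Returning a string: '"' is escaped through the default `esc`, '\n' by the built-in rules.
as_string = escape_string(s)

# Writing to an IO: nothing is in `esc` by default, so '"' is written as-is,
# while the newline is still escaped.
as_io = sprint(escape_string, s)

# Opting out differs per character: '\n' through `keep`, '"' through an empty `esc`.
untouched = escape_string(s, (); keep=('\n',))

println((as_string, as_io, untouched))
```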

Can anyone shed some light on why it's implemented this way? I'm happy to take a crack at clarifying the docstring, but I think I'd want to include some extra context on the why.

omus commented 1 month ago

> @omus it turns out you don't actually need to add it to keep at all

True. I thought it was clearer to include " in keep in addition to having to exclude it from esc.