Closed pragdave closed 6 years ago
Thanks @pragdave, I agree that considering UTF-8 the warning is not good enough.
There are two reasons for the warning:
The first one, as linked by @fertapric, is that the use of quotes is often confusing, leading developers to think we will have string keys, while they are still atoms
The second one is that there may be a difference in meaning between :josé and :"josé", as outlined in our Unicode Syntax document. The former can only be written in a single way (NFC format) while the latter can be written in multiple ways according to the Unicode spec, which can be a source of confusion. This means that, based on the visual representation, :josé is always equal to :josé but :"josé" is not always equal to something written as :"josé" due to the multiple ways characters can be written in Unicode. So not having the quotes removes ambiguity and potential confusion, which is why we advise removing them.
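To make the second reason concrete, here is a minimal sketch. It builds the two spellings with explicit escapes so the difference survives copy and paste. The variable names are made up for illustration, and String.to_atom/1 stands in for a quoted atom whose source bytes happen to be in NFD.

```elixir
# Two strings that render identically as "josé" but use different code points.
nfc = "jos\u00E9"   # precomposed é (U+00E9), the NFC form
nfd = "jose\u0301"  # plain e followed by a combining acute accent (U+0301), the NFD form

# The strings are different binaries, so they do not compare equal.
IO.inspect(nfc == nfd, label: "strings equal")

# Atoms built from them keep those bytes, so the atoms differ as well.
IO.inspect(String.to_atom(nfc) == String.to_atom(nfd), label: "atoms equal")

# Normalizing the decomposed form to NFC makes the two spellings match.
IO.inspect(String.normalize(nfd, :nfc) == nfc, label: "equal after NFC")
```

The first two lines print false and the last one prints true.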
My preference would be to adjust the warning to provide those insights. Suggestions are welcome. But I am also fine with restricting the warning to ASCII characters so we still address #7634.
Given that atoms are the names of things, I wonder if a more consistent alternative would be to enforce that atoms must be in NFC format, even if quoted. Would that be a hardship for any real-world code?
Or, looking at it another way, is it any worse than
@pragdave right, we are mirroring strings precisely. Once the atoms are quoted, they can be as bad as strings.
Also, not having quotes mimics identifiers, where you can have an identifier called oɟuı_ʇuǝuodɯoɔ, as long as it is in NFC format. You can't quote identifiers though. So they are always in NFC.
Do macros normalize all strings before emitting their result? If not, then the issue is still there.
I'm just wondering how often this actually is a problem.
As for the error, why not just issue it if the string isn't normalized (so string != String.normalize(string, :nfc))? Then the error could be "atoms must follow the same rules as identifiers, and be normalized Unicode".
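The check suggested above could be sketched like this. AtomNameCheck and nfc?/1 are hypothetical names used only for illustration. They are not part of Elixir, and this is not how the compiler implements its warning.

```elixir
defmodule AtomNameCheck do
  # Hypothetical helper: returns true when the name is already in NFC,
  # i.e. normalizing it to NFC leaves it unchanged.
  def nfc?(name) when is_binary(name) do
    name == String.normalize(name, :nfc)
  end
end

# Precomposed é (U+00E9) is already NFC.
IO.inspect(AtomNameCheck.nfc?("jos\u00E9"))

# e followed by a combining acute accent (U+0301) is not NFC.
IO.inspect(AtomNameCheck.nfc?("jose\u0301"))
```

Under this sketch the first call returns true and the second returns false, so only the second spelling would trigger the proposed error.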
On Fri, Sep 14, 2018 at 12:33 PM José Valim notifications@github.com wrote:
Also, not having quotes mimics identifiers, where you can have an identifier called oɟuı_ʇuǝuodɯoɔ, as long as it is in NFC format. You can't quote identifiers though. So they are always in NFC.
Do macros normalize all strings before emitting their result?
No, we never change the string/atom normalization. It is going to be the same as in the source. But I don't see how macros are part of this discussion?
As for the error, why not just issue it if the string isn't normalized
We shouldn't normalize strings, since users may depend on a particular form for their own reasons. Do you mean to normalize atoms? I am actually OK with requiring atoms to always be normalized to NFC. Fewer sources of confusion.
But keep in mind that we still remove the quotes for ASCII-only atoms when possible because of issue 1) above.
As for the error, why not just issue it if the string isn't normalized
I mean the string in :"string"
But keep in mind that we still remove the quotes for ASCII-only atoms when possible because of issue 1) above.
If you normalize a quoted atom, and unquoted atoms must already be normalized, then haven't we removed the ambiguity? So the ASCII issue goes away, because ASCII is always normalized.
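The claim that ASCII is always normalized can be checked with a short sketch. The sample name is arbitrary.

```elixir
# An ASCII-only name has no composed or decomposed variants,
# so normalization leaves it unchanged in both forms.
ascii = "my_atom_name"

IO.inspect(String.normalize(ascii, :nfc) == ascii, label: "unchanged by NFC")
IO.inspect(String.normalize(ascii, :nfd) == ascii, label: "unchanged by NFD")
```

Both lines print true.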
I am on my phone Dave but if you re-read my first reply, there are actually two reasons for removing the quotes. I am referring to the first one, which we haven’t discussed yet, and @fertapric linked to.
Ah, so the problem here is that the "string" in :"string" doesn't actually mean "string literal that I'm converting to an atom at compile time". It's really just a delimiter. So perhaps the solution to the original problem is to move away from the :"string" syntax to make that clear. Perhaps ~:/my atom name/?
FWIW I personally don't find "my key": "my value" confusing, and no one in my classes has either. But that's just a small sample.
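For readers following along, a small sketch of what the quoted-key syntax actually produces. The key needs quotes here because it contains a space. The variable names are illustrative only.

```elixir
# The quoted key in a keyword list is an atom; only the value is a string.
keywords = ["my key": "my value"]
[{key, value}] = keywords

IO.inspect(is_atom(key), label: "key is an atom")
IO.inspect(key == :"my key", label: "key equals the quoted atom")
IO.inspect(is_binary(value), label: "value is a string")

# The same applies to maps: a quoted key followed by a colon is an atom key,
# while the arrow form with a string gives a string key.
atom_keyed = %{"my key": 1}
string_keyed = %{"my key" => 1}

IO.inspect(atom_keyed == string_keyed, label: "maps equal")
IO.inspect(Map.has_key?(atom_keyed, "my key"), label: "atom-keyed map has string key")
```

The first three lines print true and the last two print false, which is the string-key versus atom-key confusion the first reason refers to.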
I'm happy to close this, as I think all the bases have been covered.
FWIW I personally don't find "my key": "my value" confusing, and no one in my classes has either. But that's just a small sample.
It does come up all the time, unfortunately.
We still have a decision to make here: whether to stop warning on quotes for non-ASCII atoms. I think we can do this without affecting reason 1. Do you think we should go ahead with this?
Or even better: assuming we stay as is, how would you rewrite the warning so it is less confusing?
Dave, I pushed a better message, let me know if it is clearer now! Thanks for the feedback!
Given
The compiler warns with:
Two things:
The idea of "foreign characters" probably doesn't mean much, as we now support any Unicode letter characters in atoms.
Why does the compiler care? If I choose to add quotes where they're not needed, then I guess the formatter could remove them, but as it makes no difference to the semantics, I don't think it helps to warn. It's as if the compiler warns for
and so on.