cocowalla closed this issue 6 years ago.
Yeah, you are right: this needs sorting.
Firstly, the docs should describe the actual rule, not just A-Z, a-z or digit, because the check actually calls Char.IsLetterOrDigit().
So in your example of 你好!, the first two characters return true from the IsLetterOrDigit check, so they are currently allowed.
The last character, !, however, is categorised as Unicode punctuation, so it fails the current check for a valid literal, even though, yes, it is valid in a filename.
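To make the current behaviour concrete, here is a small Python sketch. It is not the library's code (which is C#); it approximates .NET's Char.IsLetterOrDigit by checking the Unicode general category, since that method accepts the five letter categories (Lu, Ll, Lt, Lm, Lo) plus decimal digits (Nd). The helper name is_letter_or_digit is hypothetical, and Python's Unicode tables may differ slightly in version from .NET's.

```python
import unicodedata

# Unicode general categories that .NET's Char.IsLetterOrDigit treats as
# "letter or digit": the five letter categories plus decimal digits (Nd).
LETTER_OR_DIGIT_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def is_letter_or_digit(ch: str) -> bool:
    """Hypothetical helper approximating .NET Char.IsLetterOrDigit for one character."""
    return unicodedata.category(ch) in LETTER_OR_DIGIT_CATEGORIES


if __name__ == "__main__":
    # The example from this thread: two CJK letters followed by punctuation.
    for ch in "你好!":
        # Prints the character, its Unicode category, and whether it passes.
        print(repr(ch), unicodedata.category(ch), is_letter_or_digit(ch))
```

Both CJK characters are in category Lo and pass, while ! is in Po and fails. Every character from the readme list in the original report ({, }, [, ], (, ), +, ;, %, ?, *, :) is also punctuation or a symbol, which is why a letter-or-digit check rejects them.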
However, taken in a wider context, I don't think this "allowed literal characters" limitation really helps much, and I think I should just remove it. As you say, the set of allowed characters differs per platform, and that's not something I really want to get into.
This originally evolved because I wanted to identify whether a character was a literal, and I thought that if there was a small subset (an array) of allowed characters, I could check against it pretty quickly. But actually the better approach seems to be to parse for literals last,
after checking for other kinds of tokens, and then assume that whatever remains must be a literal, since it isn't any other kind of token. This way you only need to establish that the next character doesn't start any other token, rather than that it is in a list of known-good literal characters. The two checks are roughly the same performance-wise.
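The "parse literals last" approach could look roughly like the following Python sketch. The tokenize function, the token kind names and the small token set are hypothetical and deliberately simplified (the real library is C# and its glob grammar has more token types). The point is only that the literal branch is the fallback, so no whitelist of literal characters is needed.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); kinds here are illustrative only

# Hypothetical, simplified token set: path separators, *, **, ? and [...].
SEPARATORS = {"/", "\\"}


def is_special(pattern: str, i: int) -> bool:
    """True if a non-literal token could start at position i."""
    ch = pattern[i]
    return ch in "*?[" or ch in SEPARATORS


def tokenize(pattern: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**", i):
            tokens.append(("wildcard_dirs", "**"))
            i += 2
        elif ch == "*":
            tokens.append(("wildcard", ch))
            i += 1
        elif ch == "?":
            tokens.append(("any_char", ch))
            i += 1
        elif ch in SEPARATORS:
            tokens.append(("separator", ch))
            i += 1
        elif ch == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            tokens.append(("char_list", pattern[i:end + 1]))
            i = end + 1
        else:
            # Checked last: anything that is not another token is a literal.
            # Consume characters until something that could start a token.
            start = i
            i += 1
            while i < len(pattern) and not is_special(pattern, i):
                i += 1
            tokens.append(("literal", pattern[start:i]))
    return tokens
```

For example, tokenize("src/你好!.txt") yields the literal src, a separator, and the literal 你好!.txt without consulting any list of allowed characters; an unmatched [ simply falls through to the literal branch. Whatever a given platform accepts in file names is left to the platform.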
So here is what I think I should do:
AllowInvalidPathCharacters.
The default behaviour will then be that a character is assumed to be a literal if it isn't parsed as any other token first, which is how AllowInvalidPathCharacters = true behaves.
Yep, treating anything that isn't a special character as a literal makes sense to me.
Thanks @cocowalla, this will be released in 2.1.0.
The readme says:
Maybe I'm misunderstanding this section, but on all of Windows, Linux and macOS, lots of other characters are valid in file system paths, such as { } [ ] ( ) + ; and %, and on Linux and macOS also ? and * (Windows reserves those two).
Also, on Windows, : is not valid.