bcpierce00 / unison

Unison file synchronizer
GNU General Public License v3.0
3.86k stars 224 forks source link

Accept @ and % in the username part of ssh roots #980

Closed tleedjarv closed 7 months ago

tleedjarv commented 7 months ago

The easy fix for #979. I think the username regexp is still too restrictive (and I don't think we need or want all the restrictions of RFC 3986) but let's deal with one thing at a time.

The patch accepts both the @ and % characters. Note that this patch does not parse the percent-escapes in any way, it simply passes them forward to the remote shell program.

Unrelated to this PR, it is also possible to pass any username directly to the remote shell program by setting the sshargs preference (for example, -l username for ssh). This is a potential workaround for usernames not accepted by the very strict regexp.

Fixes #979

gdt commented 7 months ago

Direct "@" in the user-info part of the URI, as i read the RFC, is a spec violation. Agreed on adding %.

tleedjarv commented 7 months ago

Direct "@" in the user-info part of the URI, as i read the RFC, is a spec violation. Agreed on adding %.

So it may be but this RFC is not the spec here. The change is for the root specification and while it may look similar to the RFC URI, it is not one nor should it be treated as one. (As I wrote before, there are even more artificial limitations that should be lifted, just not now.)

gdt commented 7 months ago

We probably should document that they aren't URIs then, because someone who knows about URIs would obviously jump to the conclusion that something that looks like a URI and is being used exactly as one would use a URI is actually one.

But really, since these look and act like URIs, why isn't it a bug that they aren't, and that instead we have a weak specification that is almost duplicative?

(And agreed that lots of characters are not allowed when both the URI spec and reason say that they should be, and that fixing the more pressing issue, which is people suffering from MS being unreasonable, should be separated from a full character cleanup.)

tleedjarv commented 7 months ago

Well, you pretty much stated the reason already, didn't you :)

There simply is nothing to be gained by conforming fully to the RFC spec, yet there is a lot to lose when one is not using a very small subset of ASCII.

The main reason why it shouldn't be an "official" URI is precisely due to the very limited set of allowed unencoded characters in the RFC spec. In my view, a human user should never be forced to use the percent encoding (and I'm somewhat confident that browsers, for example, don't force the user to do it). This is true in general, but especially so in this context where there is no cross-anything compatibility required. And while we may argue that this is not a major problem for usernames, it is a huge deal for paths.