gitgitgadget / git

GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list
https://gitgitgadget.github.io/
Other
208 stars 135 forks source link

http: do not ignore proxy path #1767

Closed rhendric closed 3 weeks ago

rhendric commented 1 month ago

cc: Ryan Hendrickson ryan.hendrickson@alum.mit.edu cc: Jeff King peff@peff.net

gitgitgadget[bot] commented 1 month ago

Welcome to GitGitGadget

Hi @rhendric, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, because it will result in a malformed CC list on the mailing list. See example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

dscho commented 1 month ago

/allow

gitgitgadget[bot] commented 1 month ago

User rhendric is now allowed to use GitGitGadget.

WARNING: rhendric has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

rhendric commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1767.git.1722009170590.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v1

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v1
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ryan Hendrickson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>
> The documentation for `http.proxy` describes that option, and the
> environment variables it overrides, as supporting "the syntax understood
> by curl". curl allows SOCKS proxies to use a path to a Unix domain
> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
> therefore include, if present, the path part of the proxy URL in what it
> passes to libcurl.
>
> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
> ---
>     http: do not ignore proxy path
>     
>      * Documentation: do I need to add anything?

http.proxy documentation says

  The syntax thus is [protocol://][user[:password]@]proxyhost[:port].

but the updated code pays attention to what can come after the
"host[:post]" part, does it not?

> diff --git a/http.c b/http.c
> index 623ed234891..0cd75986a6b 100644
> --- a/http.c
> +++ b/http.c

I am unfamiliar with this code path, so let me follow along aloud.

> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>       if (!proxy_auth.host)
>           die("Invalid proxy URL '%s'", curl_http_proxy);

We grabbed the value from the configuration variable (or various
environment variables like $http_proxy) in the curl_http_proxy
variable, and then passed it to credential_from_url() to parse parts
out of [protocol://][user[:password]@]proxyhost[:port][/p/a/t/h].
The parsed result is in proxy_auth struct and there is no hope it
can be usable if the .host member is missing.

> -     curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

We used to only use the .host member but the curl_http_proxy could
have had a path after it, held in the .path member.

> +     if (proxy_auth.path) {
> +         struct strbuf proxy = STRBUF_INIT;
> +         strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
> +         curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
> +         strbuf_release(&proxy);
> +     } else
> +         curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

Style.  If "if" side needs {braces} because it consists of multiple
statements, the corresponding "else" side should also have {braces}
around its body, even if it only has a single statement.

If you have the proxy strbuf in a bit wider scope, then the above becomes

    if (proxy_auth.path)
        strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
    else
        strbuf_addstr(&proxy, proxy_auth.host);
    curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
    strbuf_release(&proxy);

which might (I am not decided) be easier to follow.

Could existing users have been taking advantage of the fact that the
extra /path at the end of http.proxy (and $http_proxy and friends)
are ignored?  For them, this change will appear as a regression.

Other than that (and the lack of any documentation and test
updates), looking good.

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-26 09:29-0700, Junio C Hamano <gitster@pobox.com> sent:

> http.proxy documentation says
>
>  The syntax thus is [protocol://][user[:password]@]proxyhost[:port].
>
> but the updated code pays attention to what can come after the
> "host[:post]" part, does it not?

Correct; I'll add a "[/path]" to that construction.

>> +        if (proxy_auth.path) {
>> +            struct strbuf proxy = STRBUF_INIT;
>> +            strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
>> +            curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
>> +            strbuf_release(&proxy);
>> +        } else
>> +            curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>
> Style.  If "if" side needs {braces} because it consists of multiple
> statements, the corresponding "else" side should also have {braces}
> around its body, even if it only has a single statement.
>
> If you have the proxy strbuf in a bit wider scope, then the above becomes
>
>   if (proxy_auth.path)
>       strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
>   else
>       strbuf_addstr(&proxy, proxy_auth.host);
>   curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
>   strbuf_release(&proxy);
>
> which might (I am not decided) be easier to follow.

For what it's worth, I was following the precedent set by the if-statement starting at line 1256 (a few lines above this patch):

>       if (strstr(curl_http_proxy, "://"))
>           credential_from_url(&proxy_auth, curl_http_proxy);
>       else {
>           struct strbuf url = STRBUF_INIT;
>           strbuf_addf(&url, "http://%s", curl_http_proxy);
>           credential_from_url(&proxy_auth, url.buf);
>           strbuf_release(&url);
>       }

<https://github.com/git/git/blob/ad57f148c6b5f8735b62238dda8f571c582e0e54/http.c#L1256>

I have no problem with being inconsistent with surrounding code in these style choices; just let me know what I should do in light of that.

> Could existing users have been taking advantage of the fact that the
> extra /path at the end of http.proxy (and $http_proxy and friends)
> are ignored?  For them, this change will appear as a regression.

That is possible, though I have difficulty imagining a scenario in which it would be intentional.

What do you recommend I do about that possibility?

R
gitgitgadget[bot] commented 1 month ago

User Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> has been added to the cc: list.

gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> For what it's worth, I was following the precedent set by the
> if-statement starting at line 1256 (a few lines above this patch):

It is worth nothing.  Existing violation does not make it a good
idea to add more of them.  It makes it more work to clean them all
up later.

>> Could existing users have been taking advantage of the fact that the
>> extra /path at the end of http.proxy (and $http_proxy and friends)
>> are ignored?  For them, this change will appear as a regression.
>
> That is possible, though I have difficulty imagining a scenario in
> which it would be intentional.
>
> What do you recommend I do about that possibility?

I have no idea.  I already said that I am not familiar with this
code path, and it is likely I am a worse person than you are to find
a potential creative (ab)uses of the mechanism to take advantage of
the fact that the current code ignores the path part.

Having said that, given that http.proxy falls back to environment
variables that have been established a lot longer than Git itself,
like HTTPS_PROXY etc., if we _know_ that setting HTTPS_PROXY to such
a value with /path part causes curl (or other popular programs) to
try using such a value without stripping the path part (and even
better if that causes them to fail), then such an observation would
make a very good argument why it is extremely unlikely that existing
users used such a setting, hence it is unlikely this patch would
make a regression.

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Jul 26, 2024 at 03:52:50PM +0000, Ryan Hendrickson via GitGitGadget wrote:

>      * Tests: I could use a pointer on how best to add a test for this.
>        Adding a case to t5564-http-proxy.sh seems straightforward but I
>        don't think httpd can be configured to listen to domain sockets; can
>        I use netcat?

I don't offhand know of a way to test this without a custom program like
netcat. If it's the only option, it's OK to use tools that might not be
available everywhere as long as the tests are marked as optional with
the appropriate prerequisite. You can find prior art by looking for
test_lazy_prereq calls (e.g., the ones for GZIP or ZIPINFO are pretty
straight-forward).

I would warn that there are several not-quite-compatible variants of
netcat floating around, which can create headaches. You might be better
off with a short perl script using IO::Socket::UNIX or similar.

> diff --git a/http.c b/http.c
> index 623ed234891..0cd75986a6b 100644
> --- a/http.c
> +++ b/http.c
> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>       if (!proxy_auth.host)
>           die("Invalid proxy URL '%s'", curl_http_proxy);
>  
> -     curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +     if (proxy_auth.path) {
> +         struct strbuf proxy = STRBUF_INIT;
> +         strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
> +         curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
> +         strbuf_release(&proxy);
> +     } else
> +         curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);

The fields in the proxy_auth struct have been parsed from the url, with
any url encoding removed. But then we paste them back together into a
pseudo-url without doing any further encoding. Is that correct?

I doubt that the host contains a "/", but if you had a path that
contained a "%", then the URL form of that is going to be %25. Which is
curl expecting to get here?

I say "pseudo-url" because it is weird to slap a path on the end of the
hostname but leave off the scheme, etc. Which kind of makes me wonder
why we pass proxy_auth.host in the first place, and not simply the
original curl_http_proxy. It looks like a weird interaction between
372370f167 (http: use credential API to handle proxy authentication,
2016-01-26) and 57415089bd (http: honor empty http.proxy option to
bypass proxy, 2017-04-11). The former added a _second_ CURLOPT_PROXY
call, and the latter removed the first one.

I wonder if we could go back to passing the string straight to curl (as
we did prior to 2016), and keeping the proxy_auth struct purely as a
mechanism for gathering credentials. That would presumably fix your use
case. And this is also perhaps an interesting data point for Junio's
question about regressions (if people were passing paths, it used to
work and was broken in 2016).

-Peff
gitgitgadget[bot] commented 1 month ago

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-26 17:11-0400, Jeff King <peff@peff.net> sent:

> I would warn that there are several not-quite-compatible variants of
> netcat floating around, which can create headaches. You might be better
> off with a short perl script using IO::Socket::UNIX or similar.

Ah, okay, thanks for the pointer!

>> diff --git a/http.c b/http.c
>> index 623ed234891..0cd75986a6b 100644
>> --- a/http.c
>> +++ b/http.c
>> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void)
>>          if (!proxy_auth.host)
>>              die("Invalid proxy URL '%s'", curl_http_proxy);
>>
>> -        curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>> +        if (proxy_auth.path) {
>> +            struct strbuf proxy = STRBUF_INIT;
>> +            strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path);
>> +            curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
>> +            strbuf_release(&proxy);
>> +        } else
>> +            curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>
> The fields in the proxy_auth struct have been parsed from the url, with
> any url encoding removed. But then we paste them back together into a
> pseudo-url without doing any further encoding. Is that correct?
>
> I doubt that the host contains a "/", but if you had a path that
> contained a "%", then the URL form of that is going to be %25. Which is
> curl expecting to get here?

Oh, I see! Yes, this is an issue with my patch: if I create a socket file named "%30", command-line curl wants http_proxy to contain "%2530", and patched Git wants http_proxy to contain "%252530". Good edge case to put in a test.

> I wonder if we could go back to passing the string straight to curl (as
> we did prior to 2016), and keeping the proxy_auth struct purely as a
> mechanism for gathering credentials.

Hmm, that would be nice, but I think curl doesn't deal well with the extra case that Git supports of specifying a username but no password. It causes one of the existing tests to fail if I pass the string straight through.

On top of that, all of those starts_with tests for checking the protocol are written quite loosely, so in practice Git "supports" the protocols "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl would not like it if it received those strings directly.

So given that Git wants to handle the protocol and the credentials, it makes sense that only the host and the path are passed to curl. I just have to make sure that they are correctly re-encoded.

R
gitgitgadget[bot] commented 1 month ago

There are issues in commit b445c4c42cfc4699c2e66852b4a5cb2448165a96: DROP ME: trying to discover why this test fails in CI Commit checks stopped - the message is too short Commit not signed off

rhendric commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1767.v2.git.1722062682858.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v2

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v2
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Jul 26, 2024 at 06:43:29PM -0400, Ryan Hendrickson wrote:

> > I wonder if we could go back to passing the string straight to curl (as
> > we did prior to 2016), and keeping the proxy_auth struct purely as a
> > mechanism for gathering credentials.
> 
> Hmm, that would be nice, but I think curl doesn't deal well with the extra
> case that Git supports of specifying a username but no password. It causes
> one of the existing tests to fail if I pass the string straight through.

OK, thanks for trying. It would have been nice, but I'm not surprised
that there's an unusual interaction.

> On top of that, all of those starts_with tests for checking the protocol are
> written quite loosely, so in practice Git "supports" the protocols
> "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl
> would not like it if it received those strings directly.
>
> So given that Git wants to handle the protocol and the credentials, it makes
> sense that only the host and the path are passed to curl. I just have to
> make sure that they are correctly re-encoded.

Makes sense.

-Peff
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Jul 27, 2024 at 06:44:42AM +0000, Ryan Hendrickson via GitGitGadget wrote:

>     Re earlier discussion about regressions, it turns out that curl has only
>     supported this syntax since 2022 [https://curl.se/ch/7.84.0.html].
>     Earlier versions of curl, if provided an http_proxy that contained a
>     path, would also ignore it. curl didn't seem to consider this a breaking
>     change when the feature was introduced
>     [https://github.com/curl/curl/pull/8668], though; is that a valid
>     argument for Git not to lose sleep over it either?

IMHO it is probably OK. The path did not do anything before then, so why
would anybody pass it?

Looking into the curl support, I did notice two things:

  - unix sockets are only supported for socks proxies. Is it meaningful
    to have a path on an http proxy? I don't think so (there is no place
    to send it in the "GET" line). But perhaps we should limit the new
    code only to socks.

  - curl says you should put "localhost" in the host field for this,
    though obviously it is not otherwise used. Should we likewise
    require that to kick in the code?

> @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void)
>       if (!proxy_auth.host)
>           die("Invalid proxy URL '%s'", curl_http_proxy);
>  
> -     curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +     strbuf_addstr(&proxy, proxy_auth.host);
> +     if (proxy_auth.path) {

There's one gotcha here I wondered about: we usually throw away the path
element, unless credential.useHTTPPath is set. That only happens after
we load the per-credential config, though, via credential_apply_config(),
which in turn is triggered by a call to credential_fill(), etc.

I think this is OK here, since we have just called credential_from_url()
ourselves, and the earliest credential_fill() we'd hit is from
init_curl_proxy_auth(), which is called right after the code you're
touching.

> +         int path_is_supported = 0;
> +         /* curl_version_info was added in curl 7.10 */
> +#if LIBCURL_VERSION_NUM >= 0x070a00
> +         curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
> +         path_is_supported = ver->version_num >= 0x075400;
> +#endif

We've been trying to keep our curl version #ifdefs in git-curl-compat.h,
with human-readable names. But in this case, I think we can ditch it
entirely, as we require curl 7.21.3 or later already.

This will be the first time we check curl_version_info() instead of a
build-time option. I think that makes sense here. Most features require
extra information at build-time (e.g., CURLOPT_* constants), but in this
case it is purely a check on the runtime behavior.

We perhaps could get away with just letting an older curl quietly ignore
the path field, but what you have here is probably a bit friendlier for
users.

> diff --git a/t/socks5-proxy-server/src/socks5.pl b/t/socks5-proxy-server/src/socks5.pl
> new file mode 100755
> index 00000000000..3ef66a1a287
> --- /dev/null
> +++ b/t/socks5-proxy-server/src/socks5.pl

This is a lot more code than I was hoping for. There are definitely
parts of it we don't care about (e.g, threading, handling multiple
connections at once) that could be pared down a bit. I don't think we
particularly care about auth either (we just want to make sure we talk
to the unix-socket proxy at all).

What if we switched to socks4, which drops all of the auth bits and
supports only IPs, not hostnames (that came in socks4a). This is the
shortest I could come up with (only lightly tested):

-- >8 --
use strict;
use IO::Select;
use IO::Socket::UNIX;
use IO::Socket::INET;

my $path = shift;

unlink($path);
my $server = IO::Socket::UNIX->new(Listen => 1, Local => $path)
    or die "unable to listen on $path: $!";

$| = 1;
print "ready to rumble\n";

while (my $client = $server->accept()) {
    sysread $client, my $buf, 8;
    my ($version, $cmd, $port, $ip) = unpack 'CCnN', $buf;
    next unless $version == 4; # socks4
    next unless $cmd == 1; # TCP stream connection

    # skip NUL-terminated id
    while (sysread $client, my $char, 1) {
        last unless ord($char);
    }

    # version(0), reply(5a == granted), port (ignored), ip (ignored)
    syswrite $client, "\x00\x5a\x00\x00\x00\x00\x00\x00";

    my $remote = IO::Socket::INET->new(PeerHost => $ip, PeerPort => $port)
        or die "unable to connect to $ip/$port: $!";

    my $io = IO::Select->new($client, $remote);
    while ($io->count) {
        for my $fh ($io->can_read(0)) {
            for my $pair ([$client, $remote], [$remote, $client]) {
                my ($from, $to) = @$pair;
                next unless $fh == $from;

                my $r = sysread $from, my $buf, 1024;
                if (!defined $r || $r <= 0) {
                    $io->remove($from);
                    next;
                }
                syswrite $to, $buf;
            }
        }
    }
}
-- >8 --

That's still more than I'd like, but quite a bit smaller. I dunno.
Probably the one you found is more battle-tested, but I really think we
have pretty simple requirements.

> diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh
> index bb35b87071d..bafaa31adf8 100755
> --- a/t/t5564-http-proxy.sh
> +++ b/t/t5564-http-proxy.sh
> @@ -39,4 +39,22 @@ test_expect_success 'clone can prompt for proxy password' '
>   expect_askpass pass proxuser
>  '
>  
> +start_socks() {
> + # The %30 tests that the correct amount of percent-encoding is applied
> + # to the proxy string passed to curl.
> + "$PERL_PATH" "$TEST_DIRECTORY/socks5-proxy-server/src/socks5.pl" \
> +     "$TRASH_DIRECTORY/%30.sock" proxuser proxpass &
> + socks_pid=$!
> +}
> +
> +test_lazy_prereq LIBCURL_7_84 'expr x$(curl-config --vernum) \>= x075400'

This is the first time we'd be relying on curl-config in the test suite.
I suspect it would _usually_ work, but I'd worry about a few things:

  1. The Makefile allows for a different path for $(CURL_CONFIG), or
     even skipping it entirely by setting $(CURLDIR).

  2. We'd usually build and test in the same environment, but not
     necessarily. In particular, I know Windows uses separate CI jobs
     for this, and I'm not sure if curl-config will be around in the
     test jobs.

I can see two paths forward:

  a. There's been recent discussion about adding an option for Git to
     report the run-time curl version. That doesn't exist yet, though
     "git version --build-options" will report the build-time version.
     That might be good enough for the purposes of testing a build.

  b. Write the test such that it expects the request to work _or_ we see
     the "version too old" failure.

> +test_expect_success PERL,LIBCURL_7_84 'clone via Unix socket' '

I'm not sure if this PERL prereq is sufficient. We also need to know
that we can make unix sockets. Probably we need to actually run the
socks proxy and make sure it gets to the "starting..." line before
accepting that it works. Which also gets us to...

> + start_socks &&
> + test_when_finished "rm -rf clone && kill $socks_pid" &&
> + test_config_global http.proxy "socks5://proxuser:proxpass@localhost$PWD/%2530.sock" &&
> + GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone &&

This is racy. We started the proxy in the background, but we don't know
that it's ready to accept connections by the time we run "git clone". My
first few runs all failed, but putting a "sleep 1" in between fixed it
(which obviously is not a real fix, but just a debugging aid).

If you usually see success, try running the test script with "--stress"
to create extra load, and you should see failures.

The usual trick here is to start the process in the background, and then
in the foreground wait for some "I'm ready" output, which involves ugly
tricks with fifos. There's prior art in lib-git-daemon.sh (though it's a
little more complicated there, because we want to relay its stderr, too,
but with a script under our control we can just put the ready message on
stdout).

So coupled with the prereq fix that I mentioned above, you might be able
to do something like (totally untested):

  start_socks_proxy () {
    mkfifo socks_output &&
    {
        perl proxy.pl ... >socks_output &
        socks_pid=$!
    } &&
    read line <socks_output &&
    test "$line" = "Starting..."
  }

  test_expect_success 'try to start socks proxy' '
    if start_socks_proxy
    then
        test_seq_prereq SOCKS_PROXY
    fi
  '

  test_expect_success SOCKS_PROXY 'clone via unix socket' '
    ...
  '

-Peff
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-29 16:09-0400, Jeff King <peff@peff.net> sent:

> Looking into the curl support, I did notice two things:
>
>  - unix sockets are only supported for socks proxies. Is it meaningful
>    to have a path on an http proxy? I don't think so (there is no place
>    to send it in the "GET" line). But perhaps we should limit the new
>    code only to socks.
>
>  - curl says you should put "localhost" in the host field for this,
>    though obviously it is not otherwise used. Should we likewise
>    require that to kick in the code?

Sounds good, I've added checks and tests for both of these cases.

>> @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void)
>>          if (!proxy_auth.host)
>>              die("Invalid proxy URL '%s'", curl_http_proxy);
>>
>> -        curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
>> +        strbuf_addstr(&proxy, proxy_auth.host);
>> +        if (proxy_auth.path) {
>
> There's one gotcha here I wondered about: we usually throw away the path
> element, unless credential.useHTTPPath is set. That only happens after
> we load the per-credential config, though, via credential_apply_config(),
> which in turn is triggered by a call to credential_fill(), etc.
>
> I think this is OK here, since we have just called credential_from_url()
> ourselves, and the earliest credential_fill() we'd hit is from
> init_curl_proxy_auth(), which is called right after the code you're
> touching.

Yes, good lookout but I don't think this is a problem either.

>> +            int path_is_supported = 0;
>> +            /* curl_version_info was added in curl 7.10 */
>> +#if LIBCURL_VERSION_NUM >= 0x070a00
>> +            curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
>> +            path_is_supported = ver->version_num >= 0x075400;
>> +#endif
>
> We've been trying to keep our curl version #ifdefs in git-curl-compat.h,
> with human-readable names. But in this case, I think we can ditch it
> entirely, as we require curl 7.21.3 or later already.

Ah, okay, that is good to know! I'll remove the #if.

> This will be the first time we check curl_version_info() instead of a
> build-time option. I think that makes sense here. Most features require
> extra information at build-time (e.g., CURLOPT_* constants), but in this
> case it is purely a check on the runtime behavior.
>
> We perhaps could get away with just letting an older curl quietly ignore
> the path field, but what you have here is probably a bit friendlier for
> users.

Yes, curl has a nasty tendency to quietly ignore a lot of things. I didn't want users to be uncertain about whether they were using the feature incorrectly if an older curl was the actual issue.

> What if we switched to socks4, which drops all of the auth bits and
> supports only IPs, not hostnames (that came in socks4a). This is the
> shortest I could come up with (only lightly tested):

Ah, many thanks! I like this much better, and I'm not proficient enough with Perl to have written it myself.

> I can see two paths forward:
>
>  a. There's been recent discussion about adding an option for Git to
>     report the run-time curl version. That doesn't exist yet, though
>     "git version --build-options" will report the build-time version.
>     That might be good enough for the purposes of testing a build.
>
>  b. Write the test such that it expects the request to work _or_ we see
>     the "version too old" failure.

Got it, I'll go with option b.

> If you usually see success, try running the test script with "--stress"
> to create extra load, and you should see failures.

Huh. I never saw any race issues on my machine even with --stress, but your approach is safer so I'm running with it.

Thank you!

R
rhendric commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1767.v3.git.1722441675945.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v3

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v3
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ryan Hendrickson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -5,8 +5,8 @@ http.proxy::
>   proxy string with a user name but no password, in which case git will
>   attempt to acquire one in the same way it does for other credentials. See
>   linkgit:gitcredentials[7] for more information. The syntax thus is
> - '[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden
> - on a per-remote basis; see remote.<name>.proxy
> + '[protocol://][user[:password]@]proxyhost[:port][/path]'. This can be
> + overridden on a per-remote basis; see remote.<name>.proxy

OK.

> diff --git a/http.c b/http.c
> index 623ed234891..a50ba095889 100644
> --- a/http.c
> +++ b/http.c
> @@ -1227,6 +1227,7 @@ static CURL *get_curl_handle(void)
>        */
>       curl_easy_setopt(result, CURLOPT_PROXY, "");
>   } else if (curl_http_proxy) {
> +     struct strbuf proxy = STRBUF_INIT;
>       if (starts_with(curl_http_proxy, "socks5h"))
>           curl_easy_setopt(result,
>               CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);

In a block with local variable decl, be more friendly to readers by
having a blank line between the end of declarations and the first
statement.

> +     strbuf_addstr(&proxy, proxy_auth.host);
> +     if (proxy_auth.path) {
> +         curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
> +         if (ver->version_num < 0x075400)
> +             die("libcurl 7.84 or later is required to support paths in proxy URLs");
> +
> +         if (!starts_with(proxy_auth.protocol, "socks"))
> +             die("Invalid proxy URL '%s': only SOCKS proxies support paths",
> +                 curl_http_proxy);
> +
> +         if (strcasecmp(proxy_auth.host, "localhost"))
> +             die("Invalid proxy URL '%s': host must be localhost if a path is present",
> +                 curl_http_proxy);

We insist that it must be "localhost", so let's not do strcasecmp()
but just do strcmp().

> diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh
> index bb35b87071d..7fcffba67a2 100755
> --- a/t/t5564-http-proxy.sh
> +++ b/t/t5564-http-proxy.sh
> @@ -39,4 +39,50 @@ test_expect_success 'clone can prompt for proxy password' '
>   expect_askpass pass proxuser
>  '
>  
> +start_socks() {
> + mkfifo socks_output &&
> + {
> +     "$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
> +     socks_pid=$!
> + } &&
> + read line <socks_output &&
> + test "$line" = ready
> +}
> +
> +test_expect_success PERL 'try to start SOCKS proxy' '
> + # The %30 tests that the correct amount of percent-encoding is applied
> + # to the proxy string passed to curl.
> + if start_socks %30.sock
> + then
> +     test_set_prereq SOCKS_PROXY
> + fi
> +'

Making it a regular test_expect_success would mean GIT_SKIP_TEST
mechansim can be used to skip it, which is probably not what you
want.  Can't this be a more common test_lazy_prereq, perhaps like

    test_lazy_prereq SOCKS_PROXY '
        # useful comment about 30% here ...
        test_have_prereq PERL &&
        start_socks %30.sock
    '

or something?

Thanks.
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/4b48f6e8856bfdc7f44abe570f3510fe199b1b74.

gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-07-31 15:24-0700, Junio C Hamano <gitster@pobox.com> sent:

> In a block with local variable decl, be more friendly to readers by
> having a blank line between the end of declarations and the first
> statement.

Will do.

> We insist that it must be "localhost", so let's not do strcasecmp()
> but just do strcmp().

I don't see the wisdom of being more restrictive than curl is in this respect. The documentation makes the claim that curl's syntax is what this option supports, and while that is not literally true due to how protocols are handled, I don't see a reason to intentionally deviate further.

I've tested that curl does the right thing with any casing of localhost, both on its own and via Git. Host names are generally case insensitive; the error message should be understood in that context. And if it is misunderstood, there's no negative impact on the user who writes "localhost" (while there is a negative impact on the user who quite reasonably expects "LOCALHOST" to work if we don't follow curl's lead).

> Making it a regular test_expect_success would mean GIT_SKIP_TEST
> mechansim can be used to skip it, which is probably not what you
> want.  Can't this be a more common test_lazy_prereq, perhaps like
>
>   test_lazy_prereq SOCKS_PROXY '
>       # useful comment about 30% here ...
>       test_have_prereq PERL &&
>       start_socks %30.sock
>   '
>
> or something?

This existing test file is definitely not written with running individual tests in mind; the first test is a prerequisite for all that comes after, and the second test seems to be required for tests 3–5 as well.

But sure, I'll use that pattern if you like.

R
rhendric commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1767.v4.git.1722489776279.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v4

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v4
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

>> We insist that it must be "localhost", so let's not do strcasecmp()
>> but just do strcmp().
>
> I don't see the wisdom of being more restrictive than curl is in this
> respect.

Ah, if curl is doing case insensitivity, then matching its behaviour
is perfectly fine.  I was just reacting to this message ...

> + die("Invalid proxy URL '%s': host must be localhost if a path is present",

... that results when strcasecmp() does not like it.

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Jul 31, 2024 at 03:24:01PM -0700, Junio C Hamano wrote:

> > +start_socks() {
> > +   mkfifo socks_output &&
> > +   {
> > +       "$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
> > +       socks_pid=$!
> > +   } &&
> > +   read line <socks_output &&
> > +   test "$line" = ready
> > +}
> > +
> > +test_expect_success PERL 'try to start SOCKS proxy' '
> > +   # The %30 tests that the correct amount of percent-encoding is applied
> > +   # to the proxy string passed to curl.
> > +   if start_socks %30.sock
> > +   then
> > +       test_set_prereq SOCKS_PROXY
> > +   fi
> > +'
> 
> Making it a regular test_expect_success would mean GIT_SKIP_TEST
> mechansim can be used to skip it, which is probably not what you
> want.  Can't this be a more common test_lazy_prereq, perhaps like
> 
>   test_lazy_prereq SOCKS_PROXY '
>       # useful comment about 30% here ...
>       test_have_prereq PERL &&
>       start_socks %30.sock
>   '
> 
> or something?

I think Ryan picked up this approach from my earlier mail. And the
reason I suggested doing it this way is that the prereq test has an
important side effect: it creates the socket and starts the proxy
server!

I think lazy prereqs only make sense when their only output is the
yes/no of whether we match the prereq. And then we don't care when or
how often they are evaluated. I do think things would work, assuming the
proxy server then can serve multiple tests. It just feels weird, and
doing it more like the git-daemon/http tests made more sense to me
(though those ones do not even do their work inside an expect block).

If we did it in a lazy prereq I think you'd need to munge the path, as
the lazy prereq operates in a throwaway directory (so the %30.sock would
get removed before we can use it in the next test).

-Peff
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Aug 01, 2024 at 05:22:56AM +0000, Ryan Hendrickson via GitGitGadget wrote:

> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
> 
> The documentation for `http.proxy` describes that option, and the
> environment variables it overrides, as supporting "the syntax understood
> by curl". curl allows SOCKS proxies to use a path to a Unix domain
> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
> therefore include, if present, the path part of the proxy URL in what it
> passes to libcurl.
> 
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>

Thanks for crediting me. I'll add my:

 Signed-off-by: Jeff King <peff@peff.net>

to be explicit that the proxy script is under the DCO.

> -     curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +     strbuf_addstr(&proxy, proxy_auth.host);
> +     if (proxy_auth.path) {
> +         curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
> +
> +         if (ver->version_num < 0x075400)
> +             die("libcurl 7.84 or later is required to support paths in proxy URLs");
> +
> +         if (!starts_with(proxy_auth.protocol, "socks"))
> +             die("Invalid proxy URL '%s': only SOCKS proxies support paths",
> +                 curl_http_proxy);
> +
> +         if (strcasecmp(proxy_auth.host, "localhost"))
> +             die("Invalid proxy URL '%s': host must be localhost if a path is present",
> +                 curl_http_proxy);
> +
> +         strbuf_addch(&proxy, '/');
> +         strbuf_add_percentencode(&proxy, proxy_auth.path, 0);

This all looks good to me. I wondered briefly whether
proxy_auth.protocol could ever be NULL, but I think we refuse to parse
such a URL.

> +start_socks() {
> + mkfifo socks_output &&
> + {
> +     "$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
> +     echo $! > "$TRASH_DIRECTORY/socks.pid"
> + } &&
> + read line <socks_output &&
> + test "$line" = ready
> +}
> +
> +# The %30 tests that the correct amount of percent-encoding is applied to the
> +# proxy string passed to curl.
> +test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'

OK, I see you figured out that the lazy prereq requires giving the full
path to the socket. :) I had forgotten that we also run the prereq in a
subshell to avoid side effects, but you caught that, as well.

All of this to me is good evidence that the non-lazy version you had
originally is a better approach. But I don't think it's worth spending
time fighting over, so I'm OK either way.

> +test_atexit 'test ! -e "$TRASH_DIRECTORY/socks.pid" || kill "$(cat "$TRASH_DIRECTORY/socks.pid")"'

Ah, good. I had meant to mention cleaning up at the end (as we do for
git-daemon and apache), but forgot. I'm glad you included it here.

> +test_expect_success SOCKS_PROXY 'clone via Unix socket' '
> + test_when_finished "rm -rf clone" &&
> + test_config_global http.proxy "socks4://localhost$PWD/%2530.sock" && {
> +     {
> +         GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone 2>err &&
> +         grep -i "SOCKS4 request granted." trace
> +     } ||
> +     grep "^fatal: libcurl 7\.84 or later" err
> + }
> +'

Looks good. Usually I do not bother with escaping "." in grep messages,
as it is already a loose match and it keeps the test easier to read. But
it is OK to do so.

We do have a test_grep wrapper which gives nicer output when the match
fails, but maybe that is a bad choice here, since we accept either of
the two messages. (Likewise for using test_cmp, I suppose).

The use of "||" in the command-chaining is unusual enough that it's
probably worth calling out either in a comment or in the commit message.

> +test_expect_success 'Unix socket requires socks*:' '
> + ! git clone -c http.proxy=localhost/path https://example.com/repo.git 2>err && {
> +     grep "^fatal: Invalid proxy URL '\''localhost/path'\'': only SOCKS proxies support paths" err ||
> +     grep "^fatal: libcurl 7\.84 or later" err
> + }
> +'
> +
> +test_expect_success 'Unix socket requires localhost' '
> + ! git clone -c http.proxy=socks4://127.0.0.1/path https://example.com/repo.git 2>err && {
> +     grep "^fatal: Invalid proxy URL '\''socks4://127\.0\.0\.1/path'\'': host must be localhost if a path is present" err ||
> +     grep "^fatal: libcurl 7\.84 or later" err
> + }
> +'
> +

Likewise here, I'd probably just match the single-quotes with "." for
readability (but it's OK to write it as you did). You can (since a week
or so ago) also use a here-doc body like:

  test_expect_success 'Unix socket requires socks*:' - <<\EOT
    ...
    test_grep "^fatal: Invalid proxy URL 'localhost/path': only SOCKS proxies support paths" err ||
    ...
  EOT

but I'm OK with it either way. The same "||" comment applies.

-Peff
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> I think Ryan picked up this approach from my earlier mail. And the
> reason I suggested doing it this way is that the prereq test has an
> important side effect: it creates the socket and starts the proxy
> server!

Ah, OK.

> I think lazy prereqs only make sense when their only output is the
> yes/no of whether we match the prereq. And then we don't care when or
> how often they are evaluated.

Once the prereq test is attempted the result is cached (that's the
whole point of lazy prereq mechanism) so we won't see multiple sock
proxies spawned.

> I do think things would work, assuming the
> proxy server then can serve multiple tests. It just feels weird, and
> doing it more like the git-daemon/http tests made more sense to me
> (though those ones do not even do their work inside an expect block).

OK.  That's fine.  It is probably not a good fit for the pattern.

> If we did it in a lazy prereq I think you'd need to munge the path, as
> the lazy prereq operates in a throwaway directory (so the %30.sock would
> get removed before we can use it in the next test).

True.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> On Thu, Aug 01, 2024 at 05:22:56AM +0000, Ryan Hendrickson via GitGitGadget wrote:
>
>> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>> 
>> The documentation for `http.proxy` describes that option, and the
>> environment variables it overrides, as supporting "the syntax understood
>> by curl". curl allows SOCKS proxies to use a path to a Unix domain
>> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
>> therefore include, if present, the path part of the proxy URL in what it
>> passes to libcurl.
>> 
>> Co-authored-by: Jeff King <peff@peff.net>
>> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>
> Thanks for crediting me. I'll add my:
>
>  Signed-off-by: Jeff King <peff@peff.net>
>
> to be explicit that the proxy script is under the DCO.

OK, I'll amend it while queuing this v4.

Thanks.

>> +# The %30 tests that the correct amount of percent-encoding is applied to the
>> +# proxy string passed to curl.
>> +test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'
>
> OK, I see you figured out that the lazy prereq requires giving the full
> path to the socket. :) I had forgotten that we also run the prereq in a
> subshell to avoid side effects, but you caught that, as well.

;-)

> All of this to me is good evidence that the non-lazy version you had
> originally is a better approach. But I don't think it's worth spending
> time fighting over, so I'm OK either way.

I'd be OK either way, too.

Thanks, both.
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/92bab5172ef2019a72859fe3cc79ef1b3c135e95.

rhendric commented 1 month ago

/submit

gitgitgadget[bot] commented 1 month ago

Submitted as pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1767/rhendric/rhendric/http-proxy-path-v5

To fetch this version to local tag pr-1767/rhendric/rhendric/http-proxy-path-v5:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1767/rhendric/rhendric/http-proxy-path-v5
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ryan Hendrickson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>
> The documentation for `http.proxy` describes that option, and the
> environment variables it overrides, as supporting "the syntax understood
> by curl". curl allows SOCKS proxies to use a path to a Unix domain
> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
> therefore include, if present, the path part of the proxy URL in what it
> passes to libcurl.
>
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

The trailer lines should be ordered in chronological order to record
the provenance of the patch.  The last two entries make it look as
if what you assembled and signed-off was relayed by peff, with or
without further modification, with his sign-off to me, to become the
final version, but that is not the story you want to tell.

I'd swap the two lines (i.e., sign-offs) while queuing.

> +         if (!starts_with(proxy_auth.protocol, "socks"))
> +             die("Invalid proxy URL '%s': only SOCKS proxies support paths",
> +                 curl_http_proxy);

Our error messages that are prefixed with "fatal:" do not typically
begin with a capital letter.

But I'll let it pass, as this copies an existing message in this
file.  #leftoverbits clean-up needs to correct the ones added by
this patch and existing "Invalid proxy URL '%s'" by downcasing
"Invalid", possibly enclose the message in _() for i18n, and also
downcase "C" in two "Could not set SSL ..."  messages.

> +         if (strcasecmp(proxy_auth.host, "localhost"))
> +             die("Invalid proxy URL '%s': host must be localhost if a path is present",
> +                 curl_http_proxy);

Ditto.

Or instead of leaving this for a later clean-up after the dust
settles, we could have a separate "preliminary clean-up" patch to
address existing ones first.  Either is fine, but taking "clean-up
after the dust settles" and leaving it a problem for other people is
probably easier.

> diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh
> index bb35b87071d..0d6cfebbfab 100755
> --- a/t/t5564-http-proxy.sh
> +++ b/t/t5564-http-proxy.sh
> @@ -39,4 +39,53 @@ test_expect_success 'clone can prompt for proxy password' '
>   expect_askpass pass proxuser
>  '
>  
> +start_socks() {
> + mkfifo socks_output &&
> + {
> +     "$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
> +     echo $! > "$TRASH_DIRECTORY/socks.pid"
> + } &&
> + read line <socks_output &&
> + test "$line" = ready
> +}
> +
> +# The %30 tests that the correct amount of percent-encoding is applied to the
> +# proxy string passed to curl.
> +test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'
> +
> +test_atexit 'test ! -e "$TRASH_DIRECTORY/socks.pid" || kill "$(cat "$TRASH_DIRECTORY/socks.pid")"'

Let's line-wrap these overly long ones.

        # The %30 tests that the correct amount of percent-encoding is
        # applied to the proxy string passed to curl.
        test_lazy_prereq SOCKS_PROXY '
                test_have_prereq PERL &&
                start_socks "$TRASH_DIRECTORY/%30.sock"
        '

        test_atexit '
                test ! -e "$TRASH_DIRECTORY/socks.pid" ||
                kill "$(cat "$TRASH_DIRECTORY/socks.pid")"
        '

> +# The below tests morally ought to be gated on a prerequisite that Git is
> +# linked with a libcurl that supports Unix socket paths for proxies (7.84 or
> +# later), but this is not easy to test right now. Instead, we || the tests with
> +# this function.
> +old_libcurl_error() {
> + grep -Fx "fatal: libcurl 7.84 or later is required to support paths in proxy URLs" "$1"
> +}

Cute.  

This fixes the polarity of the result from the whole "{ test } ||
this helper" construct.  Even if the test proper fails, we will
allow such a failure only when the reason of the failure is due to
this message.

If I were writing this, I would shorten to look for a bit fuzzier
pattern like

    grep "^fatal: .* is required to support paths in proxy URLs" "$1"

as that would allow us to fix the code later without needing to
update the pattern, if we discover reasons, other than being older
than libcURL 7.84, why paths in proxy URLs cannot be supported.

Other than that, very nicely done.

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 08:52-0700, Junio C Hamano <gitster@pobox.com> sent:

> I'd swap the two lines (i.e., sign-offs) while queuing.

Okay.

> Our error messages that are prefixed with "fatal:" do not typically
> begin with a capital letter.
>
> But I'll let it pass, as this copies an existing message in this
> file.  #leftoverbits clean-up needs to correct the ones added by
> this patch and existing "Invalid proxy URL '%s'" by downcasing
> "Invalid", possibly enclose the message in _() for i18n, and also
> downcase "C" in two "Could not set SSL ..."  messages.
>
> [ . . . ]
>
> Or instead of leaving this for a later clean-up after the dust
> settles, we could have a separate "preliminary clean-up" patch to
> address existing ones first.  Either is fine, but taking "clean-up
> after the dust settles" and leaving it a problem for other people is
> probably easier.

Hmm. I'd be inclined to take the preliminary clean-up approach, but some of the existing strings (there are also two "Unsupported ..."/"Supported ..." strings near the "Could not set..."s) are going through gettext, and I'm reluctant to interfere with the l10n process.

Would a partial clean-up that downcased only the "Invalid proxy URL" message be acceptable, or worse than leaving this as #leftoverbits?

> Let's line-wrap these overly long ones.

Okay.

> If I were writing this, I would shorten to look for a bit fuzzier
> pattern like
>
>    grep "^fatal: .* is required to support paths in proxy URLs" "$1"
>
> as that would allow us to fix the code later without needing to
> update the pattern, if we discover reasons, other than being older
> than libcURL 7.84, why paths in proxy URLs cannot be supported.

Is this blocking feedback? This strikes me as speculative over-engineering; it would not be difficult to generalize the message, and possibly then choose a new, more appropriate function name and update the explanatory comment, when and if such reasons arise.

R
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> Hmm. I'd be inclined to take the preliminary clean-up approach, but
> some of the existing strings (there are also two "Unsupported
> ..."/"Supported ..." strings near the "Could not set..."s) are going
> through gettext, and I'm reluctant to interfere with the l10n process.

I do not see what you mean by interfering with the localization.

If we are updating text to be translated anyway, giving translators
the strings that need to be translated _earlier_ rather than later
would be more helpful to them, no?

>> If I were writing this, I would shorten to look for a bit fuzzier
>> pattern like
>>
>>    grep "^fatal: .* is required to support paths in proxy URLs" "$1"
>>
>> as that would allow us to fix the code later without needing to
>> update the pattern, if we discover reasons, other than being older
>> than libcURL 7.84, why paths in proxy URLs cannot be supported.
>
> Is this blocking feedback? This strikes me as speculative
> over-engineering

No, it is loosening a pattern that is overly tight and as a side
effect shortening the line to more readable length ;-).
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 10:10-0700, Junio C Hamano <gitster@pobox.com> sent:

> Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:
>
>> Hmm. I'd be inclined to take the preliminary clean-up approach, but
>> some of the existing strings (there are also two "Unsupported
>> ..."/"Supported ..." strings near the "Could not set..."s) are going
>> through gettext, and I'm reluctant to interfere with the l10n process.
>
> I do not see what you mean by interfering with the localization.
>
> If we are updating text to be translated anyway, giving translators
> the strings that need to be translated _earlier_ rather than later
> would be more helpful to them, no?

Probably true, but as a new contributor I don't know whether changing msgids means more people need to review the patch, more files need to be changed, a translation team needs to be notified, the change needs to be pushed a different branch... whatever your process is. Localized strings are generally more of a headache for drive-by contributors, in my experience across different projects.

>> Is this blocking feedback? This strikes me as speculative
>> over-engineering
>
> No, it is loosening a pattern that is overly tight and as a side
> effect shortening the line to more readable length ;-).

Blocking or not?

R
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> At 2024-08-02 10:10-0700, Junio C Hamano <gitster@pobox.com> sent:
>
>> Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:
>>
>>> Hmm. I'd be inclined to take the preliminary clean-up approach, but
>>> some of the existing strings (there are also two "Unsupported
>>> ..."/"Supported ..." strings near the "Could not set..."s) are going
>>> through gettext, and I'm reluctant to interfere with the l10n process.
>>
>> I do not see what you mean by interfering with the localization.
>>
>> If we are updating text to be translated anyway, giving translators
>> the strings that need to be translated _earlier_ rather than later
>> would be more helpful to them, no?
>
> Probably true, but as a new contributor I don't know whether changing
> msgids means more people need to review the patch, more files need to
> be changed, a translation team needs to be notified, the change needs
> to be pushed a different branch... whatever your process is. Localized
> strings are generally more of a headache for drive-by contributors, in
> my experience across different projects.
>
>>> Is this blocking feedback? This strikes me as speculative
>>> over-engineering
>>
>> No, it is loosening a pattern that is overly tight and as a side
>> effect shortening the line to more readable length ;-).
>
> Blocking or not?

If we are updating anyway, that question is irrelevant, no?  This
version may hit 'seen' but until the next version comes it will not
advance to 'next'.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 12:28-0700, Junio C Hamano <gitster@pobox.com> sent:

>>>> Is this blocking feedback? This strikes me as speculative
>>>> over-engineering
>>>
>>> No, it is loosening a pattern that is overly tight and as a side
>>> effect shortening the line to more readable length ;-).
>>
>> Blocking or not?
>
> If we are updating anyway, that question is irrelevant, no?  This
> version may hit 'seen' but until the next version comes it will not
> advance to 'next'.

I can't figure out what you mean by this so I am going to proceed as if you had simply said οΏ½non-blockingοΏ½.

R
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> At 2024-08-02 12:28-0700, Junio C Hamano <gitster@pobox.com> sent:
>
>>>>> Is this blocking feedback? This strikes me as speculative
>>>>> over-engineering
>>>>
>>>> No, it is loosening a pattern that is overly tight and as a side
>>>> effect shortening the line to more readable length ;-).
>>>
>>> Blocking or not?
>>
>> If we are updating anyway, that question is irrelevant, no?  This
>> version may hit 'seen' but until the next version comes it will not
>> advance to 'next'.
>
> I can't figure out what you mean by this so I am going to proceed as
> if you had simply said ‘non-blocking’.

It does not make much sense to ask if a suggestion is "blocking" or
"non-blocking".  If you respond with a reasonable explanation why
you do not want to take a suggestion, I may (or may not) say that
your reasoning makes sense.  IOW, making me say "it is blocking"
means you want to me to say that I won't listen to you no matter
what you say.  That is rarely be the case.

In this case, I do not think it makes sense to insist with -Fx that
the error message has the exact message.  And I do not think your
"strikes me as" qualifies as a "reasonable explanation".
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 14:13-0700, Junio C Hamano <gitster@pobox.com> sent:

>>>>>> Is this blocking feedback? This strikes me as speculative
>>>>>> over-engineering
>>>>>
>>>>> No, it is loosening a pattern that is overly tight and as a side
>>>>> effect shortening the line to more readable length ;-).
>>>>
>>>> Blocking or not?
>>>
>>> If we are updating anyway, that question is irrelevant, no?  This
>>> version may hit 'seen' but until the next version comes it will not
>>> advance to 'next'.
>>
>> I can't figure out what you mean by this so I am going to proceed as
>> if you had simply said ‘non-blocking’.
>
> It does not make much sense to ask if a suggestion is "blocking" or
> "non-blocking".  If you respond with a reasonable explanation why
> you do not want to take a suggestion, I may (or may not) say that
> your reasoning makes sense.  IOW, making me say "it is blocking"
> means you want to me to say that I won't listen to you no matter
> what you say.  That is rarely be the case.

I want you to do what Documentation/ReviewingGuidelines.txt says reviewers should do:

- When providing a recommendation, be as clear as possible about whether you
  consider it "blocking" (the code would be broken or otherwise made worse if an
  issue isn't fixed) or "non-blocking" (the patch could be made better by taking
  the recommendation, but acceptance of the series does not require it).
  Non-blocking recommendations can be particularly ambiguous when they are
  related to - but outside the scope of - a series ("nice-to-have"s), or when
  they represent only stylistic differences between the author and reviewer.

Because I would rather not have what is likely to be a highly subjective argument about this particular choice in a test script if we don't have to have one.

I would also rather not put time into figuring out how best to rename the function "old_curl_version" if it no longer checks for the particular error produced when the curl version is too old, nor would I want to think ahead about whether it is correct for these tests that use this function to continue to pass if different variations on this error are raised. There is one qualifying error currently, and that's what the test matches against. Matching against hypothetical future errors is speculative overengineering if it is not obvious how the errors will vary and what it may mean if they do.

R
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> I would also rather not put time into figuring out how best to rename
> the function "old_curl_version" if it no longer checks for the
> particular error produced when the curl version is too old,

Now, that is a good argument to make sure the "libcurl 7.84 or later"
I suggested to omit for the sake of brevity is in the pattern.

Thanks.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:

> what the test matches against. Matching against hypothetical future
> errors is speculative overengineering if it is not obvious how the
> errors will vary and what it may mean if they do.

The easiest you can imagine is that it turns out the cut-off version
of cURL for the feature turned out to be not 7.84, or versions of
cURL shipped by some distros have backports of the feature to an
older version that explicitly naming 7.84 causes more confusion than
naming the exact feature we rely on, and we end up rephrasing the
error message.  We have done such changes in the past, so it is not
really speculating "hypothetical future errors", but applying common
sense gained over years working on this project.
gitgitgadget[bot] commented 1 month ago

On the Git mailing list, Ryan Hendrickson wrote (reply to this):

At 2024-08-02 14:47-0700, Junio C Hamano <gitster@pobox.com> sent:

> Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes:
>
>> what the test matches against. Matching against hypothetical future
>> errors is speculative overengineering if it is not obvious how the
>> errors will vary and what it may mean if they do.
>
> The easiest you can imagine is that it turns out the cut-off version
> of cURL for the feature turned out to be not 7.84, or versions of
> cURL shipped by some distros have backports of the feature to an
> older version that explicitly naming 7.84 causes more confusion than
> naming the exact feature we rely on, and we end up rephrasing the
> error message.  We have done such changes in the past, so it is not
> really speculating "hypothetical future errors", but applying common
> sense gained over years working on this project.

Okay, so is it a problem to also change the message in the test if that happens? My concern is that if I pick some fragment of the message to elide in the test, the message could still get reworded in a way that requires the test to be changed anyway. Even if not, the comment above it would likely need revision too; and if the test doesn't fail, whoever amends the message is unlikely to notice this.

The way the test is written in the current version of the patch, there is no ambiguity about what is being tested and what would need to change if the message changes. Future contributors can grep the code for references to that error message and find this test regardless of which part and how much of the message they choose to grep for. Shaving a dozen or so characters out of line is not more important than those considerations, is it?

R
gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/d2bad54524fcca2f3e226781f07e475dfc1a53bb.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/2a3761ccb3ea06d52b6977c1a205902184ce1c3a.

gitgitgadget[bot] commented 1 month ago

This branch is now known as rh/http-proxy-path.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into seen via https://github.com/git/git/commit/9b3fcf1cf0b84011d9dfb3a517cc550ce98b131b.

gitgitgadget[bot] commented 1 month ago

This patch series was integrated into next via https://github.com/git/git/commit/d6f1fb194a02d2e343d95469c0425f8e5761f711.

gitgitgadget[bot] commented 1 month ago

There was a status update in the "Cooking" section about the branch rh/http-proxy-path on the Git mailing list:

The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.

Will merge to 'master'.
source: <pull.1767.v5.git.1722576007398.gitgitgadget@gmail.com>