GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.94k stars 305 forks source link

*Sometimes* `gix fetch` gets stuck in negotiation with `ssh://` remotes (hosted by `gitea`) #1061

Closed jalil-salame closed 1 year ago

jalil-salame commented 1 year ago

Current behavior 😯

This happens on some repos sporadically (using v0.29.0, but it has happened long before that).

When you run gix fetch it is stuck in the negotiation phase forever(?) I tend to stop it after a few seconds, but I seem to remember it staying there for a few minutes.

Cancelling it with CTRL+C and rerunning the command causes the same behaviour.

Running a git fetch fixes the repo(?) and now gix fetch works again.

This is the error displayed after sending CTRL+C:

Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Expected behavior πŸ€”

gix fetch should work or time out the negotiation after a resonable amount of time (a few seconds to a minute).

Steps to reproduce πŸ•Ή

?????

I see it happenning on my selfhosted gitea repos relatively often (~once every two or so weeks) but I have no idea how to reproduce this.

If you have any idea how I could go about diagnosing the issue I'll make sure to keep it in mind for the next time it happens. For now this is all I have.

Byron commented 1 year ago

Thanks for reporting!

From the error when hitting Ctrl+C I conclude that it's an ssh URL you are using?

It's too bad to hear that it's still not stable, but given the complexity of the protocol in conjunction with many kinds of transports, it's not entirely surprising either. It's strange that it's not always hanging though, which makes it appear like a dead-lock based on buffer sizes of some pipeline used to communicate. If SSH was trying to provide a response but gix is still trying to send data, it would deadlock. In theory, that shouldn't happen of course but apparently it does in practice. Maybe a flush is missing somewhere… and a write to ssh doesn't actually get there.

The reason it works after a git fetch is probably because then no negotiation is necessary anymore, or it's greatly simplified.

I think the best course of action would be to provide a public repository in its broken (local) state that will consistently let gix hang. From there it can be debugged so it becomes clear where it's hanging.

From my experience, HTTPS works fine, but it's a state-less mode of operation which is different from the stateful SSH connection, which I basically never use myself (note that stateful connections are tested, just not through SSH but through bare TCP).

To repeat, ideally for reproduction you can provide a local copy of a stuck public repository which I can use for reproduction and debugging.

jalil-salame commented 1 year ago

Yes, I basically only use SSH for git.

I'll see if I can reproduce it in a small repo then upload the full (.git included?) repo as a test case?

It might be really hard to reproduce as most cases I remember this happening are from private repos...

I'm sorry, this is such a flaky thing to reproduce. Thanks for all the help!

Byron commented 1 year ago

I'll see if I can reproduce it in a small repo then upload the full (.git included?) repo as a test case?

Yes, that would be optimal.

I'm sorry, this is such a flaky thing to reproduce. Thanks for all the help!

Thanks for helping me to make gix better!

There is also a --trace option, but right now it only prints at the end of an invocation which doesn't happen during hangs. Having an altenrative trace-mode that is instant might alleviate this, even though I don't think it would reveal that much.

There is a light at the end of the tunnel though, as it's definitely planned to offer a built-in native ssh client as transport as well instead of forwarding to the ssh binary. Once that is in place, and if this fixes the issue, it's clear that the cause of this issue is something about how gix communicates with the ssh process via pipelines, which is really easy to get wrong without noticing as many tests never reach certain thresholds that may cause these bugs to appear.

jalil-salame commented 1 year ago

Using gix --trace fetch (private repo, can't upload .git repository) (using gix 0.30.0):

$ gix --trace fetch
^C 19:21:55 tracing INFO     run [ 7.06s | 23.06% / 100.00% ]                                                                                                                                    racing
 19:21:55 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 10.2ms | 0.01% / 0.14% ]
 19:21:55 tracing INFO     β”‚  ┕━ open_from_paths() [ 9.61ms | 0.03% / 0.14% ]
 19:21:55 tracing INFO     β”‚     ┝━ gix_path::git::install_config_path() [ 7.43ms | 0.11% ]
 19:21:55 tracing INFO     β”‚     ┕━ gix_odb::Store::at() [ 245Β΅s | 0.00% ]
 19:21:55 tracing DEBUG    ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'jalil/**censored**.git\'"
 19:21:55 tracing INFO     ┕━ fetch::Prepare::receive() [ 5.42s | 0.00% / 76.80% ]
 19:21:55 tracing INFO        ┕━ negotiate [ 5.42s | 0.01% / 76.79% ]
 19:21:55 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 1.52ms | 0.01% / 0.02% ] mappings: 1
 19:21:55 tracing INFO           β”‚  ┝━ mark_all_refs [ 880Β΅s | 0.01% ]
 19:21:55 tracing DEBUG          β”‚  ┝━ mark_alternate_refs [ 1.18Β΅s | 0.00% ] num_odb: 0
 19:21:55 tracing INFO           β”‚  ┝━ mark known_common [ 2.48Β΅s | 0.00% ]
 19:21:55 tracing DEBUG          β”‚  ┕━ mark tips [ 2.77Β΅s | 0.00% ] num_tips: 1
 19:21:55 tracing DEBUG          ┝━ negotiate round [ 5.42s | 76.77% ] round: 1
 19:21:55 tracing DEBUG          ┕━ negotiate round [ 90.7Β΅s | 0.00% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Killing it after 7s or 1min seems to make no difference to the trace output.

I will make a back up of this repo in case you have a fix you'd like to test.

Byron commented 1 year ago

Thanks so much, I forgot that it's possible to interrupt and then shut-down the application normally, showing the trace.

We see that it hangs in round two, which probably means it blocks while sending or… it blocks while receiving a reply maybe because the sending didn't get flushed so that would be a local problem. Since I pretty much trust negotiation by now I'd think it might be something silly like a flush that wasn't performed.

Byron commented 1 year ago

I think I have got something!

The gist is that I found a spot that would only work correctly in a stateless setting, i.e. with HTTP, as this would flush automatically I presume. In stateful connections, this wouldn't happen which leaves data-to-be-sent in the pipe without ever being seen by the remote, which then waits forever. Then we turn around and try to read the response to a message never sent, and wait forever as well.

The fix should alleviate that, so I recommend trying a custom-built binary from when https://github.com/Byron/gitoxide/pull/1067 was merged and see if this fixes the issue.

If you have any questions, please let me know - I'd love for this to be the fix.

jalil-salame commented 1 year ago

Seems like it still happens (the bad repo still gets stuck in negotiation, trace looks the same).

But I am not sure I got the version from main. I'm on NixOS so cargo install doesn't work (problems with cmake), so I had to do a hack to get it installed. It should be the latest commit, but first time I do this.

GDB shows this is the stack frame it is stuck on:

#0  0x00007ffff7d1574c in read () from /nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6
#1  0x00005555561894e3 in <std::process::ChildStdout as std::io::Read>::read::h7716f9d9510e017e ()
#2  0x0000555555ebab45 in gix_transport::client::blocking_io::file::supervise_stderr::<impl std::io::Read for gix_transport::client::blocking_io::file::ReadStdoutFailOnError>::read::h6bb614d77c16cfcc ()
#3  0x0000555555ec3a3b in std::io::default_read_exact::h765482b456176831 ()
#4  0x0000555555eb637a in gix_packetline::read::blocking_io::<impl gix_packetline::StreamingPeekableIter<T>>::read_line_inner_exhaustive::hda32d152ad4b03a9 ()
#5  0x0000555555eb69db in gix_packetline::read::blocking_io::<impl gix_packetline::StreamingPeekableIter<T>>::peek_line::h751a7e10b8e7fb51 ()
#6  0x0000555555eccec5 in <gix_packetline::read::sidebands::blocking_io::WithSidebands<T,alloc::boxed::Box<dyn core::ops::function::FnMut<(bool,&[u8])>+Output = gix_packetline::read::ProgressAction>> as gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead>::peek_data_line::h2d180765cb59cd52 ()
#7  0x0000555555752811 in gix_protocol::fetch::response::blocking_io::<impl gix_protocol::fetch::response::Response>::from_line_reader::hf0e62b51c6345a69 ()
#8  0x0000555555865293 in gix::remote::connection::fetch::receive_pack::<impl gix::remote::connection::fetch::Prepare<T>>::receive::h3ba96dacc5aacb3e ()
#9  0x00005555558c1c7d in gitoxide_core::repository::fetch::function::fetch::he4f05a27c760b3c9 ()
#10 0x00005555557079a9 in gix_trace::<impl gix_trace::enabled::Span>::into_scope::h454454c93e6ddecd ()
#11 0x00005555558ed14a in gitoxide::shared::pretty::prepare_and_run::h86c86f422ce1ff38 ()
#12 0x0000555555715122 in gix::plumbing::main::main::h35af222d41ff9360 ()
#13 0x000055555595dda3 in std::sys_common::backtrace::__rust_begin_short_backtrace::h9b4f271923e9f1ae ()
#14 0x000055555575c7cd in std::rt::lang_start::{{closure}}::h07ae7a6dd00ee5f6 ()
#15 0x0000555556187bb7 in std::rt::lang_start_internal::hbcbc84fca12d560d ()
#16 0x00005555557aa9a5 in main ()

I don't know how to enable debug symbols in NixOS. I'll look into that later.

Byron commented 1 year ago

Thanks for having taken a look! Could you verify that you see protocol_version in the trace? If not, it's not the latest version, it should be right with negotiate.

It's good to see that it tries to read from the server, which probably equally tries to read from the client because it didn't flush its output pipe yet.

PS: You should be able to install with cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked - it doesn't need cmake.

jalil-salame commented 1 year ago

The right command is cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked gitoxide, but it keeps trying to compile lib-ng-sys and failing because of NixOS's cmake.

I'll whip up a quick flake to fix this and get debug symbols.

Byron commented 1 year ago

Ah, right! Then the real right command is :): cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked --no-default-features gitoxide.

That will definitely not need additional toolchains and is pure Rust.

jalil-salame commented 1 year ago

That works, but still gets stuck in negotiation. This is the gdb stack trace (now with debug info!):

#0  0x00007ffff7dd174c in read () from /nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6
#1  0x00005555575cb0f3 in std::sys::unix::fd::FileDesc::read () at library/std/src/sys/unix/fd.rs:90
#2  std::sys::unix::pipe::AnonPipe::read () at library/std/src/sys/unix/pipe.rs:49
#3  std::process::{impl#16}::read () at library/std/src/process.rs:428
#4  0x00005555565988c2 in gix_transport::client::blocking_io::file::supervise_stderr::{impl#1}::read (self=0x555557ee53c0, buf=...) at gix-transport/src/client/blocking_io/file.rs:163
#5  0x00005555565a5563 in std::io::default_read_exact<gix_transport::client::blocking_io::file::ReadStdoutFailOnError> (this=0x555557ee53c0, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:467
#6  0x00005555565c5578 in std::io::Read::read_exact<gix_transport::client::blocking_io::file::ReadStdoutFailOnError> (self=0x555557ee53c0, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:825
#7  0x0000555556595f8f in std::io::impls::{impl#4}::read_exact<(dyn std::io::Read + core::marker::Send)> (self=0x555557f23b40, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/impls.rs:155
#8  0x00005555565b1462 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::read_line_inner<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (reader=0x555557f23b40, buf=...) at gix-packetline/src/read/blocking_io.rs:18
#9  0x00005555565b18b1 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::read_line_inner_exhaustive<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (reader=0x555557f23b40, buf=0x555557f23b50, delimiters=..., fail_on_err_lines=true, buf_resize=true)
    at gix-packetline/src/read/blocking_io.rs:45
#10 0x00005555565b2075 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::peek_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (self=0x555557f23b28) at gix-packetline/src/read/blocking_io.rs:124
#11 0x00005555565b8f57 in gix_packetline::read::sidebands::blocking_io::WithSidebands<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>, alloc::boxed::Box<dyn core::ops::function::FnMut<(bool, &[u8]), Output=gix_packetline::read::ProgressAction>, alloc::alloc::Global>>::peek_data_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>, alloc::boxed::Box<dyn core::ops::function::FnMut<(bool, &[u8]), Output=gix_packetline::read::ProgressAction>, alloc::alloc::Global>> (self=0x555557ee6150)
    at gix-packetline/src/read/sidebands/blocking_io.rs:92
#12 0x00005555565ba633 in gix_transport::client::blocking_io::bufread_ext::{impl#4}::peek_data_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (
    self=0x555557ee6150) at gix-transport/src/client/blocking_io/bufread_ext.rs:107
#13 0x00005555562ad454 in gix_transport::client::blocking_io::bufread_ext::{impl#1}::peek_data_line<(dyn gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead + core::marker::Unpin)> (self=0x7fffffff2130) at gix-transport/src/client/blocking_io/bufread_ext.rs:70
#14 0x000055555622a61e in gix_protocol::fetch::response::Response::from_line_reader<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead + core::marker::Unpin), alloc::alloc::Global>> (version=gix_transport::Protocol::V1, reader=0x7fffffff2130, client_expects_pack=false) at gix-protocol/src/fetch/response/blocking_io.rs:53
#15 0x00005555561b5145 in gix::remote::connection::fetch::Prepare<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>>::receive_inner<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>> (self=..., progress=...,
    should_interrupt=0x555557edef98 <gix::interrupt::IS_INTERRUPTED::h91f14977242c8b37>) at gix/src/remote/connection/fetch/receive_pack.rs:221
#16 0x00005555558277ad in gix::remote::connection::fetch::Prepare<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>>::receive<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>, &mut prodash::progress::utils::DoOrDiscard<prodash::tree::Item>> (self=..., progress=0x7fffffff7db0, should_interrupt=0x555557edef98 <gix::interrupt::IS_INTERRUPTED::h91f14977242c8b37>) at gix/src/remote/connection/fetch/receive_pack.rs:81
#17 0x00005555559be203 in gitoxide_core::repository::fetch::function::fetch<prodash::progress::utils::DoOrDiscard<prodash::tree::Item>, &mut dyn std::io::Write, &mut dyn std::io::Write> (
    repo=..., progress=..., out=..., err=...) at gitoxide-core/src/repository/fetch.rs:64
#18 0x0000555555a620d7 in gix::plumbing::main::main::{closure#11} (progress=..., out=..., err=...) at src/plumbing/main.rs:357
#19 0x0000555555bd359e in gitoxide::shared::pretty::prepare_and_run::{closure#0}<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}> ()
    at src/shared.rs:170
#20 0x000055555590e739 in gix_trace::enabled::Span::into_scope<core::result::Result<(), anyhow::Error>, gitoxide::shared::pretty::prepare_and_run::{closure_env#0}<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}>> (self=<error reading variable: Cannot access memory at address 0xfffffffffffffe00>,
    f=<error reading variable: Cannot access memory at address 0x75>) at gix-trace/src/lib.rs:43
#21 0x0000555555bbd3c2 in gitoxide::shared::pretty::prepare_and_run<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}> (name=..., trace=false,
    verbose=true, progress=false, progress_keep_open=false, range=..., run=...) at src/shared.rs:169
#22 0x0000555555c7b2b8 in gix::plumbing::main::main () at src/plumbing/main.rs:349
#23 0x0000555555cc82a6 in gix::main () at src/gix.rs:9
Byron commented 1 year ago

I thinks that's it! gix_transport::client::blocking_io::file::supervise_stderr:: tells me that it's not the negotiation that hangs, but the 'supervisor' thread that blocks forever. Maybe that's by design though and it's just the stack-trace that doesn't make the different threads clear. Maybe the main thread is also hanging while reading.

jalil-salame commented 1 year ago

I have no idea how to get gdb to debug threads, I'll see if I can figure it out later today.

Byron commented 1 year ago

I don't think this will be necessary, as I had a look at the code and it's made so that it most definitely can't deadlock.

Can you validate that this is the latest version by checking that the negotiate trace line shows the protocol version? If so, what is the protocol version?

In any case, you could try to set the protocol version to something else, as with gix -c protocol.version=1|2 fetch and see if that makes a difference.

Thanks for all your help.

jalil-salame commented 1 year ago

Something is really weird with the protocol version; the trace says transport is trying to set version 2, but negotiate uses version 1:

$ gix --trace fetch
^C 09:32:15 tracing INFO     run [ 7.26s | 20.98% / 100.00% ]                                                                                                                                    racing
 09:32:15 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 16.8ms | 0.01% / 0.23% ]
 09:32:15 tracing INFO     β”‚  ┕━ open_from_paths() [ 15.8ms | 0.12% / 0.22% ]
 09:32:15 tracing INFO     β”‚     ┝━ gix_path::git::install_config_path() [ 6.40ms | 0.09% ]
 09:32:15 tracing INFO     β”‚     ┕━ gix_odb::Store::at() [ 510Β΅s | 0.01% ]
 09:32:15 tracing DEBUG    ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 09:32:15 tracing INFO     ┕━ fetch::Prepare::receive() [ 5.72s | 0.01% / 78.79% ]
 09:32:15 tracing DEBUG       ┕━ negotiate [ 5.72s | 0.02% / 78.78% ] protocol_version: 1
 09:32:15 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 5.81ms | 0.05% / 0.08% ] mappings: 1
 09:32:15 tracing INFO           β”‚  ┝━ mark_all_refs [ 2.01ms | 0.03% ]
 09:32:15 tracing DEBUG          β”‚  ┝━ mark_alternate_refs [ 11.9Β΅s | 0.00% ] num_odb: 0
 09:32:15 tracing INFO           β”‚  ┝━ mark known_common [ 21.5Β΅s | 0.00% ]
 09:32:15 tracing DEBUG          β”‚  ┕━ mark tips [ 21.1Β΅s | 0.00% ] num_tips: 1
 09:32:15 tracing DEBUG          ┝━ negotiate round [ 5.71s | 78.67% ] round: 1
 09:32:15 tracing DEBUG          ┕━ negotiate round [ 618Β΅s | 0.01% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Same when you force protocol v2:

gix -c protocol.version=2 --trace fetch (click to show output) ```console $ gix -c protocol.version=2 --trace fetch ^C 09:33:19 tracing INFO run [ 15.3s | 8.78% / 100.00% ] racing 09:33:19 tracing INFO ┝━ ThreadSafeRepository::discover() [ 9.95ms | 0.01% / 0.06% ] 09:33:19 tracing INFO β”‚ ┕━ open_from_paths() [ 8.81ms | 0.01% / 0.06% ] 09:33:19 tracing INFO β”‚ ┝━ gix_path::git::install_config_path() [ 6.80ms | 0.04% ] 09:33:19 tracing INFO β”‚ ┕━ gix_odb::Store::at() [ 199Β΅s | 0.00% ] 09:33:19 tracing DEBUG ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'" 09:33:19 tracing INFO ┕━ fetch::Prepare::receive() [ 14.0s | 0.01% / 91.15% ] 09:33:19 tracing DEBUG ┕━ negotiate [ 14.0s | 0.01% / 91.15% ] protocol_version: 1 09:33:19 tracing DEBUG ┝━ mark_complete_and_common_ref [ 2.48ms | 0.01% / 0.02% ] mappings: 1 09:33:19 tracing INFO β”‚ ┝━ mark_all_refs [ 983Β΅s | 0.01% ] 09:33:19 tracing DEBUG β”‚ ┝━ mark_alternate_refs [ 4.36Β΅s | 0.00% ] num_odb: 0 09:33:19 tracing INFO β”‚ ┝━ mark known_common [ 8.74Β΅s | 0.00% ] 09:33:19 tracing DEBUG β”‚ ┕━ mark tips [ 8.10Β΅s | 0.00% ] num_tips: 1 09:33:19 tracing DEBUG ┝━ negotiate round [ 14.0s | 91.12% ] round: 1 09:33:19 tracing DEBUG ┕━ negotiate round [ 549Β΅s | 0.00% ] round: 2 Error: An IO error occurred when talking to the server Caused by: Broken pipe (os error 32) ```

Forcing vesion 1 looks normal (still stuck though):

gix -c protocol.version=1 --trace fetch (click to show output) ```console $ gix -c protocol.version=1 --trace fetch ^C 09:36:28 tracing INFO run [ 5.82s | 23.78% / 100.00% ] racing 09:36:28 tracing INFO ┝━ ThreadSafeRepository::discover() [ 3.38ms | 0.00% / 0.06% ] 09:36:28 tracing INFO β”‚ ┕━ open_from_paths() [ 3.18ms | 0.02% / 0.05% ] 09:36:28 tracing INFO β”‚ ┝━ gix_path::git::install_config_path() [ 2.03ms | 0.03% ] 09:36:28 tracing INFO β”‚ ┕━ gix_odb::Store::at() [ 155Β΅s | 0.00% ] 09:36:28 tracing DEBUG ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: LANG="C" LC_ALL="C" "ssh" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'" 09:36:28 tracing INFO ┕━ fetch::Prepare::receive() [ 4.44s | 0.01% / 76.17% ] 09:36:28 tracing DEBUG ┕━ negotiate [ 4.44s | 0.02% / 76.15% ] protocol_version: 1 09:36:28 tracing DEBUG ┝━ mark_complete_and_common_ref [ 1.97ms | 0.02% / 0.03% ] mappings: 1 09:36:28 tracing INFO β”‚ ┝━ mark_all_refs [ 686Β΅s | 0.01% ] 09:36:28 tracing DEBUG β”‚ ┝━ mark_alternate_refs [ 4.81Β΅s | 0.00% ] num_odb: 0 09:36:28 tracing INFO β”‚ ┝━ mark known_common [ 8.97Β΅s | 0.00% ] 09:36:28 tracing DEBUG β”‚ ┕━ mark tips [ 8.36Β΅s | 0.00% ] num_tips: 1 09:36:28 tracing DEBUG ┝━ negotiate round [ 4.43s | 76.09% ] round: 1 09:36:28 tracing DEBUG ┕━ negotiate round [ 727Β΅s | 0.01% ] round: 2 Error: An IO error occurred when talking to the server Caused by: Broken pipe (os error 32) ```
jalil-salame commented 1 year ago

A bit more context on when it happens (I'm not 100% sure this is the pattern because it happens so infrequently).

I have two computers where I have the same git repos (the ones that get stuck from my selfhosted gitea instance). I have a preference for one PC so I leave the other alone for a while. When I return to the other computer the repos sometimes get stuck.

So how I think the issue could be reproduced is:

  1. Get a gitea repo with an ssh origin.
  2. Clone it on two different locations (maybe two computers?).
  3. Make a few commits on one location (maybe over a certain period of time).
  4. Try to fetch the updated repo on the out of date location.

I'll see if I can reproduce this like that.

Byron commented 1 year ago

The protocol-discrepancy is due to the server being able to downgrade the client. Setting the protocol version is merely a request, and it can be ignored. V2 is also the default and gitea seems to support V1 only.

The issue also seems to happen if multiple negotiation rounds are needed which is strange.

Which reminds me, please do try different settings for the fetch.negotiationAlgorithm, i.e. gix -c fetch.negotiationAlgorithm=consecutive|skipping|noop. consecutive is the default. Maybe that makes a difference, but I'd expect that in one of them it won't get to the second round which then makes it work.

I start to think it's something about the protocol that is wrong or unexpected that causes the probably custom gitea server side to not send a response.

I think the ultimate answer will be if you could use a way to make the packetlines sent over the wire visible. git can do that with GIT_TRACE_PACKET, but gix cannot yet. Probably the way to go here is to implement a debug-feature that will show everything that it sends or receives on stderr. As it's plain text, the content would be redactable while still revealing enough of what's happening to learn what the difference is between gix and git.

Independently of that, if you could provide a public repo in a state that reproduces it on my side, that would definitely help as well.

jalil-salame commented 1 year ago

Both skipping and noop work (version 0.30.0 even).

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

jalil-salame commented 1 year ago

Noop

$ gix --trace -c fetch.negotiationAlgorithm=noop fetch
10:26:20 indexing done 1.2k objects in 0.09s (12.7k objects/s)
 10:26:20 decompressing done 203.5KB in 0.09s (2.2MB/s)
 10:26:20     Resolving done 1.2k objects in 0.05s (23.4k objects/s)
 10:26:20      Decoding done 333.0KB in 0.05s (6.6MB/s)
 10:26:20 writing index file done 34.2KB in 0.00s (63.8MB/s)
 10:26:20  create index file done 1.2k objects in 0.15s (8.1k objects/s)
 10:26:20          read pack done 179.2KB in 0.18s (1003.3KB/s)
 10:26:20            tracing INFO     run [ 1.64s | 85.64% / 100.00% ]
 10:26:20            tracing INFO     ┝━ ThreadSafeRepository::discover() [ 3.12ms | 0.01% / 0.19% ]
 10:26:20            tracing INFO     β”‚  ┕━ open_from_paths() [ 3.00ms | 0.04% / 0.18% ]
 10:26:20            tracing INFO     β”‚     ┝━ gix_path::git::install_config_path() [ 2.25ms | 0.14% ]
 10:26:20            tracing INFO     β”‚     ┕━ gix_odb::Store::at() [ 68.6Β΅s | 0.00% ]
 10:26:20            tracing DEBUG    ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 10:26:20            tracing INFO     ┕━ fetch::Prepare::receive() [ 232ms | 0.02% / 14.17% ]
 10:26:20            tracing INFO        ┝━ negotiate [ 50.0ms | 0.03% / 3.05% ]
 10:26:20            tracing DEBUG       β”‚  ┝━ mark_complete_and_common_ref [ 2.00ms | 0.05% / 0.12% ] mappings: 1
 10:26:20            tracing INFO        β”‚  β”‚  ┝━ mark_all_refs [ 1.25ms | 0.08% ]
 10:26:20            tracing DEBUG       β”‚  β”‚  ┝━ mark_alternate_refs [ 1.81Β΅s | 0.00% ] num_odb: 0
 10:26:20            tracing INFO        β”‚  β”‚  ┝━ mark known_common [ 2.53Β΅s | 0.00% ]
 10:26:20            tracing DEBUG       β”‚  β”‚  ┕━ mark tips [ 1.59Β΅s | 0.00% ] num_tips: 1
 10:26:20            tracing DEBUG       β”‚  ┕━ negotiate round [ 47.6ms | 2.90% ] round: 1
 10:26:20            tracing INFO        ┝━ gix_pack::Bundle::write_to_directory() [ 179ms | 10.94% ]
 10:26:20            tracing DEBUG       ┕━ update_refs() [ 2.60ms | 0.05% / 0.16% ] mappings: 1
 10:26:20            tracing DEBUG          ┕━ apply [ 1.71ms | 0.10% ] edits: 1
+refs/heads/*:refs/remotes/origin/*
        0393a83903a10ac8c61112af371140bbc4e0dd75 refs/heads/main -> refs/remotes/origin/main [fast-forward]
pack  file: "./.git/objects/pack/pack-cc62afd989d58966bbd3b54ff6ba124b1b89f2a9.pack"
index file: "./.git/objects/pack/pack-cc62afd989d58966bbd3b54ff6ba124b1b89f2a9.idx"
server sent 2 tips, 1 were filtered due to 1 refspec(s).

Skipping

$ gix --trace -c fetch.negotiationAlgorithm=skipping fetch
 10:28:14 indexing done 6.0 objects in 0.00s (15.6k objects/s)
 10:28:14 decompressing done 1.8KB in 0.00s (4.6MB/s)
 10:28:14     Resolving done 6.0 objects in 0.05s (117.0 objects/s)
 10:28:14      Decoding done 1.8KB in 0.05s (36.0KB/s)
 10:28:14 writing index file done 1.2KB in 0.00s (25.2MB/s)
 10:28:14  create index file done 6.0 objects in 0.05s (116.0 objects/s)
 10:28:14          read pack done 1.6KB in 0.05s (31.1KB/s)
 10:28:14            tracing INFO     run [ 1.10s | 88.69% / 100.00% ]
 10:28:14            tracing INFO     ┝━ ThreadSafeRepository::discover() [ 4.41ms | 0.03% / 0.40% ]
 10:28:14            tracing INFO     β”‚  ┕━ open_from_paths() [ 4.09ms | 0.10% / 0.37% ]
 10:28:14            tracing INFO     β”‚     ┝━ gix_path::git::install_config_path() [ 2.91ms | 0.26% ]
 10:28:14            tracing INFO     β”‚     ┕━ gix_odb::Store::at() [ 104Β΅s | 0.01% ]
 10:28:14            tracing DEBUG    ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 10:28:14            tracing INFO     ┕━ fetch::Prepare::receive() [ 120ms | 0.02% / 10.91% ]
 10:28:14            tracing INFO        ┝━ negotiate [ 67.2ms | 0.04% / 6.08% ]
 10:28:14            tracing DEBUG       β”‚  ┝━ mark_complete_and_common_ref [ 1.58ms | 0.06% / 0.14% ] mappings: 1
 10:28:14            tracing INFO        β”‚  β”‚  ┝━ mark_all_refs [ 881Β΅s | 0.08% ]
 10:28:14            tracing DEBUG       β”‚  β”‚  ┝━ mark_alternate_refs [ 1.09Β΅s | 0.00% ] num_odb: 0
 10:28:14            tracing INFO        β”‚  β”‚  ┝━ mark known_common [ 2.68Β΅s | 0.00% ]
 10:28:14            tracing DEBUG       β”‚  β”‚  ┕━ mark tips [ 25.1Β΅s | 0.00% ] num_tips: 1
 10:28:14            tracing DEBUG       β”‚  ┕━ negotiate round [ 65.2ms | 5.90% ] round: 1
 10:28:14            tracing INFO        ┝━ gix_pack::Bundle::write_to_directory() [ 52.4ms | 4.74% ]
 10:28:14            tracing DEBUG       ┕━ update_refs() [ 725Β΅s | 0.03% / 0.07% ] mappings: 1
 10:28:14            tracing DEBUG          ┕━ apply [ 438Β΅s | 0.04% ] edits: 1
+refs/heads/*:refs/remotes/origin/*
        0393a83903a10ac8c61112af371140bbc4e0dd75 refs/heads/main -> refs/remotes/origin/main [fast-forward]
pack  file: "./.git/objects/pack/pack-8bf88c208d478c44cfea799314371755fee90a9e.pack"
index file: "./.git/objects/pack/pack-8bf88c208d478c44cfea799314371755fee90a9e.idx"
server sent 2 tips, 1 were filtered due to 1 refspec(s).
Byron commented 1 year ago

How many negotiation rounds did you get with skipping? Just one, is my prediction. (Concurrent editing)

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

In theory, that's a feature, and it's intentionally lenient there. This makes it easy to change and maybe it should change. If you want to change it to strict mode please be my guest.

I am looking into adding tracing support similar to GIT_TRACE_PACKET now.

jalil-salame commented 1 year ago

The repo is behind by just one commit, but I cannot reproduce it in Gitea's public git repo (haven't tried for long though).

The date of the last two commits:

Date: Thu Aug 24 15:43:58 2023 +0000 (origin has this one) Date: Tue Aug 15 17:20:20 2023 +0000 (repo is stuck here)

jalil-salame commented 1 year ago

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

In theory, that's a feature, and it's intentionally lenient there. This makes it easy to change and maybe it should change. If you want to change it to strict mode please be my guest.

My preference is .gitconfig should be lenient -c should be strict. I would rather my scripts/cli commands error quickly than silently do the wrong thing. Is there a config option for that?

Byron commented 1 year ago

Is there a config option for that?

You could automatically set it to the strict version of the desired mode when CLI overrides are detected - I think that's a great idea. Should just be a couple of lines if you want to try it.

The repo is behind by just one commit, [..]

That would be very strange, it shouldn't need two negotiation rounds if just one commit is missing. It would definitely be interesting to see what happens when using HTTPS (as a state-less variant of the protocol). I guess once there is something to reproduce the issue that can also reproduce on GitHub, which could be used to compare both HTTPS and SSH.

Never mind the rambling above though, I think once packetline tracing is available, the issue will clear up quickly.

Byron commented 1 year ago

Packetline support is cooking in this PR and should be merged soon.

Then I'd need the output of GIT_TRACE_PACKET=1 git fetch and gix -c gitoxide.tracePacket=1 --trace fetch for comparison. If redaction of the output is happening, please be sure to apply the same 'function' to both outputs - it's fine to remove everything except the negotiation part - it should be distinct enough.

My expectation is that both interactions should be very similar if not the same, so there is probably some difference that explains the deadlock. My hope is that this is something obvious, like a protocol error of some sort, while the negotiation commit-walk is exactly the same.

To be sure that it's (probably) not a bug on the server side, I have checked the code of gitea and believe that they also just run git under the hood, i.e. git-upload-pack.

Another test we can try is to locally host the repo in the state that is on the server using git daemon, and tune up gits own tracing, add it as remote to the local clone that is in a state that hangs, and see what that yields - usually that will provide additional information about the state of the git-upload-pack process.

But one step at a time πŸ˜….

PS: I also implemented auto-strict so -c x=foo will not quietly ignore obvious errors anymore.

jalil-salame commented 1 year ago

gitoxide.tracePacket=1 Seems to print binary characters ('\0') which makes it a pain to deal (grep and diff need the --text flag or they refuse to print to stdout).

Here are the logs:

git.log

gix.log

Byron commented 1 year ago

gitoxide.tracePacket=1 Seems to print binary characters ('\0') which makes it a pain to deal (grep and diff need the --text flag or they refuse to print to stdout).

Yes, I thought about that but relied entirely on bstr::BStr::fmt() to deal with this - I am pretty sure that git at least escapes these strings to make the terminal-safe to some extend. This can certainly be improved.

Regarding the logs, here is my findings:

gitoxide sees the ready and uses the second negotiation round to just send done, an empty round if you will, and then tries to read the response from the remote. And that's where the client hangs while reading because the remote seems to be waiting for something else instead of sending would probably would be a final ACK.

And that's the difficulty with V1 - it's unclear to me how it's supposed to be working and thus far I simply made reproducible hangs work. git on the receiving and has somewhat complex logic to know what to do when and gitoxide doesn't implement it (also because I don't understand it), but instead relies on heuristics to know what to do based on observable factors.

Another approach would be to find out why protocol version 2 isn't accepted by the server - we know it's starting git under the hood so V2 support should be there.

From here all I could imagine is to make changes and send a patch, and you'd apply the patch to see if it works.

For completeness, here is the side-by-side I was using. Screenshot 2023-10-18 at 20 18 22
Byron commented 1 year ago

On another note, it's also clear that GIT_PROTOCOL is set and sent over ssh so it should set up the V2 protocol. Judging by the code git should pick that up unless there is something with the ssh server side that prohibits SendEnv.

Byron commented 1 year ago

Also, another question, which version of git is running on the gitea server? I'd hope it's old and can be updated.

Byron commented 1 year ago

From the code that I see, also on the git side, I don't see why it would hang. When decoding the server response it's so incredibly careful that as long as the server sends a pack, it will find it (it's made so that it consumes all the acks that the server sends but that aren't really required to get to the pack). But here it seems the server doesn't do as it's supposed to do when seeing the done of the client.

If the version check doesn't lead anywhere, i.e. it seems new enough, then we'd have to host the server-side locally with git daemon to get additional insights.

jalil-salame commented 1 year ago

Also, another question, which version of git is running on the gitea server? I'd hope it's old and can be updated.

It is fairly recent (2.40.1 might also be 2.38.4? but I think that is just a backup) and I have no idea how to update it (although I can probably figure something up).

Byron commented 1 year ago

Thanks, I'd think that's recent enough, no need to investigate further.

What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

Of course, the avenue above aims at side-stepping the actual problem, which might not be what you want in the first place. I'd also love this hang to go away forever, but from all I can see is that it should work if the server would respond at all to the "done" that gitoxide sends before it reads.

It would be really helpful to see what git on the other end receives - even though I double-checked that gitoxide will flush all writes before reading, maybe that's not actually working so the "done" never reaches the server. This could be reproduced by hosting the server state locally with git daemon.

Going one step further down the rabbithole I saw that ChildStdin::flush() isn't actually doing anything - so the standard library seems to rightfully assume that writes are flushed immediately. Could it be that this sometimes isn't happening automatically?

jalil-salame commented 1 year ago

Thanks, I'd think that's recent enough, no need to investigate further.

What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL
jalil-salame commented 1 year ago

Going one step further down the rabbithole I saw that ChildStdin::flush() isn't actually doing anything - so the standard library seems to rightfully assume that writes are flushed immediately. Could it be that this sometimes isn't happening automatically?

Some fds will flush immediately. I don't know if somewhere someone is assuming that is the case when it isn't so...

jalil-salame commented 1 year ago

Trying to reproduce this in gitea.com through forcing protocol version 1. There might be a minimum number of commits required to cause the issue as it succeeds with 3 commits (2 in test repo).

Tested up to 19 commits, no luck reproducing the issue.

Byron commented 1 year ago

Thanks, I'd think that's recent enough, no need to investigate further. What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL

I tested this locally and figured that AcceptEnv GIT_PROTOCOL also needs to be added to /etc/sshd_config to be going through. It's easily verifiable locally as well. This gist gave the clue I needed.

With this configuration change, you'd be able to side-step the issue.

However, ideally there would be extended debug info of Protocol V1 to learn what the problem actually is. For that, the local clone that shows the hanging issue would get a remote like file://path/to/local/server-mirror-of-repo-that-would-be-on-gitea. As gix will start git under the hood, one can influence it with GIT_TRACE_PACKET=1 - now both gix as well as git will output their packet lines. That way it would be possible to see if the done line is actually ending up on the server side. If that works, we know that it's something about ssh and that could be validated by adding the same repository through ssh like ssh://localhost/path/to/… and putting git-upload-pack as script in the PATH which gets to set GIT_TRACE_PACKET=1 before actually launching git-upload-pack $@.

Trying to reproduce this in gitea.com through forcing protocol version 1. There might be a minimum number of commits required to cause the issue as it succeeds with 3 commits (2 in test repo).

Tested up to 19 commits, no luck reproducing the issue.

You could turn on packetline tracing to see what the difference is. I presume that in these cases, it only has one round of negotiation and sends the done right away, which then turns out to work well.

I also have tests with multi-round negotiation for both stateless and stateful connections, for V1 and V2 protocols, and it all works fine (by now, after fixing many hanging issues in the first place). This is what makes this issue here so puzzling - the receiving code is written in such a way that it basically skips over all unnecessary chatter right to the pack, and the pack is expected after sending done no matter what. Why is it not being sent? The best answer I have is that done never makes it to the server, and in the paragraph above I laid out how that could be tested locally, at least in theory.

Thanks again for your tremendous help - I'd love to squelch this bug and then hopefully V1 hangs will be a thing of the past, for good, for real this time :D .

jalil-salame commented 1 year ago

Thanks, I'd think that's recent enough, no need to investigate further. What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL

I tested this locally and figured that AcceptEnv GIT_PROTOCOL also needs to be added to /etc/sshd_config to be going through. It's easily verifiable locally as well. This gist gave the clue I needed.

With this configuration change, you'd be able to side-step the issue.

I am tempted to never do this so I can keep catching bugs in v1 c:

I also have tests with multi-round negotiation for both stateless and stateful connections, for V1 and V2 protocols, and it all works fine (by now, after fixing many hanging issues in the first place). This is what makes this issue here so puzzling - the receiving code is written in such a way that it basically skips over all unnecessary chatter right to the pack, and the pack is expected after sending done no matter what. Why is it not being sent? The best answer I have is that done never makes it to the server, and in the paragraph above I laid out how that could be tested locally, at least in theory.

How do you go about requiring two negotiation rounds? I thought it would be after 16 commits, but it doesn't seem to be that, and I am not famliar with the git protocol.

I'll try to figure out how to get the repo from the gitea server to debug locally.

And thanks for your help! This whole process has been so pleasant <3

Byron commented 1 year ago

How do you go about requiring two negotiation rounds? I thought it would be after 16 commits, but it doesn't seem to be that, and I am not famliar with the git protocol.

You might find it amusing that I also don't have a clear-cut way of achieving this. The test-suite adapts a similar test from git itself, and I only went with it without thinking too much about it.

However, in theory it should be easy to get an even higher number of rounds by adding a remote that has nothing to do with the local repository. Then it should try to find common commits and just give up after sending 256 of them. Maybe… that's even the solution for me to reproduce this… .

Indeed, I could easily get it to do a lot of rounds in V1 one without SSH in the middle:

┕━ fetch::Prepare::receive() [ 1.80s | 0.01% / 98.11% ]
 09:57:08            tracing DEBUG       ┝━ negotiate [ 527ms | 0.17% / 28.75% ] protocol_version: 1
 09:57:08            tracing DEBUG       β”‚  ┝━ mark_complete_and_common_ref [ 34.2ms | 1.75% / 1.87% ] mappings: 1752
 09:57:08            tracing INFO        β”‚  β”‚  ┝━ mark_all_refs [ 2.06ms | 0.11% ]
 09:57:08            tracing DEBUG       β”‚  β”‚  ┝━ mark_alternate_refs [ 208ns | 0.00% ] num_odb: 0
 09:57:08            tracing INFO        β”‚  β”‚  ┝━ mark known_common [ 14.1Β΅s | 0.00% ]
 09:57:08            tracing DEBUG       β”‚  β”‚  ┕━ mark tips [ 32.8Β΅s | 0.00% ] num_tips: 913
 09:57:08            tracing DEBUG       β”‚  ┝━ negotiate round [ 2.22ms | 0.12% ] round: 1
 09:57:08            tracing DEBUG       β”‚  ┝━ negotiate round [ 1.03ms | 0.06% ] round: 2
[..]
 09:57:08            tracing DEBUG       β”‚  ┕━ negotiate round [ 273ms | 14.88% ] round: 68

With SSH, it's the same.

 10:01:18            tracing INFO     ┝━ remote::Connection::ref_map() [ 4.00s | 0.02% / 64.82% ]
 10:01:18            tracing INFO     β”‚  ┕━ remote::Connection::fetch_refs() [ 4.00s | 0.01% / 64.80% ]
 10:01:18            tracing DEBUG    β”‚     ┕━ gix_protocol::handshake() [ 4.00s | 64.79% ] service: UploadPack | extra_parameters: []
 10:01:18            tracing DEBUG    β”‚        ┕━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: LANG="C" LC_ALL="C" "ssh" "byron@localhost" "git-upload-pack" "\'~/dev/github.com/Byron/gitoxide\'"
 10:01:18            tracing INFO     ┕━ fetch::Prepare::receive() [ 2.16s | 0.02% / 34.97% ]
 10:01:18            tracing DEBUG       ┝━ negotiate [ 700ms | 0.07% / 11.33% ] protocol_version: 1
 10:01:18            tracing DEBUG       β”‚  ┝━ mark_complete_and_common_ref [ 57.5ms | 0.68% / 0.93% ] mappings: 1752
 10:01:18            tracing INFO        β”‚  β”‚  ┝━ mark_all_refs [ 15.2ms | 0.25% ]
 10:01:18            tracing DEBUG       β”‚  β”‚  ┝━ mark_alternate_refs [ 125ns | 0.00% ] num_odb: 0
 10:01:18            tracing INFO        β”‚  β”‚  ┝━ mark known_common [ 151Β΅s | 0.00% ]
 10:01:18            tracing DEBUG       β”‚  β”‚  ┕━ mark tips [ 160Β΅s | 0.00% ] num_tips: 913
 10:01:18            tracing DEBUG       β”‚  ┝━ negotiate round [ 14.6ms | 0.24% ] round: 1
 10:01:18            tracing DEBUG       β”‚  ┝━ negotiate round [ 3.53ms | 0.06% ] round: 2
[..]
 10:01:18            tracing DEBUG       β”‚  ┕━ negotiate round [ 391ms | 6.33% ] round: 68

So there really, really seems to be something special about that specific repository state we are seeing :/.

I'll try to figure out how to get the repo from the gitea server to debug locally.

It should be fine just to copy it, scp -r <serverpath> localpath should do the trick. From there you can add a remote like you@localhost:~/server-repo-copy to go through SSH or file:///path/to/server-repo-copy to call git-upload-pack directly. Both should probably be tried - if local SSH reproduces, then local git-upload-pack should as well.

If both don't reproduce, one should validate that the interaction pattern truly is the same (and I'd expect that). Otherwise… I don't know.

And thanks for your help! This whole process has been so pleasant <3

Thank you, that's great to hear, as I have a feeling the debugging won't be finished anytime soon πŸ˜….

jalil-salame commented 1 year ago

It should be fine just to copy it, scp -r localpath should do the trick. From there you can add a remote like you@localhost:~/server-repo-copy to go through SSH or file:///path/to/server-repo-copy to call git-upload-pack directly. Both should probably be tried - if local SSH reproduces, then local git-upload-pack should as well.

file:///path/to/server-repo-copy reproduces the issue (I can't test ssh without it being slightly painful 'cause NixOS doesn't have a simple "install root service" command). How do I get debug information from git-upload-pack?

jalil-salame commented 1 year ago
log of file:// url using v2 protocol ```log 11:47:52 indexing done 14.0 objects in 0.00s (9.9k objects/s) 11:47:52 decompressing done 2.9KB in 0.00s (2.1MB/s) 11:47:52 Resolving done 14.0 objects in 0.05s (273.0 objects/s) 11:47:52 Decoding done 4.2KB in 0.05s (81.5KB/s) 11:47:52 writing index file done 1.5KB in 0.00s (5.8MB/s) 11:47:52 create index file done 14.0 objects in 0.05s (264.0 objects/s) 11:47:52 read pack done 1.8KB in 0.06s (28.8KB/s) 11:47:52 tracing INFO run [ 75.5ms | 0.91% / 100.00% ] 11:47:52 tracing INFO ┝━ ThreadSafeRepository::discover() [ 3.65ms | 0.24% / 4.83% ] 11:47:52 tracing INFO β”‚ ┕━ open_from_paths() [ 3.47ms | 1.13% / 4.59% ] 11:47:52 tracing INFO β”‚ ┝━ gix_path::git::install_config_path() [ 2.51ms | 3.33% ] 11:47:52 tracing INFO β”‚ ┕━ gix_odb::Store::at() [ 98.2Β΅s | 0.13% ] 11:47:52 tracing INFO ┝━ remote::Connection::ref_map() [ 2.77ms | 0.02% / 3.67% ] 11:47:52 tracing INFO β”‚ ┕━ remote::Connection::fetch_refs() [ 2.76ms | 0.05% / 3.66% ] 11:47:52 tracing DEBUG β”‚ ┝━ gix_protocol::handshake() [ 2.24ms | 2.97% ] service: UploadPack | extra_parameters: [] 11:47:52 tracing DEBUG β”‚ β”‚ ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" "git-upload-pack" "/home/jalil/**CENSORED**.git" 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << version 2 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << agent=git/2.42.0 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << ls-refs=unborn 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << fetch=shallow wait-for-done 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << server-option 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << object-format=sha1 11:47:52 tracing TRACE β”‚ β”‚ ┝━ πŸ“ [trace]: << object-info 11:47:52 tracing TRACE β”‚ β”‚ ┕━ πŸ“ [trace]: << FLUSH 11:47:52 tracing DEBUG β”‚ ┕━ gix_protocol::ls_refs() [ 485Β΅s | 0.64% ] capabilities: Capabilities { data: "agent=git/2.42.0\nls-refs=unborn\nfetch=shallow wait-for-done\nserver-option\nobject-format=sha1\nobject-info", value_sep: 10 } 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> command=ls-refs 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> agent=git/oxide-0.55.2 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> DELIM 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> symrefs 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> peel 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> unborn 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> ref-prefix refs/heads/ 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> ref-prefix refs/tags/ 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main 11:47:52 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << FLUSH 11:47:52 tracing INFO ┕━ fetch::Prepare::receive() [ 68.4ms | 0.49% / 90.58% ] 11:47:52 tracing DEBUG ┝━ negotiate [ 3.31ms | 0.24% / 4.38% ] protocol_version: 2 11:47:52 tracing DEBUG β”‚ ┝━ mark_complete_and_common_ref [ 595Β΅s | 0.30% / 0.79% ] mappings: 1 11:47:52 tracing INFO β”‚ β”‚ ┝━ mark_all_refs [ 367Β΅s | 0.49% ] 11:47:52 tracing DEBUG β”‚ β”‚ ┝━ mark_alternate_refs [ 445ns | 0.00% ] num_odb: 0 11:47:52 tracing INFO β”‚ β”‚ ┝━ mark known_common [ 861ns | 0.00% ] 11:47:52 tracing DEBUG β”‚ β”‚ ┕━ mark tips [ 1.50Β΅s | 0.00% ] num_tips: 1 11:47:52 tracing DEBUG β”‚ ┕━ negotiate round [ 2.53ms | 3.35% ] round: 1 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> command=fetch 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> agent=git/oxide-0.55.2 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> DELIM 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> thin-pack 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> ofs-delta 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> include-tag 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << acknowledgments 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ready 11:47:52 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << DELIM 11:47:52 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << packfile **rest is broken by binary data** ```
log of file:// url using v1 protocol ```log 12:05:50 tracing INFO run [ 137s | 0.00% / 100.00% ] 12:05:50 tracing INFO ┝━ ThreadSafeRepository::discover() [ 4.90ms | 0.00% / 0.00% ] 12:05:50 tracing INFO β”‚ ┕━ open_from_paths() [ 4.45ms | 0.00% / 0.00% ] 12:05:50 tracing INFO β”‚ ┝━ gix_path::git::install_config_path() [ 3.18ms | 0.00% ] 12:05:50 tracing INFO β”‚ ┕━ gix_odb::Store::at() [ 136Β΅s | 0.00% ] 12:05:50 tracing INFO ┝━ remote::Connection::ref_map() [ 3.67ms | 0.00% / 0.00% ] 12:05:50 tracing INFO β”‚ ┕━ remote::Connection::fetch_refs() [ 3.66ms | 0.00% / 0.00% ] 12:05:50 tracing DEBUG β”‚ ┕━ gix_protocol::handshake() [ 3.63ms | 0.00% ] service: UploadPack | extra_parameters: [] 12:05:50 tracing DEBUG β”‚ ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: "git-upload-pack" "/home/jalil/**CENSORED**.git" 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main 12:05:50 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << FLUSH 12:05:50 tracing INFO ┕━ fetch::Prepare::receive() [ 137s | 0.00% / 99.99% ] 12:05:50 tracing DEBUG ┕━ negotiate [ 137s | 0.00% / 99.99% ] protocol_version: 1 12:05:50 tracing DEBUG ┝━ mark_complete_and_common_ref [ 588Β΅s | 0.00% / 0.00% ] mappings: 1 12:05:50 tracing INFO β”‚ ┝━ mark_all_refs [ 370Β΅s | 0.00% ] 12:05:50 tracing DEBUG β”‚ ┝━ mark_alternate_refs [ 404ns | 0.00% ] num_odb: 0 12:05:50 tracing INFO β”‚ ┝━ mark known_common [ 1.15Β΅s | 0.00% ] 12:05:50 tracing DEBUG β”‚ ┕━ mark tips [ 1.46Β΅s | 0.00% ] num_tips: 1 12:05:50 tracing DEBUG ┝━ negotiate round [ 137s | 99.99% ] round: 1 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common 12:05:50 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready 12:05:50 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << NAK 12:05:50 tracing DEBUG ┕━ negotiate round [ 127Β΅s | 0.00% ] round: 2 12:05:50 tracing TRACE ┕━ πŸ“ [trace]: >> done  Error: An IO error occurred when talking to the server Caused by: Broken pipe (os error 32) ```
Byron commented 1 year ago

πŸŽ‰!

V1 of the protocol is very close as it reproduces the issue perfectly while using a local git-upload-pack invocation.

I assume that you have exported GIT_TRACE_PACKET=1 but it didn't actually provide the output of the local git invocation. The following patch will correct that:

diff --git a/gix-transport/src/client/blocking_io/file.rs b/gix-transport/src/client/blocking_io/file.rs
index e99f79bc9..446992ff8 100644
--- a/gix-transport/src/client/blocking_io/file.rs
+++ b/gix-transport/src/client/blocking_io/file.rs
@@ -209,7 +209,7 @@ impl client::Transport for SpawnProcessOnDemand {
                 Cow::Owned(command.to_owned()),
             ),
             None => (
-                gix_command::prepare(service.as_str()).stderr(Stdio::null()),
+                gix_command::prepare(service.as_str()).stderr(Stdio::inherit()),
                 None,
                 Cow::Borrowed(OsStr::new(service.as_str())),
             ),

(can be applied with patch -p1 the-patch-above.patch)

The output should contain everything that git sees, followed by the known output above after pressing Ctrl+C.

And that output will finally prove, once and for all, that git does or doesn't see the final done, and it will also become clear if for some reason it sends more than gix consumes (unlikely, but it's a possibility). Then it should become evident what has to be done here.

Thanks again for your help!

jalil-salame commented 1 year ago

Here is the log with git-upload-pack's output:

log ```log 13:15:09.294150 pkt-line.c:85 packet: upload-pack> ff30028d5fb155dcd6fcc250cb256fccba528245 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0 13:15:09.294589 pkt-line.c:85 packet: upload-pack> ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main 13:15:09.294626 pkt-line.c:85 packet: upload-pack> 0000 13:15:09.296360 pkt-line.c:85 packet: upload-pack< want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag 13:15:09.296523 pkt-line.c:85 packet: upload-pack< 0000 13:15:09.296538 pkt-line.c:85 packet: upload-pack< have 713418f2c3ef057538264c6c379d45836a86ca06 13:15:09.296678 pkt-line.c:85 packet: upload-pack> ACK 713418f2c3ef057538264c6c379d45836a86ca06 common 13:15:09.296698 pkt-line.c:85 packet: upload-pack< have 24ed375e274f6c90b846b5579ae01ab8ac77e631 13:15:09.296781 pkt-line.c:85 packet: upload-pack> ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common 13:15:09.296799 pkt-line.c:85 packet: upload-pack< have 4896909faec56497fdd03f7c00ec51570a6c5d88 13:15:09.296883 pkt-line.c:85 packet: upload-pack> ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common 13:15:09.296902 pkt-line.c:85 packet: upload-pack< have ef17d82d35d901a702d561561cedd6d9ee73af14 13:15:09.296985 pkt-line.c:85 packet: upload-pack> ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common 13:15:09.297002 pkt-line.c:85 packet: upload-pack< have 4da23d2886e815c9b3c524826bf3fdd83226a85e 13:15:09.297075 pkt-line.c:85 packet: upload-pack> ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common 13:15:09.297092 pkt-line.c:85 packet: upload-pack< have d29e402d3e90a3bcc352107a2ec7cc499d799479 13:15:09.297165 pkt-line.c:85 packet: upload-pack> ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common 13:15:09.297182 pkt-line.c:85 packet: upload-pack< have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 13:15:09.297262 pkt-line.c:85 packet: upload-pack> ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common 13:15:09.297279 pkt-line.c:85 packet: upload-pack< have 82a264f12274ca8e1999b73b5fd850c85118f50a 13:15:09.297352 pkt-line.c:85 packet: upload-pack> ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common 13:15:09.297370 pkt-line.c:85 packet: upload-pack< have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 13:15:09.297445 pkt-line.c:85 packet: upload-pack> ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common 13:15:09.297462 pkt-line.c:85 packet: upload-pack< have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 13:15:09.297537 pkt-line.c:85 packet: upload-pack> ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common 13:15:09.297554 pkt-line.c:85 packet: upload-pack< have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 13:15:09.297628 pkt-line.c:85 packet: upload-pack> ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common 13:15:09.297645 pkt-line.c:85 packet: upload-pack< have 795e583ee2931e25b7347616e013099fcaa9789c 13:15:09.297718 pkt-line.c:85 packet: upload-pack> ACK 795e583ee2931e25b7347616e013099fcaa9789c common 13:15:09.297734 pkt-line.c:85 packet: upload-pack< have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca 13:15:09.297807 pkt-line.c:85 packet: upload-pack> ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common 13:15:09.297824 pkt-line.c:85 packet: upload-pack< have 4661adc46821622bcecc94188e2e1e4382f0c2cc 13:15:09.297962 pkt-line.c:85 packet: upload-pack> ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common 13:15:09.297981 pkt-line.c:85 packet: upload-pack< have d131fb4dd3611b1ceed828b3c3b61266e928d031 13:15:09.298459 pkt-line.c:85 packet: upload-pack> ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common 13:15:09.298491 pkt-line.c:85 packet: upload-pack< have ab24553467cc323133e9e81eeeabf21f08e2d97e 13:15:09.298530 pkt-line.c:85 packet: upload-pack> ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common 13:15:09.298543 pkt-line.c:85 packet: upload-pack< 0000 13:15:09.298604 pkt-line.c:85 packet: upload-pack> ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready 13:15:09.298616 pkt-line.c:85 packet: upload-pack> NAK 13:15:58 tracing INFO run [ 49.1s | 0.02% / 100.00% ] 13:15:58 tracing INFO ┝━ ThreadSafeRepository::discover() [ 7.39ms | 0.00% / 0.02% ] 13:15:58 tracing INFO β”‚ ┕━ open_from_paths() [ 7.02ms | 0.00% / 0.01% ] 13:15:58 tracing INFO β”‚ ┝━ gix_path::git::install_config_path() [ 4.97ms | 0.01% ] 13:15:58 tracing INFO β”‚ ┕━ gix_odb::Store::at() [ 195Β΅s | 0.00% ] 13:15:58 tracing INFO ┝━ remote::Connection::ref_map() [ 3.83ms | 0.00% / 0.01% ] 13:15:58 tracing INFO β”‚ ┕━ remote::Connection::fetch_refs() [ 3.82ms | 0.00% / 0.01% ] 13:15:58 tracing DEBUG β”‚ ┕━ gix_protocol::handshake() [ 3.79ms | 0.01% ] service: UploadPack | extra_parameters: [] 13:15:58 tracing DEBUG β”‚ ┝━ πŸ› [debug]: gix_transport::SpawnProcessOnDemand | command: "git-upload-pack" "/home/jalil/**CENSORED**.git" 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main 13:15:58 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << FLUSH 13:15:58 tracing INFO ┕━ fetch::Prepare::receive() [ 49.1s | 0.00% / 99.96% ] 13:15:58 tracing DEBUG ┕━ negotiate [ 49.1s | 0.00% / 99.96% ] protocol_version: 1 13:15:58 tracing DEBUG ┝━ mark_complete_and_common_ref [ 826Β΅s | 0.00% / 0.00% ] mappings: 1 13:15:58 tracing INFO β”‚ ┝━ mark_all_refs [ 401Β΅s | 0.00% ] 13:15:58 tracing DEBUG β”‚ ┝━ mark_alternate_refs [ 483ns | 0.00% ] num_odb: 0 13:15:58 tracing INFO β”‚ ┝━ mark known_common [ 746ns | 0.00% ] 13:15:58 tracing DEBUG β”‚ ┕━ mark tips [ 9.55Β΅s | 0.00% ] num_tips: 1 13:15:58 tracing DEBUG ┝━ negotiate round [ 49.1s | 99.95% ] round: 1 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: >> FLUSH 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common 13:15:58 tracing TRACE β”‚ ┝━ πŸ“ [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready 13:15:58 tracing TRACE β”‚ ┕━ πŸ“ [trace]: << NAK 13:15:58 tracing DEBUG ┕━ negotiate round [ 132Β΅s | 0.00% ] round: 2 13:15:58 tracing TRACE ┕━ πŸ“ [trace]: >> done Error: An IO error occurred when talking to the server Caused by: Broken pipe (os error 32) ```
Byron commented 1 year ago

That's great :)! And it clearly shows that git doesn't actually receive the done which is the reason for the hang. We also see that gix sends it, which is followed by flush right away. We also know that stdin doesn't seem to be flushed when flush() is called, and the underlying AnonPipe also doesn't seem to have a flush implementation.

In theory, it should be possible to try forcing a flush (just to prove that it works then) by flushing on the underlying file descriptor directly. But doing that would be cumbersome in Rust especially since it's just a test.

If you are on linux, maybe you could try to flush the descriptor while the process is hanging. One could just grab all open descriptors and flush them if it's unclear which one it is.

If that works, it would be clear that it's a bug in the standard library which ought to actually flush.

jalil-salame commented 1 year ago

I tried attaching to gix with gdb, then calling fsync on all file descriptors, it didn't seem to fix it.

Are you sure it's stdin and not one of the open pipes (I tried all of the open fds though)? Trying the same on git-upload-pack didn't do anything either...

jalil-salame commented 1 year ago

Using ChatGPT's python script didn't fix the issue either

Byron commented 1 year ago

Thanks a lot for trying!

This means I am puzzled as to where the done could have gone. Rust sends it into a pipe that should connect to the git-upload-pack which has just sent ACK and NAK and would now proceed to read the next packetline. That should be done and then the pack is sent.

But that clearly doesn't happen.

Maybe something else is happening here, somehow. What confuses me is that done is sent in the second round which would mean that it finished parsing the first response. But according to the logic here with client_expects_pack=false and saw_ready=true, we'd get false for the filter which would then try to read past the last NAK which should make it stall right there. Thus it wouldn't get to send done at all in the second round.

In any case, the way I understand the code in git-upload-pack, a logic change seems in order:

diff --git a/gix-protocol/src/fetch/response/blocking_io.rs b/gix-protocol/src/fetch/response/blocking_io.rs
index 309f5a7c5..d36a1a45f 100644
--- a/gix-protocol/src/fetch/response/blocking_io.rs
+++ b/gix-protocol/src/fetch/response/blocking_io.rs
@@ -85,7 +85,7 @@ impl Response {
                     assert_ne!(reader.readline_str(&mut line)?, 0, "consuming a peeked line works");
                     // When the server sends ready, we know there is going to be a pack so no need to stop early.
                     saw_ready |= matches!(acks.last(), Some(Acknowledgement::Ready));
-                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack && !saw_ready) {
+                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack || !saw_ready) {
                         break 'lines false;
                     }
                 };

Can you try it with this patch? it passes the test-suite, so that's a start (and it will hang if I butcher it too much).

jalil-salame commented 1 year ago
diff --git a/gix-protocol/src/fetch/response/blocking_io.rs b/gix-protocol/src/fetch/response/blocking_io.rs
index 309f5a7c5..d36a1a45f 100644
--- a/gix-protocol/src/fetch/response/blocking_io.rs
+++ b/gix-protocol/src/fetch/response/blocking_io.rs
@@ -85,7 +85,7 @@ impl Response {
                     assert_ne!(reader.readline_str(&mut line)?, 0, "consuming a peeked line works");
                     // When the server sends ready, we know there is going to be a pack so no need to stop early.
                     saw_ready |= matches!(acks.last(), Some(Acknowledgement::Ready));
-                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack && !saw_ready) {
+                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack || !saw_ready) {
                         break 'lines false;
                     }
                 };

That did it! Works both locally and through ssh!

Byron commented 1 year ago

That's incredible! A small change with huge effect! I can now just hope that the test-coverage is as good as I think or else something else might break πŸ˜… (at least the blast radios is limited to V1).

Alright, the PR is in flight and I hope it will be smooth sailing from now on :).

Byron commented 1 year ago

Can you try once more from this PR? It contains adjustments to the logic to work with more test-cases, and I can only hope that it also still covers your case.

jalil-salame commented 1 year ago

Can you try once more from this PR? It contains adjustments to the logic to work with more test-cases, and I can only hope that it also still covers your case.

Verified that main fixes the issue c: Thanks for all of this!

mainrs commented 11 months ago
$ gix --trace fetch

I scimmed through the code but couldn't ultimately figure out what kind of logging was used for the hierarchies printed by this command.

It doesn't seem to be tracing or log as far as I can tell.