TCP Congestion Control disproportionately affects git-lfs

withinboredom commented 1 year ago

Describe the bug When uploading on an unreliable/congested link (e.g., crowded WiFi) upload speeds are horrendous (30 kbytes per second) but when on a reliable link (e.g., wired, even on the same network) upload speeds are comparable to link speed.

More specifically, TCP retransmissions or out-of-order ACKs can prevent git-lfs from functioning at any theoretical speed. However, other software, such as curl, web browsers, etc, function just fine and can reach near link speed with even worse packet retransmissions or out-of-order ACKS.

To Reproduce Steps to reproduce the behavior:

Put wifi network on a crowded channel. You should see TCP retransmissions in Wireshark for web browsing traffic.
Add a large file to git-lfs.
Push file to git-lfs.
See extremely low speeds.

Expected behavior Git-LFS should be minimally affected by TCP congestion control.

System environment Linux/WSL/Bare Metal

Output of git lfs env N/A

Additional context

I haven't gone into how the sockets are created, but if I had to guess, the code is not using the Nagle Algorithm (TCP_NODELAY utilized).

chrisd8088 commented 1 year ago

Hey, I'm sorry you're having trouble, and thanks for taking some time to look into this.

As a general rule, we try to ensure Git LFS follows whatever Git does, although of course they are written in different languages. Git uses libcurl, which I believe sets TCP_NODELAY by default and therefore disables Nagle's algorithm, as can be seen in a Git trace log:

$ GIT_CURL_VERBOSE=1 git fetch
19:49:07.005600 http.c:701              == Info: Couldn't find host github.com in the .netrc file; using defaults
19:49:07.008335 http.c:701              == Info:   Trying 140.82.113.4...
19:49:07.008346 http.c:701              == Info: TCP_NODELAY set
19:49:07.119959 http.c:701              == Info: Connected to github.com (140.82.113.4) port 443 (#0)
...

That's for HTTP traffic; for SSH, the situation appears to be a bit more murky, but it seems like OpenSSH, at least, may leave TCP_NODELAY disabled. Since Git LFS largely uses HTTP at present (there is a pure-SSH transport protocol available, but I'm not aware of any Git LFS hosting services which support it), that's where we'd want to focus attention for now.

Although we do try to follow Git's example as an overall guide, there are circumstances where we want to do something different. Git LFS is intended for use with large files in particular, of course, so changes which help that use case are valuable even if they deviate from Git's own implementation.

I think the first thing we should probably look at here is whether Git LFS (and the underlying Go libraries) are optimizing TCP socket writes or not. We should be avoiding making too many small writes where we can instead make a single larger one, and avoiding the "write-write-read" pattern if it appears anywhere in our code, so we don't have reads waiting on the final write in a sequence of writes. Regardless of the setting of TCP_NODELAY, any such changes should be a net benefit.

As an experiment, we'd be interested in what you see in your environment if you compile Git LFS with a patch to the source to call SetNoDelay(false) on the HTTP connections, because if that shows an improvement, it would suggest there are places in our code where writes could be optimized.

Like Git, Git LFS has users in a wide variety of environments, so finding a common configuration which suits all of them may not be possible, but we do want to provide the best performance we can, especially with large files, and for the majority of use cases.

And as always, patches and test data are extremely welcome!

funny-falcon commented 1 year ago

Almost every Golang's networking or file io should go through bufio or some equivalent of. If it doesn't it likely to be a mistake.

darekrusin commented 1 year ago

FYI, here's link to a discussion that happened on Hacker News that might be helpful: https://news.ycombinator.com/item?id=34179426

It has comments from both:

Russ Cox - https://news.ycombinator.com/item?id=34181846
Nagle - https://news.ycombinator.com/item?id=34180239

jeffallen commented 1 year ago

I have looked into this some, and I found that the reads to the large file during the uploading process are 32k. So this is probably not a buffering problem.

It does seem like a reasonable strategy would be to call SetNoDelay(false) on the underlying socket of the HTTP request when we know we are likely to be doing bulk data transfer. For example, not verifylocks, batch or verify, but the basicUploader transfers in the middle of the upload process (and whatever the equivalent code path is for the downloads).

It is possible to set http.Request.Close to true, in order to force the HTTP implementation to open a new connection for you, so you know you are not setting the option on sockets that will stay in the connection cache. It is also possible to give a DialTLSContext function in the Transport, so that you have a chance to SetNoDelay on the socket.

Here is an example of what's necessary to call SetNoDelay on an HTTPS URL:

package main

import (
    "context"
    "crypto/tls"
    "io/ioutil"
    "log"
    "net"
    "net/http"
)

func main() {
    req, _ := http.NewRequest("GET", "https://nella.org/", nil)
    tp := http.Transport{
        DialTLSContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
            a, err := net.ResolveTCPAddr(network, addr)
            if err != nil {
                return nil, err
            }
            t, err := net.DialTCP(network, nil, a)
            if err != nil {
                return nil, err
            }
            t.SetNoDelay(false)
            h, _, err := net.SplitHostPort(addr)
            if err != nil {
                return nil, err
            }
            return tls.Client(t, &tls.Config{ServerName: h}), nil
        },
    }
    resp, err := tp.RoundTrip(req)
    if err != nil {
        log.Fatal(err)
    }
    b, _ := ioutil.ReadAll(resp.Body)
    println(string(b))
}

I will unfortunately not be able to work more on this, hope this background info is at least useful.

duaneking commented 1 year ago

Respectfully, What is the logic on the thinking behind not allowing users to override the defaults via a command line flag?

chrisd8088 commented 1 year ago

Respectfully, What is the logic on the thinking behind not allowing users to override the defaults via a command line flag?

I'm not aware of any such determination, one way or another. Is there something in particular you had in mind, @duaneking? So far as I know, the question of setting or not setting the TCP_NODELAY flag has just never come up before as a point of discussion.

That said, my own concern is that while adding a configuration option helps users who are savvy enough to use it appropriately, we want to ensure that we are performing socket operations as efficiently as possible regardless of low-level settings. That doesn't preclude a configuration option, but I think it's more important for the average user.

The original issue description above doesn't identify their Git LFS environment or version, and doesn't link to the associated blog post, which was apparently posted on the Hacker News site. In that post, the author states:

When I looked at the packets, they were being sent in ~50-byte payload chunks (~100 bytes total, MTU is 1500).

A number of the commentators on the thread then discuss that detail, but so far as I've seen, we don't have a full reproduction case yet, including Git LFS version (which implies a given Go version), operating system version, and Git configuration, plus Wireshark or other utility configuration and setup, and tracing log output (minus any private information). It would also be useful to know the contents of the test data and how it was created.

It would be good to know if this is something others have validated, and on which operating systems and in which environments. In the meantime, once the core team has returned from holidays and dealt with any pressing $DAYJOB issues, we can a take a look as well.

chrisd8088 commented 1 year ago

@jeffallen -- Thanks for the comments!. You note:

I have looked into this some, and I found that the reads to the large file during the uploading process are 32k. So this is probably not a buffering problem.

Is that something you spotted in the Go libraries, or in Git LFS itself?

(If by any chance you were looking at the buffer size in the SSH protocol code, note that that's not used for HTTP traffic, which is what all uploads use at the moment with the main Git LFS hosting services, as I believe none of those services have implemented the pure SSH transport protocol yet.)

jeffallen commented 1 year ago

I found when doing a "git lfs push" over HTTP and putting a breakpoint on the implementation of read() for the file descriptor, that it is reading 32 kb of data at a time from the operating system (this was in Windows).

I did not check the other direction, i.e. towards the network, to see if the data got broken up there. I was not able to see the same small packet behavior from Wireshark, but I didn't have time to try very hard.

I think the next step is to have a really stable repro case, and then try to figure out if and why the send()s to the network are less than full size buffers.

mastercoms commented 1 year ago

I've identified possible relevant code here:

https://github.com/git-lfs/git-lfs/blob/main/tq/basic_upload.go#L53-L96 https://github.com/git-lfs/git-lfs/blob/main/tools/copycallback.go#L77-L88

I think buffer should be around 2MB. We can also ReadAll for any file with a size below that.

The 32KB called out in this thread is probably a filesystem buffering value which is dependent on the setup, likewise for the amount of bytes being transferred as described in the original blog post.

duaneking commented 1 year ago

It's important to consider that file systems will have an independent journaling block size, so that might be an issue as well, as it will change the data size of the blocks being buffered.

Future people responding to this bug with failure cases should also include the hardware platform, os platform, the the hard drive/file system data, just in case, imho.

git-lfs / git-lfs

TCP Congestion Control disproportionately affects git-lfs #5242