git-lfs / git-lfs

Git extension for versioning large files
https://git-lfs.com

File locks are extremely slow and basically unusable #2978

Open Zeblote opened 6 years ago

Zeblote commented 6 years ago

We're trying to use git lfs file locks with our Unreal Engine project, but the performance is very disappointing.

All of the lock commands are taking several seconds to execute:

    git lfs locks
    git lfs lock ...
    git lfs unlock ...

There also doesn't seem to be any way to lock/unlock multiple files at once, so we are often stuck waiting for it to unlock a list of files... one by one... taking 3 seconds each.

Using a gitlab.com repo, Windows 10, latest Git and LFS, connecting over SSH. Is there anything we can do to speed this up?

ttaylorr commented 6 years ago

Hi @Zeblote, thanks for opening this. It's a good question, and here's the honest answer: there isn't anything you can do to make it faster (for now).

Let me expand:

All of the lock commands are taking several seconds to execute:

Unlike Git, which supports a protocol over HTTP(S), git://, and ssh, Git LFS provides an API specification for HTTP(S) only. While all of these protocols have inherent connection overhead, HTTP(S) has noticeably more, which can make commands that have to call out to a remote take much longer than commands that don't, even if they aren't spending a lot of time sending data back and forth.

So, the solution is to support multiple protocols, and ideally have at least one that has a lower connection overhead than HTTP(S) and can stay alive for longer. I think that the sensible choice for this is pkt-line over SSH, very much inspired by what Git chooses to do.
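
For context, pkt-line is the framing Git's own wire protocol already uses: each message is a 4-hex-digit length prefix that counts itself, followed by the payload, with "0000" acting as a flush packet that ends a section. Because frames are trivial to read and write, many lock requests and responses could be streamed over a single long-lived SSH connection. A rough illustration of the framing (payloads chosen purely as examples):

    0006a\n              # frames the 2-byte payload "a\n" (2 + 4-byte prefix = 0x0006)
    000eversion 1\n      # frames the 10-byte payload "version 1\n" (10 + 4 = 0x000e)
    0000                 # flush-pkt: no payload, marks the end of a message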

But the road from here to there is a long one. For a while I thought this was something the Git LFS project wouldn't have time to pursue, but the demand is loud enough that we should at least start exploring what it would look like to implement this. As a rough cut, I think the following is what would be needed:

  1. Document a proposed successor for the HTTPS API (locking and batch) as a pkt-line protocol.
  2. Form consensus on that protocol proposal.
  3. Split out the code in git/pkt_line{,_}*.go into a new repository, git-lfs/pktline.
  4. Restructure the code in package lfsapi so that we can easily add a new protocol without updating calling code (and so that things like extensions, custom transfer agents, etc. still work properly).
  5. Provide a reference implementation in (1) our test suite and (2) git-lfs/lfs-test-server.

That's a lot, but I think it is worth, at a minimum, discussing in more detail. I think those are the necessary steps the project would have to take in order to make these commands faster. As a precursor to all of this, I think we should make a scratch implementation so that we can quickly evaluate whether or not it would provide a meaningful speed-up.

There also doesn't seem to be any way to lock/unlock multiple files at once, so we are often stuck waiting for it to unlock a list of files... one by one... taking 3 seconds each.

This issue in particular is under discussion in https://github.com/git-lfs/git-lfs/issues/2671. I left a note there in https://github.com/git-lfs/git-lfs/issues/2671#issuecomment-359126336 a handful of months ago, outlining some concerns I have about teaching git lfs unlock to take multiple lock identifiers.

(The main gist is that it is certainly possible, but would require a protocol extension to work in a non-surprising way).

larsxschneider commented 6 years ago

@Zeblote Does your Unreal Engine project contain a lot of files and directories? If so, you might want to try setting the Git config option lfs.setlockablereadonly to false. That might not be ideal if you are using locks, but it might speed up the lock operations until we find a better solution for the problem.

See more info here: https://github.com/git-lfs/git-lfs/pull/1822/files#r187119130
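
For reference, the option can be set with a one-line config change (a minimal example; --global applies it to every repository on the machine):

    git config lfs.setlockablereadonly false            # this repository only
    git config --global lfs.setlockablereadonly false   # or for all repositories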

Zeblote commented 6 years ago

There weren't that many files, no.

Though we have already replaced the Git repo with a Perforce one, since it doesn't look like there will be any short-term solution. Everything is lightning fast now!

Kleptine commented 6 years ago

I don't feel like an improvement in the protocol really addresses the issue -- even bringing the time per lock down by a factor of ten (0.3 seconds per lock) is still pretty unusable if I'm locking a couple hundred files (say I want to make a small change to all of the icons in the game).

The more natural solution, in my opinion, is the API changes to allow multiple identifiers, as talked about in #2671.

@ttaylorr Do you have an estimated road-map for any of these features -- is it mostly just blocked on time to implement? I, like others, desperately want to use LFS, but can't without proper locking. :\

ttaylorr commented 6 years ago

I don't feel like an improvement in the protocol really addresses the issue -- even bringing the time per lock down by a factor of ten (0.3 seconds per lock) is still pretty unusable if I'm locking a couple hundred files (say I want to make a small change to all of the icons in the game).

The more natural solution, in my opinion, is the API changes to allow multiple identifiers, as talked about in #2671.

I think that the latter should be approached first, since it has the highest ratio of perceived value to cost. That said, I think both should be achieved at some point in the future.

@ttaylorr Do you have an estimated road-map for any of these features -- is it mostly just blocked on time to implement?

I do not have an estimated roadmap currently, no.

jirutka commented 6 years ago

All of the lock commands are taking several seconds to execute:

Unlike Git, which supports a protocol over HTTP(S), git://, and ssh, Git LFS provides an API specification for HTTP(S) only. While all of these protocols have inherent connection overhead, HTTP(S) has noticeably more

Sorry, but this is total nonsense. The overhead of HTTPS is definitely not several seconds unless you have a pretty slow connection. Moreover, you don't have to establish a new HTTPS connection for each lock request; even if the server API does not support locking multiple files at once, the client can open one connection and send any number of requests over it. Or it can open a few connections at once and send requests in parallel.

ttaylorr commented 6 years ago

Moreover, you don't have to establish a new HTTPS connection for each lock request [ ... ]

Right, though I'm not sure we're doing this in practice. I think this would be a good topic for a first-time contributor to look into, if it isn't already enabled.

Calinou commented 5 years ago

If you're using SSH with Git, consider setting up SSH multiplexing to decrease connection times (the initial handshake is then performed only once per host). GitHub and GitLab should support this fine.
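
A minimal ~/.ssh/config entry for this could look like the following (host names and socket path are just examples):

    Host github.com gitlab.com
        ControlMaster auto
        ControlPath ~/.ssh/cm-%r@%h:%p
        ControlPersist 10m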

xmedeko commented 10 months ago

One piece of advice: when the main Git remote URL uses SSH, consider switching it to HTTPS; otherwise git-lfs runs git-lfs-authenticate to obtain credentials for HTTPS, which can slow down the startup of every LFS command. (It's also possible to use the lfs.url / remote.<remote>.lfsurl configs to override the URL for LFS only.)
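
For example, with an SSH push/pull remote you could point only LFS at the HTTPS endpoint (the URL below is illustrative; LFS servers conventionally live at <repo URL>.git/info/lfs):

    # override the LFS endpoint for the whole repository
    git config lfs.url https://git-server.example.com/group/repo.git/info/lfs
    # or override it for a single remote only
    git config remote.origin.lfsurl https://git-server.example.com/group/repo.git/info/lfs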

I have a very small GitHub repo (5 files, 2 dirs), a reasonably fast network connection (ping to github.com averages 35 ms), and both lock and unlock are slow (on Windows 11). The tests below lock or unlock 1 or 3 files in a single command (e.g. git lfs lock file1 file2 file3).

When using SSH for the main repo:

  operation   1 file    3 files
  lock        3.1 sec   3.7 sec
  unlock      5.5 sec   6.6 sec

When using HTTPS for the main repo:

  operation   1 file    3 files
  lock        1.5 sec   2.0 sec
  unlock      1.8 sec   2.9 sec

Note: unlock is slower, especially with the SSH main repo. Even accounting for git-lfs-authenticate slowing down startup, that does not explain why unlock with an SSH main repo is so much slower, or why the difference grows with more files on the command line.

See also #5510, #2616.

zhmyh1337 commented 10 months ago

@xmedeko Thanks for your reply. I was just setting up an LFS repo with locks for a large project and by some miracle stumbled upon the post you left after three years. You may have saved us hours. It's pretty sad that this is still an issue in 2023, especially since locks have been around for quite a while.

bk2204 commented 10 months ago

The extra round-trip time for the SSH option is because by default, Git LFS tries the new pure SSH protocol before falling back to the older hybrid protocol. SSH is also, just generally, a protocol that requires more round-trips than HTTPS, and thus it's going to be slower.
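
If you want to see what the client is actually doing here, trace output (e.g. GIT_TRACE=1, which Git LFS uses for its debug logging) shows the SSH and HTTP calls being made; the path below is only an example:

    GIT_TRACE=1 git lfs lock Content/Icons/icon.uasset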

I've opened #5555 to allow people to disable the pure SSH protocol if they want to speed this up a bit, and I'll get it passing CI sometime in the next few days. That shaves off more than a second for me when locking and unlocking.

It's true that locking multiple files at once can be slow because until recently, you had to lock them one by one with a separate command for each one. We can probably implement a little optimization to improve the performance, though. I'm looking to see what I can do to remove a bunch of the duplication.

bk2204 commented 10 months ago

I've made some performance improvements in #5561. I expect these will be more noticeable for larger numbers of files and on Windows, where spawning a process is more expensive.