eclipse-jgit / jgit

JGit, the Java implementation of git
https://www.eclipse.org/jgit/
Other
117 stars 33 forks source link

JGit is very slow in marking the remote refs as advertised #5

Open dluksza opened 8 months ago

dluksza commented 8 months ago

Version

5.13

Operating System

MacOS

Bug description

Bugzilla bug

When clone a remote repository that advertised a large number of refs (e.g. in the order of millions) the JGit client spend a lot of time marking the received refs as locally advertised.

See the full stack-trace below:
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
    - locked <0x00000007a69a1728> (a java.io.FileNotFoundException)
    at java.lang.Throwable.<init>(Throwable.java:266)
    at java.lang.Exception.<init>(Exception.java:66)
    at java.io.IOException.<init>(IOException.java:58)
    at java.io.FileNotFoundException.<init>(FileNotFoundException.java:77)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.eclipse.jgit.internal.storage.file.LooseObjects.getObjectLoader(LooseObjects.java:186)
    at org.eclipse.jgit.internal.storage.file.LooseObjects.open(LooseObjects.java:149)
    at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseObject(ObjectDirectory.java:396)
    at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseFromSelfOrAlternate(ObjectDirectory.java:373)
    at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:349)
    at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:330)
    at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
    at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:212)
    at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:1075)
    at org.eclipse.jgit.transport.BasePackFetchConnection.markAdvertised(BasePackFetchConnection.java:987)
    at org.eclipse.jgit.transport.BasePackFetchConnection.markRefsAdvertised(BasePackFetchConnection.java:979)
    at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:363)
    at org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(TransportHttp.java:1550)
    at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:302)
    at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:293)
    at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:274)
    at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:171)
    at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:94)
    at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1309)
    at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:213)
    at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:311)
    at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:182)

Actual behavior

Clone operation is slow

Expected behavior

Speed improvement in clone operation

Relevant log output

No response

Other information

No response

msohn commented 8 months ago

Did you try if this also happens when using JGit from current master branch ?

dluksza commented 8 months ago

Not yet, for now, was just browsing the code and it looks like although the stacktrace doesn't match fully it should behave similarly. Will test it on the master now.

dluksza commented 8 months ago

Yes, it is still the case, here's the stacktrace from master:

Thread [main] (Suspended (breakpoint at line 212 in LooseObjects))  
    LooseObjects.getObjectLoader(WindowCursor, File, AnyObjectId) line: 212 
    LooseObjects.open(WindowCursor, AnyObjectId) line: 171  
    ObjectDirectory.openLooseObject(WindowCursor, AnyObjectId) line: 418    
    ObjectDirectory.openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<Id>) line: 394  
    ObjectDirectory.openObjectWithoutRestoring(WindowCursor, AnyObjectId) line: 369 
    ObjectDirectory.openObject(WindowCursor, AnyObjectId) line: 350 
    WindowCursor.open(AnyObjectId, int) line: 133   
    WindowCursor(ObjectReader).open(AnyObjectId) line: 216  
    RevWalk.parseAny(AnyObjectId) line: 1119    
    TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markAdvertised(AnyObjectId) line: 1101  
    TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markRefsAdvertised() line: 1093 
    TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 408    
    TransportHttp$SmartHttpFetchConnection.doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 1565    
    TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 351  
    TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>) line: 343    
    FetchProcess.fetchObjects(ProgressMonitor) line: 290    
    FetchProcess.executeImp(ProgressMonitor, FetchResult, String) line: 182 
    FetchProcess.execute(ProgressMonitor, FetchResult, String) line: 105    
    TransportHttp(Transport).fetch(ProgressMonitor, Collection<RefSpec>, String) line: 1482 
    FetchCommand.call() line: 238   
    CloneCommand.fetch(Repository, URIish) line: 319    
    CloneCommand.call() line: 189   
    Clone.run() line: 131   
    Clone(TextBuiltin).execute(String[]) line: 239  
    Main.execute(String[]) line: 247    
    Main.run(String[]) line: 135    
    Main.main(String[]) line: 106   

For each advertised object we'll do a try-catch and re-throw the FileNotFoundException as you can tell from the LooseObjects.getObjectLoader() source code.

Would it be possible to have a different implementation of ObjectDirectory only for the clone operation? We can safely assume during the clone that we don't have any of the objects. WDYT?

msohn commented 6 months ago

I think using a different ObjectDirectory implementation would be overkill, it looks like this could be fixed by skipping the call to #openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<AlternateHandle.Id>) in #openObjectWithoutRestoring(WindowCursor, AnyObjectId) [1] if we know we are executing a clone. We already have a few other such optimizations for clone.

[1] https://github.com/eclipse-jgit/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/ObjectDirectory.java#L357