Closed deanpcmad closed 4 years ago
I tested this on macOS on the lowest-powered MacBook Air; loading rails took 3500ms ~ 4200ms. I think that's good enough for v1.1.
@lunny could you please give alpinelinux aports a try?
Then try to browse the main directory.
It takes a couple of minutes on a powerful server, mainly because of git log.
I remember this issue was also reported on gogs before but was never taken care of. Some have suggested using a caching system. A simpler approach would be to fetch a directory list (like GitHub does) and, if needed, fetch the commit messages asynchronously via JavaScript. cgit shows only the directory list, which is pretty fast. If the current implementation allows it, please add an option to disable fetching of commit messages.
I will try it. @clandmeter
Which page do you want to test? @clandmeter On my machine, the main page takes 1763ms and the first release page takes 6662ms.
@lunny can you check the main directory like this one at github:
@lunny btw, I'm using 1.0.1. I believe the performance commits for the tags page landed after 1.0.1, or in another branch.
@lunny I think https://github.com/go-gitea/gitea/issues/502 is related?
@clandmeter Yes, I tested on master. I think v1.0.1 may be slower than master. And yes, it's related to #502.
@lunny I tried master today on both Linux (Alpine Linux) and Windows 10. Both crash at startup, so I cannot verify whether it's faster.
Where is the crash log?
C:\Users\carlo\Desktop\gitea>gitea.exe web
2017/01/20 12:47:02 [W] Custom config 'C:/Users/carlo/Desktop/gitea/custom/conf/app.ini' not found, ignore this if you're running first time
2017/01/20 12:47:02 [T] Custom path: C:/Users/carlo/Desktop/gitea/custom
2017/01/20 12:47:02 [T] Log path: C:/Users/carlo/Desktop/gitea/log
2017/01/20 12:47:02 [I] Gitea v1.0.0+137-g1610b9f
2017/01/20 12:47:02 [I] Log Mode: Console(Trace)
2017/01/20 12:47:02 [I] Cache Service Enabled
2017/01/20 12:47:02 [I] Session Service Enabled
2017/01/20 12:47:02 [I] SQLite3 Supported
2017/01/20 12:47:02 [I] Run Mode: Development
panic: Macaron handler must be a callable function
goroutine 1 [running]:
panic(0xeec4e0, 0xc0434b7400)
/usr/local/go/src/runtime/panic.go:500 +0x1af
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.validateHandler(0xeec4e0, 0xc0434b73e0)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:50 +0xbf
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.validateHandlers(0xc0434bfc80, 0x6, 0x8)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:58 +0x54
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Handle(0xc04200f360, 0x100f580, 0x4, 0xc0434c4160, 0x1b, 0xc0434c2f90, 0x6, 0x8, 0x0)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:176 +0x417
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Post(0xc04200f360, 0x1011b19, 0x6, 0xc0434c2f90, 0x3, 0x3, 0x10)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:210 +0x7c
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Post-fm(0x1011b19, 0x6, 0xc0434c2f90, 0x3, 0x3, 0x3)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:335 +0x63
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*ComboRouter).route(0xc0434bd100, 0xc0434a6bb8, 0x100f580, 0x4, 0xc0434a6cc8, 0x3, 0x3, 0xeec4e0)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:322 +0x12e
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*ComboRouter).Post(0xc0434bd100, 0xc0434a6cc8, 0x3, 0x3, 0xc0434b73e0)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:335 +0x99
code.gitea.io/gitea/routers/api/v1.RegisterRoutes.func1.6()
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:409 +0x4d9
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc04200f360, 0x1020d48, 0xe, 0xc0434a6f58, 0xc0434b70c0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x112
code.gitea.io/gitea/routers/api/v1.RegisterRoutes.func1()
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:417 +0xc42
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc04200f360, 0x100e74a, 0x3, 0xc0434a71a0, 0xc04348cfc0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x112
code.gitea.io/gitea/routers/api/v1.RegisterRoutes(0xc0422c6580)
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:450 +0xdf
code.gitea.io/gitea/cmd.runWeb.func17()
/srv/app/src/code.gitea.io/gitea/cmd/web.go:609 +0x31
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc04200f360, 0x100f074, 0x4, 0xc0434a74a8, 0xc04348cfb0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x112
code.gitea.io/gitea/cmd.runWeb(0xc042184140, 0x0, 0xc042184100)
/srv/app/src/code.gitea.io/gitea/cmd/web.go:610 +0x1506
code.gitea.io/gitea/vendor/github.com/urfave/cli.HandleAction(0xf07f80, 0x113f1c8, 0xc042184140, 0xc0421a2200, 0x0)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/app.go:471 +0xc0
code.gitea.io/gitea/vendor/github.com/urfave/cli.Command.Run(0x100edc8, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10318a9, 0x16, 0x0, ...)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/command.go:191 +0xcce
code.gitea.io/gitea/vendor/github.com/urfave/cli.(*App).Run(0xc04246e340, 0xc04203e3a0, 0x2, 0x2, 0x0, 0x0)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/app.go:241 +0x6aa
main.main()
/srv/app/src/code.gitea.io/gitea/main.go:39 +0x35b
I am stopped by the same panic message as @clandmeter. (I don't know if it is the same issue; I was trying to update my Gitea installation, which runs on Docker.)
bash-4.3$ /app/gitea/gitea web
2017/01/20 11:54:36 [T] Custom path: /data/gitea
2017/01/20 11:54:36 [T] Log path: /data/gitea/log
panic: Macaron handler must be a callable function
goroutine 1 [running]:
panic(0x7ffa36d24140, 0xc42152b720)
/usr/lib/go/src/runtime/panic.go:500 +0x1a5
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.validateHandler(0x7ffa36d24140, 0xc42152b700)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:50 +0xba
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.validateHandlers(0xc421555400, 0x6, 0x8)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:58 +0x4f
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Handle(0xc4205cc460, 0x7ffa3661ce90, 0x4, 0xc42155e7c0, 0x1b, 0xc421568300, 0x6, 0x8, 0x7ffa37b93020)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:176 +0x412
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Post(0xc4205cc460, 0x7ffa3661f447, 0x6, 0xc421568300, 0x3, 0x3, 0x10)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:210 +0x77
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Post-fm(0x7ffa3661f447, 0x6, 0xc421568300, 0x3, 0x3, 0x3)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:335 +0x5e
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*ComboRouter).route(0xc4215430c0, 0xc4214d6bb8, 0x7ffa3661ce90, 0x4, 0xc4214d6cc8, 0x3, 0x3, 0x7ffa36d24140)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:322 +0x129
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*ComboRouter).Post(0xc4215430c0, 0xc4214d6cc8, 0x3, 0x3, 0xc42152b700)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:335 +0x94
code.gitea.io/gitea/routers/api/v1.RegisterRoutes.func1.6()
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:409 +0x4d4
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc4205cc460, 0x7ffa3662e69d, 0xe, 0xc4214d6f58, 0xc42152b3e0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x10d
code.gitea.io/gitea/routers/api/v1.RegisterRoutes.func1()
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:417 +0xc3d
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc4205cc460, 0x7ffa3661c13f, 0x3, 0xc4214d71a0, 0xc4214912e0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x10d
code.gitea.io/gitea/routers/api/v1.RegisterRoutes(0xc420473980)
/srv/app/src/code.gitea.io/gitea/routers/api/v1/api.go:450 +0xda
code.gitea.io/gitea/cmd.runWeb.func17()
/srv/app/src/code.gitea.io/gitea/cmd/web.go:609 +0x2c
code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*Router).Group(0xc4205cc460, 0x7ffa3661c9d4, 0x4, 0xc4214d74a8, 0xc4214912d0, 0x1, 0x1)
/srv/app/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:190 +0x10d
code.gitea.io/gitea/cmd.runWeb(0xc4201c17c0, 0x0, 0xc4201c1700)
/srv/app/src/code.gitea.io/gitea/cmd/web.go:610 +0x1501
code.gitea.io/gitea/vendor/github.com/urfave/cli.HandleAction(0x7ffa36d3fcc0, 0x7ffa36e476e8, 0xc4201c17c0, 0xc420058d00, 0x0)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/app.go:471 +0xbb
code.gitea.io/gitea/vendor/github.com/urfave/cli.Command.Run(0x7ffa3661c730, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffa3663ebbf, 0x16, 0x0, ...)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/command.go:191 +0xcc9
code.gitea.io/gitea/vendor/github.com/urfave/cli.(*App).Run(0xc42024b520, 0xc42000c140, 0x2, 0x2, 0x0, 0x0)
/srv/app/src/code.gitea.io/gitea/vendor/github.com/urfave/cli/app.go:241 +0x6a5
main.main()
/srv/app/src/code.gitea.io/gitea/main.go:39 +0x356
resolved by #708
@lunny seems master branch is working again so I did some small tests:
Yes. This issue should be fixed by #570 .
move this to v1.2 since #570 has been moved.
@lunny any progress in this area?
I'm still getting very slow loads on large directory contents:
Gitea Version: d545e32 Page: 418155ms Template: 11903ms
https://try.gitea.io/clandmeter/aports/src/branch/master/community
Would it be possible to have a pager or disable the loading of commit history?
For github it will only show the first 1000 files.
See https://try.gitea.io/joshfng/gitlab-ce, it spent about 13 seconds.
When creating a pull request, it takes ~12 seconds to show a single commit after selecting the branches:
Gitea Version: 1.4.1 Page: 12296ms Template: 9468ms
The repository has around 5k commits and the repo home page loads in 4 seconds.
I don't think the number of commits is really the bottleneck. I have a repo with a size of around 2 GB and 500 commits, and it's already taking 5 seconds to load. Maybe the git commands that are run when showing a repo need to be optimized or their results cached.
There was a problem with the commit count before, but I fixed that by adding a cache for the commit count, so that should not be a problem anymore. I think the current problem is that having many files in the view slows down the last-commit info calculation.
The problem I am facing is when directories contain many objects. For each object a git command is executed, which is rather expensive in terms of CPU and I/O. For instance, https://github.com/alpinelinux/aports/tree/master/main will load extremely slowly (if it loads at all) because git has to fetch the latest info for each object. A simple way to solve this is to add a setting that disables fetching of git information if the object count is larger than x.
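A minimal sketch of the setting proposed above, in Go. Everything here is hypothetical: `ListDir`, `Entry`, the `maxEntries` threshold and the `lookup` callback are illustrative names, not Gitea's API. The callback stands in for the expensive per-entry git call; once a directory has more entries than the threshold, it is never invoked.

```go
package main

import "fmt"

// Entry is one row of a directory listing.
type Entry struct {
	Name       string
	LastCommit string // empty when commit info was skipped
}

// ListDir builds a listing for the given entry names. The expensive lookup is
// only called when the directory has at most maxEntries entries; a value of
// maxEntries <= 0 disables the limit. (Hypothetical helper, not Gitea code.)
func ListDir(names []string, maxEntries int, lookup func(path string) string) []Entry {
	withCommits := maxEntries <= 0 || len(names) <= maxEntries
	entries := make([]Entry, 0, len(names))
	for _, name := range names {
		e := Entry{Name: name}
		if withCommits {
			e.LastCommit = lookup(name)
		}
		entries = append(entries, e)
	}
	return entries
}

func main() {
	calls := 0
	lookup := func(path string) string {
		calls++ // stands in for spawning a git command
		return "abc123"
	}
	small := ListDir([]string{"a", "b"}, 3, lookup)
	large := ListDir([]string{"a", "b", "c", "d"}, 3, lookup)
	fmt.Println(len(small), len(large), "lookups:", calls)
}
```

With such a threshold, very large directories would still list their file names quickly, and the commit details would remain available on the individual file pages.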
Isn't there a way to do it GitLab style (I believe I saw it there): make an AJAX request for each file/directory and update the information asynchronously?
Could this be helping for that issue? https://blogs.msdn.microsoft.com/devops/2018/06/25/supercharging-the-git-commit-graph/
Maybe we could calculate each file's last commit asynchronously?
@lunny until there is a real solution to this problem, could you add an option to disable commit info in listings so Gitea doesn't spawn git commands?
@clandmeter that could be a temporary solution.
@lunny that would be great. I would love to test some Alpine Linux related things with gitea but our repo is just too large to make it work atm.
I think rewriting this functionality to use the go-git library would greatly improve performance.
@lafriks wrote:
There was a problem with the commit count before, but I fixed that by adding a cache for the commit count, so that should not be a problem anymore. I think the current problem is that having many files in the view slows down the last-commit info calculation.
The bottleneck is caused by the huge number of git rev-list and git cat-file calls for larger repos. Caching the output, or the rendered HTML, in the macaron cache (which maps to redis or memcached) or as a static file might help.
Pre-rendering HTML at git-receive or git-update time might be another option (to avoid slow rendering of the first request).
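The caching idea can be sketched as follows. This is an in-memory illustration only, not the macaron cache API: `LastCommitCache` and its methods are hypothetical names. The assumption that makes it work is that a directory's last-commit information at a fixed commit ID never changes, so an entry keyed by (commit ID, directory path) never needs to be invalidated.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey identifies one directory listing at one commit. The content behind
// such a key is immutable, so cached values never go stale.
type cacheKey struct {
	CommitID string
	TreePath string
}

// LastCommitCache memoizes "entry name -> last commit ID" maps.
// (Hypothetical type for illustration; not part of Gitea or macaron.)
type LastCommitCache struct {
	mu      sync.Mutex
	entries map[cacheKey]map[string]string
}

func NewLastCommitCache() *LastCommitCache {
	return &LastCommitCache{entries: make(map[cacheKey]map[string]string)}
}

// Get returns the cached result for (commitID, treePath). On a miss it calls
// compute and stores the result. Concurrent misses for the same key may each
// call compute; the last result stored wins.
func (c *LastCommitCache) Get(commitID, treePath string,
	compute func() (map[string]string, error)) (map[string]string, error) {
	key := cacheKey{CommitID: commitID, TreePath: treePath}

	c.mu.Lock()
	cached, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return cached, nil
	}

	result, err := compute() // the slow part, e.g. many git invocations
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.entries[key] = result
	c.mu.Unlock()
	return result, nil
}

func main() {
	cache := NewLastCommitCache()
	computeCalls := 0
	compute := func() (map[string]string, error) {
		computeCalls++
		return map[string]string{"README.md": "abc123"}, nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.Get("deadbeef", "main", compute); err != nil {
			panic(err)
		}
	}
	fmt.Println("compute calls:", computeCalls)
}
```

A cache like this only helps repeat visits; the first request for a new commit still pays the full cost, which is what the pre-rendering suggestion is meant to address.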
I think rewriting this functionality to use the go-git library would greatly improve performance.
You would still need to walk the git tree on disk and collect the git rev-list information; would this get any faster?
Could this be helping for that issue? https://blogs.msdn.microsoft.com/devops/2018/06/25/supercharging-the-git-commit-graph/
I gave it a try, but the improvements were quite negligible. On a Windows host, executing the git command to list the latest commit for one file/directory takes slightly less than a second, and I believe the overhead is not only the repository access. We have roughly 50 directories in a repository and the listing takes 25 - 50 seconds. I upgraded the storage on the machine to an SSD with higher throughput and got about a 30% boost and more consistent times, but it still takes 20 seconds to load the repository page.
UPDATE: The serialized commit graph helps only a bit. It reduces some I/O especially when hitting old history stored in pack files. However the tree objects still have to be loaded anyway, which dominates the time in the end.
@lafriks go-git is no silver bullet either, but there's potential to improve the load times with it.
I implemented a simple Go program to list the root of a repository using go-git and, for each entry, find the last commit. Timing it on my test repository yields a result similar to whatever Gitea does now, but there are a few things to note:
- The git module makes me believe that there is some parallelization involved, albeit little. I didn't do any parallelization at all in my test code.
- go-git allows an optimization which simply couldn't be achieved by the git command line today. The trick is to process all the files at once while walking the commit history. Currently the history is loaded many times over, and the same trees are loaded and examined for each file.
- go-git seems to be too eager to read way too much data (as evidenced by the timeit tool on Windows when comparing simple log queries to git log -1 <file name>).
I have never written in Go before, so if someone wants to improve upon my measly attempt you are more than welcome. I'd be especially interested if someone could implement walking the history only once and processing multiple files at the same time (and stopping once we know the commits for all the files).
https://gist.github.com/filipnavara/8e6fdf980130d6ca120bfda4c25481e9
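The "walk the history once for all files" idea can be illustrated with a toy model. This sketch is not the code from the Gist and does not use go-git; it assumes a strictly linear history in which every commit carries a full snapshot of path to content hash, and all names (`Commit`, `LastCommitsForPaths`) are hypothetical. Merges, which a real implementation has to handle, are deliberately left out.

```go
package main

import "fmt"

// Commit is a toy stand-in for a git commit in a linear history. Files maps
// each path to a content hash as of this commit.
type Commit struct {
	ID     string
	Parent *Commit
	Files  map[string]string
}

// LastCommitsForPaths walks the history once, newest first, and returns the
// most recent commit that changed each requested path, together with the
// number of commits visited. It stops as soon as every path is resolved.
func LastCommitsForPaths(head *Commit, paths []string) (map[string]string, int) {
	remaining := make(map[string]bool, len(paths))
	for _, p := range paths {
		remaining[p] = true
	}
	result := make(map[string]string, len(paths))
	visited := 0

	for c := head; c != nil && len(remaining) > 0; c = c.Parent {
		visited++
		for p := range remaining {
			cur, inCur := c.Files[p]
			prev, inPrev := "", false
			if c.Parent != nil {
				prev, inPrev = c.Parent.Files[p]
			}
			// The path was added, removed or modified by this commit.
			if inCur != inPrev || cur != prev {
				result[p] = c.ID
				delete(remaining, p) // deleting during range is safe in Go
			}
		}
	}
	return result, visited
}

func main() {
	c1 := &Commit{ID: "c1", Files: map[string]string{"a": "1", "b": "1"}}
	c2 := &Commit{ID: "c2", Parent: c1, Files: map[string]string{"a": "2", "b": "1"}}
	c3 := &Commit{ID: "c3", Parent: c2, Files: map[string]string{"a": "2", "b": "1", "c": "1"}}

	last, visited := LastCommitsForPaths(c3, []string{"a", "b", "c"})
	fmt.Println(last, "visited:", visited)
}
```

Compared with running one history query per file, each commit is examined at most once here, and the walk ends early when all entries are resolved.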
I updated the Gist with some naïve multi-file processing. Now I get 4s times on my repo, which is about 4x faster than the baseline. Worst-case with walking the whole history of the entire repo using go-git is around 30s.
UPDATE: Using the KeepDescriptors option to prevent go-git from reopening pack files all the time shaves another 0.5s off the time (or 12% if you prefer).
UPDATE 2: Trying some performance optimizations at https://github.com/filipnavara/go-git/tree/perf-read. I'm now at around 2.6s on the tests. There was a small gain (around 0.25s) from avoiding reader.Seek(0, io.SeekCurrent) when reading packfiles where the offset was already known. Another problem with my code is that it accesses most commits twice, which caused them to actually be read twice from the disk for non-packfile objects. Lastly, there was a huge gain from using the in-memory packfile indexes to look up commit hashes instead of looking into the objects directory, if the indexes were already loaded. I still see quite weird and erratic reads on the packfiles themselves, but I wasn't able to figure out what causes them.
UPDATE 3: I found the bottleneck when reading packfile objects and implemented a workaround. Now I am at 1.37s, or about 90% faster than my Gitea listing on the same machine. Profiler shows that it's only around 30% I/O bound now, so any further optimization will need someone with more Go experience.
I'll try to upstream my performance improvements to go-git, but as far as Gitea goes I would really appreciate any help.
Proof of concept:
- Modified gitea and git module that handle some read-only operations using go-git.
- go-git with performance-related fixes.
Current status: Page: 1688ms Template: 28ms on the top-level listing vs Page: 16898ms Template: 14736ms with the latest Gitea release. No git command at all is invoked for loading directory listings.
Btw, the algorithm I implemented is the same one used by libgit2sharp and, on the basic conceptual level, similar to what git does today. It is not necessarily efficient when deep history has to be traversed (e.g. looking at a directory that has not been changed for a long time relative to the rest of the repository). It is easy to detect that case and impose some limits on the history traversed to maintain more consistent performance, at the expense of not showing all the commit information.
There's ongoing work to speed that up using further improvements on top of the commit graph feature - https://blogs.msdn.microsoft.com/devops/2018/07/16/super-charging-the-git-commit-graph-iv-bloom-filters/ - but it's not even finalized in git itself by now.
@filipnavara Thanks for go-git.
I have locally implemented support for the Git 2.18+ serialized commit graphs in go-git. As expected, the performance benefits of that alone are not worth the additional complexity. However, adding the bloom filter optimization works wonders when looking into repository directories that haven't changed for quite a while. It could easily bring another 10x speed-up for that use case, at the expense of up-front calculation (about 10 minutes for 30000 revisions in non-optimized code) and storage (640 additional bytes per revision on top of the Git commit graph).
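To illustrate why a per-commit Bloom filter helps, here is a rough sketch. It is not the format used by the go-git branch or by Git itself: the hash functions, the number of probes, and the names `PathFilter`, `AddChangedFile` and `MayContain` are assumptions made for this example; only the 640-byte size is taken from the comment above. Each commit gets a small bit set recording the paths it changed, so a history walk can skip loading the trees of any commit whose filter says a path was definitely not touched.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	filterBytes = 640 // per-revision budget mentioned in the thread
	numHashes   = 7   // hypothetical number of probes per path
)

// PathFilter is a Bloom filter over the paths changed by a single commit.
type PathFilter [filterBytes]byte

// positions derives numHashes bit positions from two FNV hashes of the path
// (double hashing). The choice of FNV is an assumption for this sketch.
func positions(path string) [numHashes]uint32 {
	h1 := fnv.New32a()
	h1.Write([]byte(path))
	a := h1.Sum32()
	h2 := fnv.New32()
	h2.Write([]byte(path))
	b := h2.Sum32() | 1 // force an odd step so the probes differ

	var pos [numHashes]uint32
	for i := range pos {
		pos[i] = (a + uint32(i)*b) % (filterBytes * 8)
	}
	return pos
}

func (f *PathFilter) add(path string) {
	for _, p := range positions(path) {
		f[p/8] |= 1 << (p % 8)
	}
}

// AddChangedFile records a changed file and all of its parent directories,
// so that directory entries in a listing can be tested as well.
func (f *PathFilter) AddChangedFile(path string) {
	for i, ch := range path {
		if ch == '/' {
			f.add(path[:i])
		}
	}
	f.add(path)
}

// MayContain reports whether the commit possibly changed path. A false result
// is definitive; a true result can be a false positive and must be verified
// by comparing the actual trees.
func (f *PathFilter) MayContain(path string) bool {
	for _, p := range positions(path) {
		if f[p/8]&(1<<(p%8)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	var filter PathFilter
	filter.AddChangedFile("main/busybox/APKBUILD")

	fmt.Println("main:", filter.MayContain("main"))
	fmt.Println("community:", filter.MayContain("community"))
}
```

The trade-off matches the one described above: the filters must be computed up front for every revision and cost extra storage, but directories that have not changed for a long time can then be resolved without reading most of the intermediate trees.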
@filipnavara I looked through your changes in gitea and must say they look amazing, great work! :) I will look more into this when we have got 1.6.0 out of the doors
@lafriks Thanks much! I found some flaws in the commit_info.go implementation of getLastCommitForPaths which I still plan to fix, but it would be great to get some of the changes upstream. I'll try to help as much as my time permits.
I committed a fix for reporting the revisions across more complicated commit graphs and across merges.
My experiments with using serialized commit graphs are tracked at https://github.com/src-d/go-git/issues/965. The Gitea counterpart is at https://github.com/filipnavara/go-git/tree/commitgraph. The code is NOT production ready, does NOT handle errors correctly, and most importantly leaks file handles at the moment. I am only sharing it to show what further performance improvements are achievable. There's a tool for generating the precomputed commit graphs in the go-git branch under _examples/commit-graph. It generates commit graph information in a Git 2.18+ compatible format with the addition of optional path filter data. This tool is SLOW and is meant to be run only once in a while (e.g. after repository import or along with git gc). It is possible to update this index incrementally, but that is not currently implemented. The precomputed information is only used for commits where it is available; otherwise standard Git objects are used. With the precomputed information I am now getting sub-second page loads for every directory listing in our repository, even if it contains paths changed 7+ years ago, for which a lot of data would have to be read without the optimizations.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.
Oh my @filipnavara Awesome job on the performance improvements. I pulled your perf-read branch and built it.
I have a repo with 25,000 files in one folder. The previous gitea web ui would take > 3 hours to load, but with your branch it loaded in 19s!
I would love to see this change make it into the master line.
@dfredell Unfortunately I am busy and don't have time to upstream it. However I do update the branch every now and then to track upstream changes. Once #6364 gets merged I will do it again and probably open a PR to start the discussion.
https://github.com/go-gitea/gitea/pull/6364 was merged yesterday! 😄
I have a repo with a folder with more than 2000 files. This takes ~25 seconds to load (not production site), of which 24 seconds are spent in getLastCommitForPaths (run from recent Gitea master branch).
In addition to any performance improvements possible, maybe a new option could be introduced to display only file names (without latest commit info) if a folder contains more than x entries (folders and files). That way very big folders can still be shown quickly but if you want to see commit details/history you need to enter the specific file.
@davidsvantesson You can speed it up a bit by building a commit-graph file (git commit-graph write). I would be interested in how much it helps for your repository.
@filipnavara That is very interesting, but I do not see any change in performance for listing repo files in Gitea. Maybe Gitea doesn't run operations where it benefits from it?
Edit: That is strange, because I have the code of #7314, but it doesn't seem to improve my performance. I will investigate further.
Possibly linked/related to #490
I like to keep some mirrors of popular projects such as Rails on my Gitea server. However, whenever I go to view that repo, it can take 10+ seconds to load the page (sometimes causing an nginx 502 timeout error).