Closed: elmodor closed this issue 3 years ago
I'm seeing the same behaviour on my machine.
I have activated the access log to make sure no one is accessing my Gitea instance and triggering something. It is definitely idle load.
On Log Level "Info" I can see a log like this every 10 seconds (:05, :15, :25, ...):
2021/04/26 17:45:55 ...dels/notification.go:734:GetUIDsAndNotificationCounts() [I] [SQL] SELECT user_id, count(*) AS count FROM notification WHERE user_id IN (SELECT user_id FROM notification WHERE updated_unix >= ? AND updated_unix < ?) AND status = ? GROUP BY user_id [1619459143 1619459153 1] - 577.264µs
Can you confirm that?
Cheers, Tinxx
I am also noticing CPU usage that seems high for Gitea being idle with no usage at all. On my server, which also hosts samba shares that are used every day plus other standard services like chrony, rsyslogd, crond, and firewalld (a Python daemon!), gitea had the highest TIME value after 24 days, by a large amount. I also see the SQL query lines every 10 seconds, but I don't think that is the issue, since that only runs every 10 seconds, while the CPU usage spikes seem to be pretty constant, though they jump between threads. The parent process sits at 1% or 2% CPU usage according to htop with the refresh set to 1 second. With htop's refresh set much shorter, like 0.1 seconds, it jumps from 0% to 8% back and forth constantly.
What may be more telling is the strace output, which shows a constant polling of epoll_pwait() on several threads:
[pid 944513] 14:39:28.794308 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.794348 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.794389 <... epoll_pwait resumed>[], 128, 0, NULL, 824674766514) = 0
[pid 944547] 14:39:28.794424 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.794460 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.794498 futex(0xc00280e150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944514] 14:39:28.797018 <... nanosleep resumed>NULL) = 0
[pid 944514] 14:39:28.797112 futex(0x595ee58, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=16761648} <unfinished ...>
[pid 944513] 14:39:28.813747 <... epoll_pwait resumed>[], 128, 19, NULL, 19) = 0
[pid 944513] 14:39:28.813841 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.813953 epoll_pwait(3, <unfinished ...>
[pid 944514] 14:39:28.814087 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 944514] 14:39:28.814158 nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 944513] 14:39:28.815166 <... epoll_pwait resumed>[], 128, 1, NULL, 19) = 0
[pid 944513] 14:39:28.815263 futex(0xc00280e150, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944547] 14:39:28.815395 <... futex resumed>) = 0
[pid 944513] 14:39:28.815425 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.815470 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.815510 <... epoll_pwait resumed>[], 128, 0, NULL, 824765049992) = 0
[pid 944547] 14:39:28.815546 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.815583 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.815622 futex(0xc00280e150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944514] 14:39:28.824461 <... nanosleep resumed>NULL) = 0
[pid 944514] 14:39:28.824553 futex(0x595ee58, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=2534876} <unfinished ...>
[pid 944513] 14:39:28.826860 <... epoll_pwait resumed>[], 128, 11, NULL, 19) = 0
[pid 944513] 14:39:28.826973 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.827089 epoll_pwait(3, <unfinished ...>
[pid 944514] 14:39:28.827298 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 944514] 14:39:28.827369 nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 944513] 14:39:28.828289 <... epoll_pwait resumed>[], 128, 1, NULL, 19) = 0
[pid 944513] 14:39:28.828379 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.828495 futex(0xc00280e150, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944547] 14:39:28.828617 <... futex resumed>) = 0
[pid 944513] 14:39:28.828646 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.828686 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.828727 <... epoll_pwait resumed>[], 128, 0, NULL, 824707085640) = 0
[pid 944547] 14:39:28.828761 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.828797 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.828836 futex(0xc00280e150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944514] 14:39:28.837662 <... nanosleep resumed>NULL) = 0
[pid 944514] 14:39:28.837754 futex(0x595ee58, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=28499131} <unfinished ...>
[pid 944513] 14:39:28.866087 <... epoll_pwait resumed>[], 128, 37, NULL, 19) = 0
[pid 944513] 14:39:28.866181 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.866293 epoll_pwait(3, <unfinished ...>
[pid 944514] 14:39:28.866464 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 944514] 14:39:28.866536 nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 944513] 14:39:28.867492 <... epoll_pwait resumed>[], 128, 1, NULL, 19) = 0
[pid 944513] 14:39:28.867585 futex(0xc00280e150, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944547] 14:39:28.867717 <... futex resumed>) = 0
[pid 944513] 14:39:28.867747 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.867794 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.867827 <... epoll_pwait resumed>[], 128, 0, NULL, 824697945778) = 0
[pid 944547] 14:39:28.867862 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.867917 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.867952 futex(0xc00280e150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944514] 14:39:28.876822 <... nanosleep resumed>NULL) = 0
[pid 944514] 14:39:28.876919 futex(0x595ee58, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=10181233} <unfinished ...>
[pid 944513] 14:39:28.887213 <... epoll_pwait resumed>[], 128, 19, NULL, 19) = 0
[pid 944513] 14:39:28.887311 epoll_pwait(3, <unfinished ...>
[pid 944514] 14:39:28.887354 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 944513] 14:39:28.887400 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944514] 14:39:28.887437 nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 944513] 14:39:28.887495 futex(0xc00280e150, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 944547] 14:39:28.887566 <... futex resumed>) = 0
[pid 944513] 14:39:28.887595 <... futex resumed>) = 1
[pid 944547] 14:39:28.887628 futex(0xc000680950, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944530] 14:39:28.887740 <... futex resumed>) = 0
[pid 944547] 14:39:28.887768 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.887802 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.887848 <... epoll_pwait resumed>[], 128, 0, NULL, 824672440224) = 0
[pid 944530] 14:39:28.887877 epoll_pwait(3, <unfinished ...>
[pid 944513] 14:39:28.887924 <... epoll_pwait resumed>[], 128, 0, NULL, 824701657488) = 0
[pid 944547] 14:39:28.887963 epoll_pwait(3, <unfinished ...>
[pid 944530] 14:39:28.887993 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944513] 14:39:28.888028 futex(0x595f6d0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944530] 14:39:28.888070 futex(0xc000680950, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944547] 14:39:28.894220 <... epoll_pwait resumed>[], 128, 6, NULL, 19) = 0
[pid 944547] 14:39:28.894315 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944547] 14:39:28.894437 futex(0xc000680950, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944530] 14:39:28.894563 <... futex resumed>) = 0
[pid 944547] 14:39:28.894599 epoll_pwait(3, <unfinished ...>
[pid 944530] 14:39:28.894644 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.894677 <... epoll_pwait resumed>[], 128, 0, NULL, 824677068466) = 0
[pid 944530] 14:39:28.894714 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid 944547] 14:39:28.894749 epoll_pwait(3, <unfinished ...>
[pid 944530] 14:39:28.894789 futex(0xc000680950, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 944514] 14:39:28.897652 <... nanosleep resumed>NULL) = 0
[pid 944514] 14:39:28.897739 futex(0x595ee58, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=17650309} <unfinished ...>
[pid 944547] 14:39:28.915023 <... epoll_pwait resumed>[], 128, 20, NULL, 19) = 0
[pid 944547] 14:39:28.915115 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944547] 14:39:28.915219 epoll_pwait(3, <unfinished ...>
[pid 944514] 14:39:28.915536 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 944514] 14:39:28.915603 nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 944547] 14:39:28.916414 <... epoll_pwait resumed>[], 128, 1, NULL, 19) = 0
[pid 944547] 14:39:28.916505 epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 944547] 14:39:28.916609 futex(0xc000680950, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 944547] 14:39:28.916735 epoll_pwait(3, <unfinished ...>
[pid 944530] 14:39:28.916770 <... futex resumed>) = 0
[pid 944547] 14:39:28.916805 <... epoll_pwait resumed>[], 128, 0, NULL, 824707085960) = 0
[pid 944530] 14:39:28.916841 epoll_pwait(3, <unfinished ...>
[pid 944547] 14:39:28.916880 epoll_pwait(3, <unfinished ...>
There are various scheduled tasks that happen periodically; what would be most helpful is information from pprof (specifically the diagram it provides), as then CPU usage could be traced through the codebase.
big surprise...
i guess strace doesn't lie. i just wasted over an hour of my time for that stupid pprof that strace already clearly showed us.
@ddan39 if you had given us the out file instead of the svg we could have looked at what was causing the poll waits.
You may find that changing your queue configuration changes the amount of polling.
In terms of GetUIDsAndNotificationCounts: this is simply the eventsource - if you do not want it to run, turn it off (see https://docs.gitea.io/en-us/config-cheat-sheet/#ui---notification-uinotification).
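For reference, the relevant knob lives in the [ui.notification] section of app.ini. The exact key names should be checked against the linked cheat sheet for your Gitea version, but it looks roughly like:

```ini
[ui.notification]
; How often the notification-count query (the SQL line seen every 10s above)
; is run for EventSource clients. Raising it reduces idle SQL chatter;
; per the cheat sheet, -1 disables the periodic check entirely.
EVENT_SOURCE_UPDATE_TIME = 30s
```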
Ah, yeah, it was getting late and I was getting a bit frustrated trying to get pprof to even work when I have zero Go knowledge. Sorry about responding a bit harshly. I was going to upload more files (well, to be honest, I was trying to upload the svg first, which was denied) but GitHub was only letting me attach certain file types to the comments. I probably should've just zipped them both together... it looks like .zip files are allowed. When I get off work I will attach some more files.
I will look into the queue settings, thanks.
I was surprised to see gitea seemingly polling the epoll_wait function so fast like that to be honest. With go being all about concurrency I figured it could just do a long blocking call... but again, I'm a newb here.
I got pprof and strace running inside the docker of gitea:
docker exec -u root -it gitea22 /bin/bash
apk add strace go graphviz
export GOPATH=/
go get -u github.com/google/pprof
strace/pprof /usr/local/bin/gitea -c /etc/gitea/app.ini web
But whenever I try to kill the running gitea process to try to start it with pprof or strace I get booted out of the container. How do I run this inside the docker container?
edit: removed bad info about pprof - the easy way to use it is below
To get a profile, simply add ENABLE_PPROF = true to the [server] section of your app.ini,
and then after Gitea has been running for a while run: go tool pprof -top ./path/to/gitea-bin http://127.0.0.1:6060/
Gitea has pprof http endpoints built in.
See https://docs.gitea.io/en-us/config-cheat-sheet/#server-server
well, shit.
So thanks to Etzelia in Discord I got pprof to run. Here's a short guide on how to do it: https://github.com/go-gitea/gitea/issues/14772
Here is my output of 10min, this is a fresh rootless docker install. No repo, one user. Docker stats showed around 4-5% CPU usage of my host system (with a 1sec refresh rate of docker stats):
(pprof) top
Showing nodes accounting for 12960ms, 76.10% of 17030ms total
Dropped 130 nodes (cum <= 85.15ms)
Showing top 10 nodes out of 64
flat flat% sum% cum cum%
7080ms 41.57% 41.57% 7080ms 41.57% runtime.epollwait
2820ms 16.56% 58.13% 2820ms 16.56% runtime.futex
920ms 5.40% 63.53% 12680ms 74.46% runtime.findrunnable
610ms 3.58% 67.12% 830ms 4.87% runtime.nanotime (inline)
480ms 2.82% 69.94% 7560ms 44.39% runtime.netpoll
220ms 1.29% 71.23% 1030ms 6.05% runtime.checkTimers
220ms 1.29% 72.52% 220ms 1.29% runtime.nanotime1
220ms 1.29% 73.81% 310ms 1.82% runtime.scanobject
200ms 1.17% 74.99% 230ms 1.35% runtime.pidleget
190ms 1.12% 76.10% 190ms 1.12% runtime.(*mcache).prepareForSweep
and second run
(pprof) top
Showing nodes accounting for 5270ms, 69.07% of 7630ms total
Dropped 86 nodes (cum <= 38.15ms)
Showing top 10 nodes out of 87
flat flat% sum% cum cum%
1940ms 25.43% 25.43% 1940ms 25.43% runtime.epollwait
1880ms 24.64% 50.07% 1880ms 24.64% runtime.futex
330ms 4.33% 54.39% 4400ms 57.67% runtime.findrunnable
290ms 3.80% 58.19% 500ms 6.55% runtime.selectgo
210ms 2.75% 60.94% 310ms 4.06% runtime.scanobject
150ms 1.97% 62.91% 220ms 2.88% runtime.nanotime (inline)
130ms 1.70% 64.61% 130ms 1.70% runtime.heapBitsSetType
120ms 1.57% 66.19% 2060ms 27.00% runtime.netpoll
120ms 1.57% 67.76% 140ms 1.83% runtime.sellock
100ms 1.31% 69.07% 150ms 1.97% runtime.lock2
I attached the pprof output. Loadable with: go tool pprof pprof.out
@jolheiser Can you do anything with those pprofs?
I mean, these are pretty low-level Go runtime issues to do with selects and polling. I've put up a PR that might(?) help, if it's the case that too many goroutines waiting on selects are to blame.
Could you try building #15686 to see if this helps?
Yeah I fully understand that this might not be an easy fix and not one that has a high priority.
I've built the rootless docker image based on your PR changes @zeripath, but sadly I did not see any change in the idle CPU usage on a fresh install. (apk add takes ages inside the docker build container...) Still around 5% after booting it up, configuring it, and creating one user.
If it helps, I attached another pprof (2x10min) of the docker container running #15686:
(pprof) top
Showing nodes accounting for 4320ms, 65.65% of 6580ms total
Dropped 117 nodes (cum <= 32.90ms)
Showing top 10 nodes out of 98
flat flat% sum% cum cum%
1760ms 26.75% 26.75% 1760ms 26.75% runtime.epollwait
1480ms 22.49% 49.24% 1480ms 22.49% runtime.futex
250ms 3.80% 53.04% 3810ms 57.90% runtime.findrunnable
210ms 3.19% 56.23% 370ms 5.62% runtime.scanobject
120ms 1.82% 58.05% 570ms 8.66% runtime.checkTimers
120ms 1.82% 59.88% 290ms 4.41% runtime.selectgo
100ms 1.52% 61.40% 350ms 5.32% runtime.mallocgc
100ms 1.52% 62.92% 180ms 2.74% runtime.nanotime (inline)
90ms 1.37% 64.29% 1850ms 28.12% runtime.netpoll
90ms 1.37% 65.65% 90ms 1.37% runtime.nextFreeFast (inline)
(pprof) top
Showing nodes accounting for 4520ms, 62.78% of 7200ms total
Dropped 100 nodes (cum <= 36ms)
Showing top 10 nodes out of 94
flat flat% sum% cum cum%
1750ms 24.31% 24.31% 1750ms 24.31% runtime.epollwait
1580ms 21.94% 46.25% 1580ms 21.94% runtime.futex
300ms 4.17% 50.42% 4000ms 55.56% runtime.findrunnable
230ms 3.19% 53.61% 360ms 5.00% runtime.scanobject
150ms 2.08% 55.69% 1900ms 26.39% runtime.netpoll
130ms 1.81% 57.50% 200ms 2.78% runtime.nanotime (inline)
120ms 1.67% 59.17% 240ms 3.33% runtime.selectgo
90ms 1.25% 60.42% 610ms 8.47% runtime.checkTimers
90ms 1.25% 61.67% 90ms 1.25% runtime.nextFreeFast (inline)
80ms 1.11% 62.78% 290ms 4.03% runtime.mallocgc
OK,
So I'm really not certain what to do here - we could be chasing a ghost that is fundamentally not fixable, but I think it's likely that the above pprofs and straces aren't really capturing what the real problem is - simply because, by measuring, they're causing wakeups.
My suspicion is that the wakeups are caused by the level queues' work loop checking whether there is work for it to do - it currently does this every 100ms, and there are more than a few queues that all wake up and check.
So, just to prove that, set in your app.ini:
...
[queue]
TYPE=channel
[queue.issue_indexer]
TYPE=channel
(Remove/change any other [queue.*] sections as appropriate - they should all be TYPE=channel.)
Then just check with top as well as pprof or strace. (There are a few other potential things that could be causing frequent wake-ups too, like potentially the DB connector. Does changing the DB type reduce this?)
Now - the presumption when this code was made was that this is a minimal and insignificant potential issue. As this is the third report, clearly that presumption is incorrect.
So what can be done?
Well... the majority of the queues actually use the persistable-channel type - which means that once the level queue is empty it should never get any more work. So any further polling in this case is actually unnecessary - but there are significant concurrency edge cases that make asserting when that further polling can stop actually hard.
However, a leveldb can have only one process open it at a time, so... we could realistically check the length of the queue and, if it is 0, block waiting for a push that will give us some work to do. The trouble is getting the concurrency correct here and handling stopping properly.
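The "block waiting for a push" idea can be sketched with a condition variable. This is a hand-rolled illustration of the pattern (blockingQueue is invented for the example, not Gitea's queue implementation): Pop sleeps with zero CPU cost until Push signals, instead of re-checking on a timer.

```go
package main

import (
	"fmt"
	"sync"
)

// blockingQueue: Pop blocks on a condition variable while the queue is
// empty, so an idle consumer causes no periodic wakeups at all.
type blockingQueue struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []string
}

func newBlockingQueue() *blockingQueue {
	q := &blockingQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

func (q *blockingQueue) Push(item string) {
	q.mu.Lock()
	q.items = append(q.items, item)
	q.mu.Unlock()
	q.cond.Signal() // wake one waiting consumer, if any
}

func (q *blockingQueue) Pop() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 {
		q.cond.Wait() // sleeps, costing no CPU, until Push signals
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item
}

func main() {
	q := newBlockingQueue()
	go q.Push("work")
	fmt.Println(q.Pop()) // prints "work"
}
```

The hard parts the comment alludes to - shutdown while a consumer is blocked in Wait, and races between the emptiness check and concurrent pushes - are exactly what the mutex/condition pairing has to get right.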
For redis queues though I'm not certain what we can do about polling.
In both cases, 100ms was chosen as a "reasonable" default fallback polling time, rather than doing some more complicated exponential backoff, as a balance between responsiveness and backend load.
Hi @zeripath,
I don't have time to look into pprof (this is a whole new topic for me), but setting the queue type to channel has a significant impact on the CPU usage on my small server.
Where the idle CPU usage of Gitea was around 6% before the change, it is now down to around 3%.
Should I reset the queue type after testing, or what is the implication of setting it to channel? If the default is persistable-channel, that sounds like queues would survive Gitea restarts.
Cheers, Tinxx
Here's another PR that will cause leveldb queues to simply wait when empty instead of polling, at the cost of another select.
@tinxx as I say above I am suspicious that pprof may be a red-herring. Could you try the #15693 PR to see if that reduces the default baseline load to similar to that for type=channel?
@zeripath I have managed to build your branch wait-on-empty and run it on my server.
After initial setup with one user, the load oscillates around 2.5%.
I also ran a fresh release version and created a new user, etc., and found the CPU load oscillating around 6% again.
Hey @zeripath, I appreciate the help! The new PR seems to have reduced the CPU usage from 5% to maybe 3%-4%. However, the queue change got it from 5% to 1%-2%:
[queue]
TYPE = channel
[queue.*]
TYPE = channel
[queue.issue_indexer]
TYPE = channel
I will be running pprofs and be posting them later.
OK so I think now it's worth checking if the combination: https://github.com/zeripath/gitea/tree/prs-15693-15686-15696 helps
So https://github.com/zeripath/gitea/tree/prs-15693-15686-15696 has a CPU usage of around 2%-3% in my case (without queue set to channel). This is better than the previous PRs :) However, with that PR and queues set to channel it's 1%-2%.
Do you still need any pprofs from the previous or this PR / queue channels?
top from #15693 (pprof.gitea.samples.cpu.002.pb.gz)
(pprof) top
Showing nodes accounting for 4680ms, 65.18% of 7180ms total
Dropped 118 nodes (cum <= 35.90ms)
Showing top 10 nodes out of 96
flat flat% sum% cum cum%
1830ms 25.49% 25.49% 1830ms 25.49% runtime.epollwait
1710ms 23.82% 49.30% 1710ms 23.82% runtime.futex
310ms 4.32% 53.62% 4450ms 61.98% runtime.findrunnable
190ms 2.65% 56.27% 190ms 2.65% runtime.(*mcache).prepareForSweep
180ms 2.51% 58.77% 380ms 5.29% runtime.scanobject
140ms 1.95% 60.72% 220ms 3.06% runtime.nanotime (inline)
80ms 1.11% 61.84% 80ms 1.11% runtime.findObject
80ms 1.11% 62.95% 110ms 1.53% runtime.heapBitsSetType
80ms 1.11% 64.07% 80ms 1.11% runtime.nanotime1
80ms 1.11% 65.18% 1920ms 26.74% runtime.netpoll
top from queue set to channel + #15693 (pprof.gitea.samples.cpu.001.pb.gz)
(pprof) top
Showing nodes accounting for 2880ms, 68.09% of 4230ms total
Dropped 65 nodes (cum <= 21.15ms)
Showing top 10 nodes out of 85
flat flat% sum% cum cum%
1070ms 25.30% 25.30% 1070ms 25.30% runtime.futex
910ms 21.51% 46.81% 910ms 21.51% runtime.epollwait
240ms 5.67% 52.48% 2470ms 58.39% runtime.findrunnable
220ms 5.20% 57.68% 380ms 8.98% runtime.scanobject
90ms 2.13% 59.81% 1020ms 24.11% runtime.netpoll
80ms 1.89% 61.70% 120ms 2.84% runtime.nanotime (inline)
70ms 1.65% 63.36% 100ms 2.36% runtime.findObject
70ms 1.65% 65.01% 200ms 4.73% runtime.mallocgc
70ms 1.65% 66.67% 200ms 4.73% runtime.selectgo
60ms 1.42% 68.09% 60ms 1.42% runtime.runqget
top from prs-15693-15686-15696 (pprof.gitea.samples.cpu.003.pb.gz):
(pprof) top
Showing nodes accounting for 4420ms, 67.90% of 6510ms total
Dropped 101 nodes (cum <= 32.55ms)
Showing top 10 nodes out of 86
flat flat% sum% cum cum%
1820ms 27.96% 27.96% 1820ms 27.96% runtime.epollwait
1570ms 24.12% 52.07% 1570ms 24.12% runtime.futex
200ms 3.07% 55.15% 3870ms 59.45% runtime.findrunnable
150ms 2.30% 57.45% 230ms 3.53% runtime.nanotime (inline)
140ms 2.15% 59.60% 140ms 2.15% runtime.madvise
140ms 2.15% 61.75% 340ms 5.22% runtime.scanobject
130ms 2.00% 63.75% 1970ms 30.26% runtime.netpoll
110ms 1.69% 65.44% 130ms 2.00% runtime.findObject
80ms 1.23% 66.67% 570ms 8.76% runtime.checkTimers
80ms 1.23% 67.90% 80ms 1.23% runtime.nanotime1
top from prs-15693-15686-15696 + queue set to channel (pprof.gitea.samples.cpu.004.pb.gz):
(pprof) top
Showing nodes accounting for 2770ms, 73.67% of 3760ms total
Dropped 53 nodes (cum <= 18.80ms)
Showing top 10 nodes out of 104
flat flat% sum% cum cum%
1300ms 34.57% 34.57% 1300ms 34.57% runtime.epollwait
750ms 19.95% 54.52% 750ms 19.95% runtime.futex
210ms 5.59% 60.11% 370ms 9.84% runtime.scanobject
140ms 3.72% 63.83% 2380ms 63.30% runtime.findrunnable
100ms 2.66% 66.49% 100ms 2.66% runtime.findObject
60ms 1.60% 68.09% 60ms 1.60% runtime.madvise
60ms 1.60% 69.68% 90ms 2.39% runtime.nanotime (inline)
50ms 1.33% 71.01% 60ms 1.60% runtime.(*mcache).prepareForSweep
50ms 1.33% 72.34% 1390ms 36.97% runtime.netpoll
50ms 1.33% 73.67% 50ms 1.33% runtime.write1
OK I've just pushed another little change up to that branch and #15696 that would allow:
[queue]
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
WORKERS=0
BOOST_WORKERS=1
(Which could still be changed to TYPE=channel as necessary.)
The point is that this will not have worker go-routines running unless they're actually needed.
BTW, this [queue.*] in your app.ini - where does it come from? It doesn't mean or do anything.
sorry for the delay, i saw someone else had already posted their profile so figured there was no rush to get mine, but i've attached it anyways, and the pprof top output is further down. this is actually the first time i've gotten back to my home PC since my last comment.
i have tried the code from prs-15693-15686-15696, the profile is attached and pprof top looks like: pprof.gitea.samples.cpu.005.pb.gz
File: gitea
Build ID: f765ff3bcf5b25bdfd8831c602c0de66f0ef4b57
Type: cpu
Time: May 1, 2021 at 5:20pm (EDT)
Duration: 30s, Total samples = 110ms ( 0.37%)
Showing nodes accounting for 110ms, 100% of 110ms total
flat flat% sum% cum cum%
40ms 36.36% 36.36% 40ms 36.36% runtime.epollwait
20ms 18.18% 54.55% 20ms 18.18% context.(*cancelCtx).Done
20ms 18.18% 72.73% 20ms 18.18% runtime.futex
10ms 9.09% 81.82% 10ms 9.09% runtime.mallocgc
10ms 9.09% 90.91% 30ms 27.27% runtime.notesleep
10ms 9.09% 100% 10ms 9.09% runtime.wirep
and in case it helps, here is the head of my original profile top output:
File: gitea
Build ID: 950109434956ed68d7c7e0741a6ff3e8586f7990
Type: cpu
Time: Apr 27, 2021 at 12:40am (EDT)
Duration: 13.91mins, Total samples = 5.08s ( 0.61%)
Showing nodes accounting for 4.45s, 87.60% of 5.08s total
Dropped 173 nodes (cum <= 0.03s)
flat flat% sum% cum cum%
1.15s 22.64% 22.64% 1.15s 22.64% runtime.epollwait
0.87s 17.13% 39.76% 0.87s 17.13% runtime.futex
0.20s 3.94% 43.70% 2.97s 58.46% runtime.findrunnable
0.12s 2.36% 46.06% 0.25s 4.92% runtime.selectgo
0.11s 2.17% 48.23% 0.18s 3.54% runtime.scanobject
0.10s 1.97% 50.20% 0.39s 7.68% runtime.checkTimers
0.09s 1.77% 51.97% 0.31s 6.10% runtime.mallocgc
0.09s 1.77% 53.74% 0.09s 1.77% runtime.nextFreeFast (inline)
0.07s 1.38% 55.12% 0.07s 1.38% runtime.(*mcache).prepareForSweep
0.07s 1.38% 56.50% 0.11s 2.17% runtime.nanotime (partial-inline)
0.07s 1.38% 57.87% 1.22s 24.02% runtime.netpoll
0.07s 1.38% 59.25% 0.62s 12.20% runtime.notesleep
0.07s 1.38% 60.63% 3.41s 67.13% runtime.schedule
0.06s 1.18% 61.81% 0.07s 1.38% runtime.lock2
Running the code from https://github.com/zeripath/gitea/commit/b1f6a0cfd3a6976a867270aaf4d0f86372205b8f with the two queue settings set to WORKERS=0 and BOOST_WORKERS=1:
File: gitea
Build ID: dc8e3f5cdde5399ead75d4998c4c670eeef426bd
Type: cpu
Time: May 1, 2021 at 5:39pm (EDT)
Duration: 30s, Total samples = 130ms ( 0.43%)
Showing nodes accounting for 130ms, 100% of 130ms total
flat flat% sum% cum cum%
30ms 23.08% 23.08% 30ms 23.08% runtime.epollwait
10ms 7.69% 30.77% 10ms 7.69% github.com/syndtr/goleveldb/leveldb.(*version).walkOverlapping
10ms 7.69% 38.46% 10ms 7.69% runtime.(*randomOrder).start (inline)
10ms 7.69% 46.15% 10ms 7.69% runtime.findObject
10ms 7.69% 53.85% 90ms 69.23% runtime.findrunnable
10ms 7.69% 61.54% 10ms 7.69% runtime.futex
10ms 7.69% 69.23% 10ms 7.69% runtime.ifaceeq
10ms 7.69% 76.92% 10ms 7.69% runtime.lock2
10ms 7.69% 84.62% 10ms 7.69% runtime.nanotime1
10ms 7.69% 92.31% 10ms 7.69% runtime.nobarrierWakeTime (inline)
10ms 7.69% 100% 20ms 15.38% runtime.scanobject
0 0% 100% 20ms 15.38% code.gitea.io/gitea/modules/queue.(*ByteFIFOQueue).readToChan
0 0% 100% 10ms 7.69% code.gitea.io/gitea/modules/queue.(*LevelQueueByteFIFO).Pop
0 0% 100% 10ms 7.69% code.gitea.io/gitea/modules/queue.(*LevelUniqueQueueByteFIFO).Pop
0 0% 100% 20ms 15.38% gitea.com/lunny/levelqueue.(*Queue).RPop
0 0% 100% 10ms 7.69% gitea.com/lunny/levelqueue.(*UniqueQueue).RPop
0 0% 100% 20ms 15.38% github.com/syndtr/goleveldb/leveldb.(*DB).Get
0 0% 100% 20ms 15.38% github.com/syndtr/goleveldb/leveldb.(*DB).get
0 0% 100% 10ms 7.69% github.com/syndtr/goleveldb/leveldb.(*version).get
0 0% 100% 10ms 7.69% github.com/syndtr/goleveldb/leveldb.memGet
0 0% 100% 10ms 7.69% runtime.futexsleep
0 0% 100% 20ms 15.38% runtime.gcBgMarkWorker
0 0% 100% 20ms 15.38% runtime.gcBgMarkWorker.func2
0 0% 100% 20ms 15.38% runtime.gcDrain
0 0% 100% 10ms 7.69% runtime.lock (inline)
0 0% 100% 10ms 7.69% runtime.lockWithRank (inline)
0 0% 100% 10ms 7.69% runtime.mPark
0 0% 100% 90ms 69.23% runtime.mcall
0 0% 100% 10ms 7.69% runtime.nanotime (inline)
0 0% 100% 30ms 23.08% runtime.netpoll
0 0% 100% 10ms 7.69% runtime.notesleep
0 0% 100% 90ms 69.23% runtime.park_m
0 0% 100% 90ms 69.23% runtime.schedule
0 0% 100% 10ms 7.69% runtime.stopm
0 0% 100% 20ms 15.38% runtime.systemstack
with default config:
File: gitea
Build ID: dc8e3f5cdde5399ead75d4998c4c670eeef426bd
Type: cpu
Time: May 1, 2021 at 5:34pm (EDT)
Duration: 30s, Total samples = 410ms ( 1.37%)
Showing nodes accounting for 410ms, 100% of 410ms total
flat flat% sum% cum cum%
150ms 36.59% 36.59% 150ms 36.59% runtime.epollwait
60ms 14.63% 51.22% 60ms 14.63% runtime.futex
20ms 4.88% 56.10% 290ms 70.73% runtime.findrunnable
20ms 4.88% 60.98% 20ms 4.88% runtime.nobarrierWakeTime (inline)
20ms 4.88% 65.85% 40ms 9.76% runtime.scanobject
10ms 2.44% 68.29% 20ms 4.88% github.com/syndtr/goleveldb/leveldb/memdb.(*DB).Find
10ms 2.44% 70.73% 10ms 2.44% github.com/syndtr/goleveldb/leveldb/memdb.(*DB).findGE
10ms 2.44% 73.17% 10ms 2.44% runtime.(*mheap).allocSpan
10ms 2.44% 75.61% 10ms 2.44% runtime.(*mspan).markBitsForIndex (inline)
10ms 2.44% 78.05% 10ms 2.44% runtime.acquirep
10ms 2.44% 80.49% 10ms 2.44% runtime.checkTimers
10ms 2.44% 82.93% 10ms 2.44% runtime.findObject
10ms 2.44% 85.37% 10ms 2.44% runtime.lock2
10ms 2.44% 87.80% 70ms 17.07% runtime.mPark
10ms 2.44% 90.24% 10ms 2.44% runtime.madvise
10ms 2.44% 92.68% 20ms 4.88% runtime.mallocgc
10ms 2.44% 95.12% 310ms 75.61% runtime.mcall
10ms 2.44% 97.56% 10ms 2.44% runtime.selectgo
10ms 2.44% 100% 90ms 21.95% runtime.stopm
0 0% 100% 30ms 7.32% code.gitea.io/gitea/modules/queue.(*ByteFIFOQueue).readToChan
0 0% 100% 30ms 7.32% code.gitea.io/gitea/modules/queue.(*LevelQueueByteFIFO).Pop
0 0% 100% 20ms 4.88% code.gitea.io/gitea/modules/queue.(*WorkerPool).addWorkers.func1
0 0% 100% 20ms 4.88% code.gitea.io/gitea/modules/queue.(*WorkerPool).doWork
I was hoping that [queue.*] would set this for all queue.* settings as well. Are there any other queue settings? queue.task?
I think we (well, mostly @zeripath ;) ) are making some good progress here. Latest prs-15693-15686-15696 with:
[queue]
TYPE=channel
[queue.issue_indexer]
TYPE=channel
it is around 2%-3%
[queue]
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
WORKERS=0
BOOST_WORKERS=1
it is around 0.5%-1%. pprof from this:
(pprof) top
Showing nodes accounting for 280ms, 96.55% of 290ms total
Showing top 10 nodes out of 71
flat flat% sum% cum cum%
120ms 41.38% 41.38% 120ms 41.38% runtime.epollwait
80ms 27.59% 68.97% 80ms 27.59% runtime.futex
10ms 3.45% 72.41% 10ms 3.45% github.com/syndtr/goleveldb/leveldb.(*DB).get
10ms 3.45% 75.86% 10ms 3.45% runtime.acquirep
10ms 3.45% 79.31% 210ms 72.41% runtime.findrunnable
10ms 3.45% 82.76% 10ms 3.45% runtime.memclrNoHeapPointers
10ms 3.45% 86.21% 10ms 3.45% runtime.nanotime1
10ms 3.45% 89.66% 10ms 3.45% runtime.ready
10ms 3.45% 93.10% 10ms 3.45% runtime.unlock2
10ms 3.45% 96.55% 10ms 3.45% syscall.Syscall
and with
[queue]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
it is a steady 0%! pprof from this:
(pprof) top
Showing nodes accounting for 10ms, 100% of 10ms total
Showing top 10 nodes out of 38
flat flat% sum% cum cum%
10ms 100% 100% 10ms 100% runtime.futex
0 0% 100% 10ms 100% bufio.(*Reader).Read
0 0% 100% 10ms 100% code.gitea.io/gitea/models.GetUIDsAndNotificationCounts
0 0% 100% 10ms 100% code.gitea.io/gitea/modules/eventsource.(*Manager).Run
0 0% 100% 10ms 100% code.gitea.io/gitea/modules/graceful.(*Manager).RunWithShutdownContext
0 0% 100% 10ms 100% database/sql.(*DB).QueryContext
0 0% 100% 10ms 100% database/sql.(*DB).query
0 0% 100% 10ms 100% database/sql.(*DB).queryDC
0 0% 100% 10ms 100% database/sql.(*DB).queryDC.func1
0 0% 100% 10ms 100% database/sql.ctxDriverQuery
i see that too when setting
[queue]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
i get near constant 0% cpu usage at idle. i can also see with strace that the constant rapid calling of epoll_pwait() and futex() no longer happen. i only see a group of calls like every 10 seconds that are pretty minimal.
are there any possible side-effects of using these settings?
[queue] is doing the work you think [queue.*] does. It's just that the issue indexer is annoying and has its own forced default.
So I think we can stop with the pprofs just now.
It looks like there are two fundamental issues behind your idle CPU usage.
With the three PRs together, using channel queues (TYPE=channel) - which does carry the cost of potential data loss on shutdown at times of load, so you should just flush the queues before you shut down - and the below config, you should be able to get Gitea down to 30 goroutines when absolutely idle.
[queue]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
I think there may be a way of getting the persistable-channel queue to shut down its runner once it's empty, so I will continue to see if that can be improved. Another option is to see if, during shutdown, we can flush the channel queues to reduce the risk of leaving something in them.
I'm not otherwise certain if I can reduce the number of basal goroutines further but it looks like that's enough.
There's likely some Linux resource setting you can increase that would allow go to avoid futex cycling with more goroutines but I'm no expert here and don't know.
I've let it run for a bit longer and computed the average CPU usage of the docker container at idle, which was around 0.14% (with all three queue settings and all three PRs). That's already a lot better than the 5% we started with.
Loss of data would obviously not be good.
However, when running the flush command I get this:
gitea manager flush-queues
readObjectStart: expect { or n, but found F, error found in #1 byte of ...|Forbidden|..., bigger context ...|Forbidden|...
This sounds like an error? I've run this as gitea and root inside the rootless container - both the same.
Thanks for your patient and awesome work so far!
I suspect your manager call needs you to set your config option correctly.
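For example - a sketch assuming the config lives at /etc/gitea/app.ini, the same path the web process uses later in this thread:

```shell
# Point the manager subcommand at the same app.ini the running server
# uses; with the wrong (default) config it cannot authenticate against
# the server's private API, which would explain the "Forbidden" in the
# error above.
gitea --config /etc/gitea/app.ini manager flush-queues
```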
Indeed, that fixed it. It was quite late yesterday...
Anyway, is there any more testing/feedback that you require, or is this the maximum goroutine reduction for now? (Not complaining - I'm already quite happy that you were able to reduce it this much.) Not sure if the flush could be done automatically on a docker shutdown/stop.
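On the docker question: one possibility - purely a hedged sketch, not something Gitea ships, with illustrative paths - is a wrapper entrypoint that flushes the queues when the container receives SIGTERM, before stopping gitea:

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: on SIGTERM/SIGINT, flush the queues so
# channel-only queues lose as little data as possible, then stop gitea.
flush_then_stop() {
    gitea --config /etc/gitea/app.ini manager flush-queues
    kill -TERM "$pid"
}

trap flush_then_stop TERM INT

gitea --config /etc/gitea/app.ini web &
pid=$!
wait "$pid"
```

Since `docker stop` sends SIGTERM and waits (10 seconds by default) before SIGKILL, the flush has to complete within that grace period.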
I've changed the persistable-channel queues to shut down their leveldbs once they're empty, or not run them at all if they're empty, which brings the number of baseline goroutines on the three-PR branch with default configuration to 80, and 74 with the WORKERS=0, BOOST_WORKERS=1 settings above.
I'll have a think about what we can do about reducing these further.
With the current state of prs-15693-15686-15696 (76e34f05fbcc3d0069e708a440f882cf4da77a01) and this app.ini:
[queue]
; TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
; TYPE=channel
WORKERS=0
BOOST_WORKERS=1
The starting number of goroutines is down to around 64-65.
I can think of potentially one more small improvement in the queues infrastructure (just use contexts instead of channels) but it may not be worth it. It may not be possible to reduce the number of goroutines in the persistable channel further though. I still need to figure out some way of improving polling to reduce baseline load with the RedisQueue - likely this will be with Redis Signals but I'm not sure.
In terms of other baseline goroutines - there may be a few other places where the CancelFunc trick may apply.
Down to 59 now - we can stop looking at prs-15693-15686-15696 and just look at #15693, as I've moved #15686 into it, with 23 on the channel-only variants.
The next way to reduce the number of goroutines for persistable channels is to use a common queue instead.
Awesome work! I tested the prs one earlier today but I will test the new one tomorrow. I also think that there are some DB calls because the only time gitea (with channels) has >0% CPU now is at times where the DB (postgres) also has a CPU spike.
So I think if you set:
[queue]
WORKERS=0
BOOST_WORKERS=1
CONN_STR=leveldb://queues/common
[queue.issue_indexer]
WORKERS=0
BOOST_WORKERS=1
CONN_STR=leveldb://queues/common
Then the persistable channel queues will all share one leveldb instead of opening their own. That should significantly reduce the number of goroutines and, on the current state of that PR, probably drop it to around 29 goroutines at rest.
Awesome work! I tested the prs one earlier today but I will test the new one tomorrow. I also think that there are some DB calls because the only time gitea (with channels) has >0% CPU now is at times where the DB (postgres) also has a CPU spike.
That's probably the event source. We currently poll the db even if there's no one connected but I think we can get it to stop when no one is connected.
On 1.14.1, using the below settings
[queue]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
Does reduce the gitea idle CPU to almost 0%. However, when I create a new repository and push (using SSH), the web UI does not show the repository code (it still shows the "Quick Guide"), even though the code is pushed to Git successfully. Changing the workers to 1 fixes this.
The above config is not meant to work on 1.14.1 and requires changes made on 1.15 and in particular with the PRs I have linked.
These changes will almost certainly not be backported to 1.14.x.
I have tested it with 1.14.1 as well, but the CPU spikes are a bit higher. However, if I understood the changes made in #15693 correctly (they are not present in 1.14.x), running this with 1.14.1 risks data loss and other undefined behavior.
So with #15693, using:
[queue]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
[queue.issue_indexer]
TYPE=channel
WORKERS=0
BOOST_WORKERS=1
I had ~26 goroutines and a CPU usage of 0-0.5%, with spikes usually at the same time as the DB spikes.
And using:
[queue]
WORKERS=0
BOOST_WORKERS=1
CONN_STR=leveldb://queues/common
[queue.issue_indexer]
WORKERS=0
BOOST_WORKERS=1
CONN_STR=leveldb://queues/common
I had ~57 goroutines, CPU usage of 0.2-0.7%
OK, it looks like CONN_STR isn't doing the trick of forcing the use of a common leveldb - otherwise the goroutine count should have been around half that. I'll take a look, but it may be that we have to specifically tell the queues to use a common db.
The below is the current configuration for #15693 using common queues:
[queue]
WORKERS=0
BOOST_WORKERS=1
[queue.mail]
DATADIR=queues/common
[queue.notification-service]
DATADIR=queues/common
[queue.pr_patch_checker]
DATADIR=queues/common
[queue.push_update]
DATADIR=queues/common
[queue.repo_stats_update]
DATADIR=queues/common
[queue.task]
DATADIR=queues/common
[queue.issue_indexer]
WORKERS=0
BOOST_WORKERS=1
DATADIR=queues/common
That has a baseline goroutines number of 29.
Those on 1.14.1 who want reductions should find that:
[queue.mail]
DATADIR=queues/common
[queue.notification-service]
DATADIR=queues/common
[queue.pr_patch_checker]
DATADIR=queues/common
[queue.push_update]
DATADIR=queues/common
[queue.repo_stats_update]
DATADIR=queues/common
[queue.task]
DATADIR=queues/common
[queue.issue_indexer]
DATADIR=queues/common
should cause significant reductions in baseline goroutine numbers - although to nowhere near as low levels as #15693 allows.
I had intended to make it possible to force a common leveldb connection by setting CONN_STR appropriately, but I suspect handling the people who insist on copying app.example.ini as their app.ini has caused this to break. Adding a new option to [queue] could probably be used to enforce a common leveldb connection.
Yeah, with that configuration for #15693 in docker I got ~32 goroutines and a CPU usage of around 0.05%-0.8%. Since you haven't used channels in the settings lately, I assume using channels should be avoided?
Whether the channel only queue is appropriate for you is a decision about your data-safety and how you start and shutdown your instance. With #15693 the channel queue will at least attempt to flush itself at time of shutdown but... there's still no guarantee that every piece of data in the queue is dealt with.
If you have an absolutely low load - and if you flush before shutting down - then it should be fine to use channel-only queues. But as I've shown above, it's possible to get the baseline goroutine count down to 29 using a leveldb backend, which doesn't have this gotcha.
It's up to you really - persistable channels probably have to be the default which is why I've been working to reduce the baseline load there. Other Queue implementations are possible too.
Isn't there a way to reduce the idle load to 0%, possibly giving up some functionality?
Background: I'm running it on my x-slow NAS (an N40L) where I get an idle load of ~3%. This is not acceptable; I'm kicking out any service that permanently sucks CPU and raises my electricity bill. My current workaround is "systemctl { stop | start } gitea".
Btw, the performance of Gitea is incredible on this slow NAS! Probably because it is tight object code instead of lame web scripts or Java bloat.
I am guessing that you're running 1.14.x and not 1.15.x/main.
Please try main.
I think since all the PRs mentioned above (and now explicitly linked to this issue) have been merged we can actually close this issue as the baseline load on an idle gitea is now back to almost zero.
Description
I have high idle CPU usage with my upgraded Gitea server, so I tried a fresh deployment and have the same issue. After a fresh install (even 1 hour in) the CPU usage is about 5% of my system. One user, no repository. In my testing VM (also Debian 10, 1 CPU thread only) the idle CPU usage is about 0.1% (with the exact same docker-compose.yml and configuration). This happens with the 1.14.1 root or rootless version (haven't tried others).
I know 5% is not that high, but should it really be that much with a fresh install and while idle?
According to htop, the CPU is used by /usr/local/bin/gitea -c /etc/gitea/app.ini web. I've set the logging to debug, but there are no logs when idle.
Any help is appreciated :)
Screenshots