go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Zombie processes "git-upload-pack" make Gitea unusable after short period of time #21133

Closed blackandred closed 1 year ago

blackandred commented 2 years ago

Description

Hi, thanks for this fantastic project. I recently deployed rootless Gitea with Podman using this playbook: https://github.com/riotkit-org/core-services

The instance quickly becomes unusable as the number of zombie processes keeps growing. About 3 minutes after restarting the instance I get:

git         1           0           0.173       9m38.459241369s  ?           1s          /usr/local/bin/gitea -c /etc/gitea/app.ini web 
git         49          1           0.000       9m29.45930959s   ?           0s          [git-upload-pack]
git         59          1           0.000       9m27.45934445s   ?           0s          [git-upload-pack]
git         78          1           0.000       9m22.459388431s  ?           0s          [git-upload-pack]
git         97          1           0.000       9m13.459420231s  ?           0s          [git-upload-pack]
git         120         1           0.000       7m55.459449531s  ?           0s          [git-upload-pack]
git         121         1           0.000       7m55.459481392s  ?           0s          [git-upload-pack]
git         124         1           0.000       7m55.459510752s  ?           0s          [git-upload-pack]
git         134         1           0.000       6m14.459542972s  ?           0s          [git-upload-pack]
git         143         1           0.000       6m11.459578793s  ?           0s          [git-upload-pack]
git         162         1           0.000       3m14.459607973s  ?           0s          [git-upload-pack]
git         241         1           0.000       1m54.459642093s  ?           0s          [git-upload-pack]
git         246         1           0.000       1m54.459674664s  ?           0s          [git-upload-pack]
git         248         1           0.000       1m54.459703774s  ?           0s          [git-upload-pack]
git         253         1           0.000       1m54.459735045s  ?           0s          [git-upload-pack]
git         254         1           0.000       1m54.459763314s  ?           0s          [git-upload-pack]
git         263         1           0.000       1m54.459794566s  ?           0s          [git-upload-pack]
git         266         1           0.000       1m54.459823266s  ?           0s          [git-upload-pack]
git         268         1           0.000       1m54.459858046s  ?           0s          [git-upload-pack]
git         279         1           0.000       1m54.459888195s  ?           0s          [git-upload-pack]
git         280         1           0.000       1m54.459917196s  ?           0s          [git-upload-pack]
git         294         1           0.000       14.459946256s    ?           0s          [git-upload-pack]

The processes appear when I run git push from my local computer.

I see EOF errors in the log. They may be caused by my network, which sometimes has high latency.

Gitea Version

1.17.2-rootless

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

https://gist.github.com/blackandred/3593a9b0a73dd913a39860b81f372e20

Screenshots

No response

Git Version

git version 2.36.2

Operating System

Linux 5.4.0-124-generic, Podman 3.4.2

How are you running Gitea?

Database

PostgreSQL

blackandred commented 2 years ago

As a very dirty workaround I created this cron job:

*/5 * * * * /bin/bash -c '[[ "$(podman top gitea | wc -l)" -gt 128 ]] && podman restart gitea'

simbelmas commented 2 years ago

Same issue with Gitea version 1.17.2 built with GNU Make 4.3, go1.18.6, running as non-root on top of Kubernetes. Since I plugged in ArgoCD, I see a lot of 'git-upload-pack' processes and the application is not reachable over SSH. The UI works.

Gitea is built from source in an Alpine image.
Image used: quay.io/simbelmas/gitea-alpine:latest
Dockerfile: https://github.com/simbelmas/dockerfiles/blob/latest/gitea-alpine/Dockerfile

cshazi commented 2 years ago

Same issue with gitea/gitea:1.17.2-rootless image on top of AWS EKS.

A few minutes after starting, a bunch of zombie processes appear:

ec2-user 27152  0.0  0.0      0     0 ?        Z    05:06   0:00 [git-upload-pack] <defunct>
ec2-user 32365  0.0  0.0      0     0 ?        Z    05:12   0:00 [git-upload-pack] <defunct>
ec2-user 32372  0.0  0.0      0     0 ?        Z    05:12   0:00 [git-upload-pack] <defunct>
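
For anyone who wants to monitor this from inside the container rather than counting `ps` lines, here is a minimal Go sketch. It assumes a Linux `/proc` filesystem; the function names `procState` and `countZombies` are made up for this example and are not part of Gitea. A zombie is a process whose state field in `/proc/<pid>/stat` is `Z`, which is what `ps` prints as `<defunct>`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// procState extracts the command name and the state letter from the
// contents of /proc/<pid>/stat. The command name is wrapped in parentheses
// and may itself contain spaces or parentheses, so we split on the LAST ')'.
// (Hypothetical helper, written for this example only.)
func procState(stat string) (comm string, state byte, err error) {
	open := strings.IndexByte(stat, '(')
	end := strings.LastIndexByte(stat, ')')
	if open < 0 || end < open {
		return "", 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	rest := strings.Fields(stat[end+1:])
	if len(rest) == 0 || len(rest[0]) != 1 {
		return "", 0, fmt.Errorf("missing state field: %q", stat)
	}
	return stat[open+1 : end], rest[0][0], nil
}

// countZombies scans procRoot (normally "/proc") and counts processes in
// state 'Z' whose command name equals name.
func countZombies(procRoot, name string) (int, error) {
	paths, err := filepath.Glob(filepath.Join(procRoot, "[0-9]*", "stat"))
	if err != nil {
		return 0, err
	}
	n := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process went away between Glob and ReadFile
		}
		comm, state, err := procState(string(data))
		if err == nil && state == 'Z' && comm == name {
			n++
		}
	}
	return n, nil
}

func main() {
	n, err := countZombies("/proc", "git-upload-pack")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d defunct git-upload-pack processes\n", n)
}
```

This only observes the problem. The zombies disappear only once their parent (here PID 1, the gitea process) calls wait on them, or the parent itself exits.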

During the creation of a new zombie process, the following log entries were created:

2022/10/14 04:57:52 ...s/asymkey/ssh_key.go:159:SearchPublicKeyByContent() [I] [6348e9ac-19] [SQL] SELECT "id", "owner_id", "name", "fingerprint", "content", "mode", "type", "login_source_id", "created_unix", "updated_unix", "verified" FROM "public"."public_key" WHERE (content like $1) LIMIT 1 [ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIK0wmN/Cr3JXqmLW7u+g9pTh+wyqDHpSQEIQczXkVx9q%] - 2.016575ms
2022/10/14 04:57:52 models/user/user.go:1011:GetUserByName() [I] [6348ec50] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "lower_name"=$1 LIMIT 1 [myapp] - 2.012835ms
2022/10/14 04:57:52 ...bce556200f/engine.go:1244:Get() [I] [6348ec50] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "public"."repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [29 myapp-env] - 5.840552ms
2022/10/14 04:57:52 ...s/asymkey/ssh_key.go:144:GetPublicKeyByID() [I] [6348ec50] [SQL] SELECT "id", "owner_id", "name", "fingerprint", "content", "mode", "type", "login_source_id", "created_unix", "updated_unix", "verified" FROM "public"."public_key" WHERE "id"=$1 LIMIT 1 [4] - 1.690692ms
2022/10/14 04:57:52 models/user/user.go:996:GetUserByIDCtx() [I] [6348ec50] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "id"=$1 LIMIT 1 [7] - 2.115505ms
2022/10/14 04:57:52 ...epo/collaboration.go:85:IsCollaborator() [I] [6348ec50] [SQL] SELECT "id", "repo_id", "user_id", "mode", "created_unix", "updated_unix" FROM "public"."collaboration" WHERE "repo_id"=$1 AND "user_id"=$2 LIMIT 1 [39 7] - 1.715573ms
2022/10/14 04:57:52 ...ls/repo/repo_unit.go:218:getUnitsByRepoID() [I] [6348ec50] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "public"."repo_unit" WHERE (repo_id = $1) [39] - 1.797653ms
2022/10/14 04:57:52 [6348ec50] router: completed GET /api/internal/serv/command/4/myapp/myapp-env?mode=1&verb=git-upload-pack for 127.0.0.1:58648, 200 OK in 17.4ms @ private/serv.go:81(private.ServCommand)

If I stop the argocd repo server, no new zombie processes are created.

ArgoCD handles the upload-pack operation in a special way: https://github.com/argoproj/argo-cd/blob/master/util/git/workaround.go

bendem commented 2 years ago

I have started seeing this behavior right after switching from root to rootless docker image. Not sure what's up yet.

cshazi commented 2 years ago

An example of a zombie process getting stuck:

Debug log:

2022/10/28 04:20:45 modules/ssh/ssh.go:71:sessionHandler() [T] [635b586a-40] SSH: Payload: git-upload-pack '/admin1/test.git'
2022/10/28 04:20:45 modules/ssh/ssh.go:74:sessionHandler() [T] [635b586a-40] SSH: Arguments: [serv key-2 --config=/data/gitea/conf/app.ini]
2022/10/28 04:20:45 [635b589d] router: started   GET /api/internal/serv/command/2/admin1/test?mode=1&verb=git-upload-pack for [::1]:58292
2022/10/28 04:20:45 ...ters/private/serv.go:412:ServCommand() [D] [635b589d] Serv Results:
    IsWiki: false
    DeployKeyID: 0
    KeyID: 2    KeyName: cshazi
    UserName: admin1
    UserID: 2
    OwnerName: admin1
    RepoName: test
    RepoID: 1
2022/10/28 04:20:45 [635b589d] router: completed GET /api/internal/serv/command/2/admin1/test?mode=1&verb=git-upload-pack for [::1]:58292, 200 OK in 2.0ms @ private/serv.go:81(private.ServCommand)

I couldn't find a way to take control before the context's cancel function is executed. It would also be good to be able to send the kill signal ourselves when the context is done, but that is not possible at the moment: https://github.com/golang/go/issues/21135 https://github.com/golang/go/issues/22757

In the workaround I found, the context is derived from context.Background() in modules/ssh/ssh.go, and a separate goroutine watches the parent context for a change of state.

Source code: https://github.com/go-gitea/gitea/commit/84714c3a71d9c6a616871d2213543717018090e2

Can you suggest a better solution to the problem?
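
To make the idea easier to discuss, here is a simplified, self-contained sketch of that pattern. It is not the code from the linked commit: `runReaped` and `sleepWithDeadline` are hypothetical names, and `sleep 30` only stands in for a long-running git-upload-pack. The child is started without binding it to the request context, a goroutine watches that context and signals the child when it is done, and `Wait` is always called so the child gets reaped.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runReaped starts the command detached from ctx, so cancelling ctx can
// never cause Wait to be skipped. A watcher goroutine translates the
// cancellation into SIGTERM and, after the grace period, SIGKILL.
// (Hypothetical helper for illustration, not Gitea's implementation.)
func runReaped(ctx context.Context, grace time.Duration, name string, args ...string) error {
	cmd := exec.Command(name, args...) // deliberately not exec.CommandContext
	if err := cmd.Start(); err != nil {
		return err
	}

	done := make(chan struct{})
	go func() {
		select {
		case <-done:
			return // child exited on its own
		case <-ctx.Done():
		}
		// Ask politely first; errors are ignored because the child may
		// already have exited.
		_ = cmd.Process.Signal(syscall.SIGTERM)
		select {
		case <-done:
		case <-time.After(grace):
			_ = cmd.Process.Kill()
		}
	}()

	err := cmd.Wait() // always reached, so the child never stays a zombie
	close(done)
	return err
}

// sleepWithDeadline runs "sleep 30" under a context that expires after
// timeout, as a stand-in for an SSH session that goes away early.
func sleepWithDeadline(timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return runReaped(ctx, time.Second, "sleep", "30")
}

func main() {
	fmt.Println("child reaped, Wait returned:", sleepWithDeadline(200*time.Millisecond))
}
```

The key design point is that the goroutine calling `Wait` is the one that owns the child, so reaping does not depend on how or when the parent context is cancelled.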

Exagone313 commented 2 years ago

https://github.com/golang/go/issues/21135 https://github.com/golang/go/issues/22757

Following those discussions, it looks like a solution has been implemented and is planned for release in Go 1.20: https://github.com/golang/go/issues/50436 (see the commit at https://github.com/golang/go/commit/55eaae452cf69df768b2aaf6045db22d6c1a4029).
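
As far as I understand it, that change adds two fields to `exec.Cmd`, `Cancel` and `WaitDelay`. Below is a small sketch of how they could be used, assuming Go 1.20 or later. `gracefulCommand` and `runSleep` are names invented for this example, `sleep 30` stands in for a long-running child, and nothing here is taken from Gitea's code.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// gracefulCommand builds a command that receives SIGTERM when ctx is done
// instead of the default SIGKILL, and is force-killed if it has not exited
// within the grace period. (Hypothetical helper for illustration.)
func gracefulCommand(ctx context.Context, grace time.Duration, name string, args ...string) *exec.Cmd {
	cmd := exec.CommandContext(ctx, name, args...)
	// Go 1.20+: invoked when ctx is done, replacing the default Process.Kill.
	cmd.Cancel = func() error {
		return cmd.Process.Signal(syscall.SIGTERM)
	}
	// Go 1.20+: upper bound on how long Wait keeps waiting after ctx is
	// done before killing the child and closing its I/O pipes.
	cmd.WaitDelay = grace
	return cmd
}

// runSleep runs "sleep 30" under a context that expires after timeout.
func runSleep(timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return gracefulCommand(ctx, time.Second, "sleep", "30").Run()
}

func main() {
	fmt.Println("sleep ended with:", runSleep(200*time.Millisecond))
}
```

Compared with a hand-written watcher goroutine, this keeps signalling and reaping inside `Run`/`Wait`, so the caller cannot forget to wait for the child.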

bendem commented 1 year ago

We haven't seen this issue for a while. Our last update (to 1.18.2) 3 weeks ago seems to have fixed it.

lunny commented 1 year ago

Closing, as it looks like this has been resolved. Please feel free to reopen if it's still a problem.

bendem commented 1 year ago

For reference, I'm gonna go out on a limb and guess this was fixed by #20695