go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Gitea keeps on filling /tmp with indexes #31792

Open jeroenlaylo opened 1 month ago

jeroenlaylo commented 1 month ago

Description

We are running Gitea (1.22.1) on OpenBSD (7.5-current on amd64). For some reason, Gitea keeps writing index files to /tmp, despite having configured different paths in the Gitea config. Since this partition is 3GB, it fills up rather quickly.

What causes this behaviour? Is there a configuration option that we are missing? Is it possible to have Gitea use a custom path for these temporary indexes?

Our config:

APP_NAME = Code
RUN_USER = _gitea
RUN_MODE = prod
WORK_PATH = /usr/local/share/gitea

[server]
PROTOCOL = http
DOMAIN = [domain]
ROOT_URL = https://[domain]
HTTP_ADDR = 127.0.0.1
HTTP_PORT = 3000
START_SSH_SERVER = true
BUILTIN_SSH_SERVER_USER = git
SSH_PORT = 2222
SSH_LISTEN_PORT = %(SSH_PORT)s
SSH_SERVER_CIPHERS = chacha20-poly1305@openssh.com,aes256-ctr,aes256-gcm@openssh.com
SSH_SERVER_KEY_EXCHANGES = curve25519-sha256
SSH_SERVER_MACS = hmac-sha2-256-etm@openssh.com
SSH_SERVER_HOST_KEYS = /var/gitea/.ssh/gitea.ed
;, /var/gitea/.ssh/gitea.rsa
APP_DATA_PATH = /var/gitea/data
PPROF_DATA_PATH = /var/gitea/data/tmp/pprof
LFS_JWT_SECRET = [secret]
SSH_DOMAIN = [domain]
DISABLE_SSH = false
LFS_START_SERVER = true
OFFLINE_MODE = true

[database]
DB_TYPE = postgres
LOG_SQL = false
HOST = 127.0.0.1:5432
NAME = gitea
USER = gitea
PASSWD = [secret]
SCHEMA = 
SSL_MODE = disable
CHARSET = utf8

[security]
INSTALL_LOCK = true
SECRET_KEY = 
INTERNAL_TOKEN = [secret]
PASSWORD_HASH_ALGO = pbkdf2

[ssh.minimum_key_sizes]
ED25519 = 256
ECDSA = -1
RSA = 4096
DSA = -1

[camo]

[oauth2]
ENABLE = true
JWT_SIGNING_PRIVATE_KEY_FILE = /var/gitea/jwt/private.pem
JWT_SECRET = [secret]

[log]
ROOT_PATH = /var/log/gitea
MODE = file
LEVEL = Fatal

[git]

[service]
DISABLE_REGISTRATION = false
REQUIRE_SIGNIN_VIEW = false
DISABLE_USERS_PAGE = true
DEFAULT_KEEP_EMAIL_PRIVATE = true
DEFAULT_ALLOW_CREATE_ORGANIZATION = false
DEFAULT_USER_VISIBILITY = private
DEFAULT_ORG_VISIBILITY = private
ROOT = /var/gitea/gitea-repositories
SCRIPT_TYPE = sh
DEFAULT_PRIVATE = private
PREFERRED_LICENSES = BSD-2-Clause,ISC
REGISTER_EMAIL_CONFIRM = false
ENABLE_NOTIFY_MAIL = false
ALLOW_ONLY_EXTERNAL_REGISTRATION = false
ENABLE_CAPTCHA = false
DEFAULT_ENABLE_TIMETRACKING = true
NO_REPLY_ADDRESS = noreply.[domain]

[repository.local]
LOCAL_COPY_PATH = /var/gitea/tmp/local-repo

[repository.upload]
TEMP_PATH = /var/gitea/data/tmp/uploads
FILE_MAX_SIZE = 2048

[cache]
ADAPTER = redis
HOST = redis://127.0.0.1:6379/1?pool_size=100&idle_timeout=180s

[ui]
SHOW_USER_EMAIL = false
DEFAULT_THEME = gitea-dark
THEMES = gitea-dark
ED25519 = 256
ECDSA = 256
RSA = 2047
DSA = -1

[indexer]
ISSUE_INDEXER_TYPE = db
ISSUE_INDEXER_PATH = /var/gitea/indexers/issues.bleve
STARTUP_TIMEOUT = 30s
REPO_INDEXER_ENABLED = false
REPO_INDEXER_PATH = /var/gitea/indexers/repos.bleve
;ISSUE_INDEXER_QUEUE_DIR = /var/gitea/indexers/issues.queue
;REPO_INDEXER_PATH = /var/gitea/indexers/repos.bleve

[admin]
DISABLE_REGULAR_ORG_CREATION = true
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false

[queue]
TYPE = redis
CONN_STR = redis://127.0.0.1:6379/0
MAX_WORKERS = 4
DATADIR = /var/gitea/queue

[storage]
STORAGE_TYPE = local

[mailer]
ENABLED = true
PROTOCOL = smtp+starttls
SMTP_ADDR = [smtp-server]
SMTP_PORT = 587
FROM = [sender]
SEND_AS_PLAIN_TEXT = true
USER = [user]
PASSWD = [secret]

[session]
PROVIDER = db
COOKIE_SECURE = true

[picture]
AVATAR_UPLOAD_PATH = /var/gitea/data/avatars
REPOSITORY_AVATAR_UPLOAD_PATH = /var/gitea/data/repo-avatars
DISABLE_GRAVATAR = true
ENABLE_FEDERATED_AVATAR = false

[attachment]
MAX_SIZE = 2048
PATH = /var/gitea/data/attachments

[time]
FORMAT = RFC1123Z

[other]
SHOW_FOOTER_VERSION = false
SHOW_FOOTER_TEMPLATE_LOAD_TIME = false
CHUNKED_UPLOAD_PATH = /var/gitea/data/tmp/package-upload

[repository]
ROOT = /var/code/repos

[lfs]
PATH = /var/code/lfs

[repository.pull-request]
DEFAULT_MERGE_STYLE = merge

[repository.signing]
DEFAULT_TRUST_MODEL = committer

[git.timeout]
DEFAULT = 720
MIGRATE = 30000
MIRROR = 72000
CLONE = 30000
PULL = 30000
GC = 60

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

We are using the OpenBSD package

Database

PostgreSQL

lunny commented 1 month ago

The tmp directory is used in many operations. Maybe we need a TMP_DIR configuration option rather than using the system's. I wouldn't say this is a bug, but rather a proposal or enhancement.

wULLSnpAXbWZGYDYyhWTKKspEQoaYxXyhoisqHf commented 1 month ago

I personally don't recall this being the case in the past, though (I have also been facing this issue on 1.22.0+dev-686-gee242a08e9). Do we know when and what changed this?

jeroenlaylo commented 1 month ago

> The tmp directory is used in many operations. Maybe we need a TMP_DIR configuration option rather than using the system's. I wouldn't say this is a bug, but rather a proposal or enhancement.

IIRC, this hasn't always been the case. Some users contacted me because the Gitea instance was sluggish; upon investigation, it turned out that this was due to a full /tmp partition. A rough estimate: it is likely a change introduced in 1.20.x, 1.21.x or 1.22.x.

The TMP_DIR setting/variable would be an ideal addition, yeah.

> I personally don't recall this being the case in the past, though (I have also been facing this issue on 1.22.0+dev-686-gee242a08e9). Do we know when and what changed this?

Thank you kindly for your feedback. This takes some of the doubts away that I was having (whether it wasn't just PEBCAK). Do you experience this on OpenBSD - or on a different OS?

wULLSnpAXbWZGYDYyhWTKKspEQoaYxXyhoisqHf commented 1 month ago

> IIRC, this hasn't always been the case. Some users contacted me because the Gitea instance was sluggish; upon investigation, it turned out that this was due to a full /tmp partition. A rough estimate: it is likely a change introduced in 1.20.x, 1.21.x or 1.22.x.

yeah, I think I would have noticed, been using Gitea for some time now, too.

> The TMP_DIR setting/variable would be an ideal addition, yeah.

I concur, making this configurable would be preferable.

I can imagine a decision to speed up the indexer was made at some point, which moved its files to /tmp, since it's mostly ramfs these days.

> Thank you kindly for your feedback. This takes some of the doubts away that I was having (whether it wasn't just PEBCAK). Do you experience this on OpenBSD - or on a different OS?

and I am equally glad you opened this issue, because I was having these doubts myself, since I don't get to spend as much time tuning my instance as I used to.

I hadn't mentioned this, sorry: I have been running Gitea on Arch (a Linux distro), but I'd expect this behaviour to be largely similar across OSs, at least in that it uses the system's temp folder.

yp05327 commented 3 weeks ago

I think TMP_DIR is necessary. I ran into a similar problem when installing Python packages with pip. By default, pip also uses /tmp as its temp directory, and if it is too small you may get an error during installation. But pip has some env options like TMP_DIR, so the issue is easy to fix: just use another directory.

jeroenlaylo commented 2 weeks ago

After digging somewhat further, it seems the problem is that cancel() is never called when the function errors out, so leftover temporary files are never cleaned up. Here is a quick patch:

Index: modules/git/repo_index.go
--- modules/git/repo_index.go.orig
+++ modules/git/repo_index.go
@@ -51,7 +51,7 @@ func (repo *Repository) readTreeToIndex(id ObjectID, i

 // ReadTreeToTemporaryIndex reads a treeish to a temporary index file
 func (repo *Repository) ReadTreeToTemporaryIndex(treeish string) (filename, tmpDir string, cancel context.CancelFunc, err error) {
-   tmpDir, err = os.MkdirTemp("", "index")
+   tmpDir, err = os.MkdirTemp("${LOCALSTATEDIR}/gitea/tmp", "index")
    if err != nil {
        return filename, tmpDir, cancel, err
    }
@@ -63,9 +63,16 @@ func (repo *Repository) ReadTreeToTemporaryIndex(treei
            log.Error("failed to remove tmp index file: %v", err)
        }
    }
+
+   // Defer the cancel function to ensure cleanup in case of an error
+   defer func() {
+       if err != nil {
+           cancel()
+       }
+   }()
+
    err = repo.ReadTreeToIndex(treeish, filename)
    if err != nil {
-       defer cancel()
        return "", "", func() {}, err
    }
    return filename, tmpDir, cancel, err

This was only tested on OpenBSD and fixed the behaviour we were seeing. Setting the tmpDir to a custom path is something that fits our scenario better.