go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
44.08k stars 5.41k forks source link

Maven Deploy Issue - Unique Constraint Error #30171

Closed fwinkelbauer closed 1 week ago

fwinkelbauer commented 5 months ago

Description

We believe that we've found a concurrency bug when doing maven deploys. Depending on network latency or some other factors, mvn deploy may succeed or fail (500 internal server error, see log below). Our project consists of three files:

It seems to us that these files are uploaded in parallel which causes a unique constraint violation in our Postgres database when working with the package_versions table. We couldn't find any maven configuration where we could manipulate maven's upload/deploy behavior, so we currently are in a position where we can't circumvent the problem. We also tried to change Postgres' default isolation level from read_commited to serializable, but doing so causes a different 500 internal server error.

journalctl:

Mar 29 07:25:35 repo-debug gitea[175]: 2024/03/29 07:25:35 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/OurCompany/maven/io/our-company/our-project/module/module-our-project/1.3.0/module-our-project-1.3.0.jar for 10.10.10.1:40002, 201 Created in 9.9ms @ maven/maven.go:230(maven.UploadPackageFile) Mar 29 07:25:35 repo-debug gitea[175]: 2024/03/29 07:25:35 ...packages/packages.go:170:createPackageAndVersion() [E] Error inserting package: pq: duplicate key value violates unique constraint "UQE_package_version_s" Mar 29 07:25:35 repo-debug gitea[175]: 2024/03/29 07:25:35 ...kages/maven/maven.go:51:apiError() [E] pq: duplicate key value violates unique constraint "UQE_package_version_s" Mar 29 07:25:35 repo-debug gitea[175]: 2024/03/29 07:25:35 ...kages/maven/maven.go:54:func17() [E] pq: duplicate key value violates unique constraint "UQE_package_version_s"

After the failed upload, opening the artifact page throws another 500 which is caused by a nil pointer error (see screenshot).

Gitea Version

1.21.10

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

500-blur

Git Version

No response

Operating System

Debian GNU/Linux 12 (bookworm)

How are you running Gitea?

We are using the official binary (manually downloaded) on a self hosted Debian server using systemd

Database

PostgreSQL

fwinkelbauer commented 5 months ago

I can reproduce a different unique constraint error (UQE_package_s) by running:

make test-pgsql\#TestPackageMavenConcurrent

with this test file:

package integration

import (
    "fmt"
    "net/http"
    "strconv"
    "strings"
    "testing"

    "code.gitea.io/gitea/models/unittest"
    user_model "code.gitea.io/gitea/models/user"
    "code.gitea.io/gitea/tests"
)

func TestPackageMavenConcurrent(t *testing.T) {
    defer tests.PrepareTestEnv(t)()

    user := unittest.AssertExistsAndLoadBean(t, &user_model.User{ID: 2})

    groupID := "com.gitea"
    artifactID := "test-project"
    packageVersion := "1.0.1"

    root := fmt.Sprintf("/api/packages/%s/maven/%s/%s", user.Name, strings.ReplaceAll(groupID, ".", "/"), artifactID)

    putFile := func(t *testing.T, path, content string, expectedStatus int) {
        req := NewRequestWithBody(t, "PUT", root+path, strings.NewReader(content)).
            AddBasicAuth(user.Name)
        MakeRequest(t, req, expectedStatus)
    }

    t.Run("Concurrent Upload", func(t *testing.T) {
        defer tests.PrintCurrentTest(t)()

        for i := 0; i < 10; i++ {
            go putFile(t, fmt.Sprintf("/%s/%s.jar", packageVersion, strconv.Itoa(i)), "test", http.StatusCreated)
        }
    })
}

The test is based on tests/integration/api_packages_maven_test.go but only uploads a few files at the same time. My local postgres database was created based on the database preparation documentation.

=== TestPackageMavenConcurrent/Concurrent_Upload (tests/integration/api_packages_fw_test.go:33) --- FAIL: TestPackageMavenConcurrent (7.14s) testlogger.go:60: 2024/04/02 13:25:11 ...xer/stats/indexer.go:87:populateRepoIndexer() [I] Done (re)populating the repo stats indexer with existing repositories testlogger.go:60: 2024/04/02 13:25:12 ...exer/code/indexer.go:312:populateRepoIndexer() [I] Done (re)populating the repo indexer with existing repositories testlogger.go:60: 2024/04/02 13:25:12 ...les/storage/local.go:33:NewLocalStorage() [I] Creating new Local Storage at /Users/fwinkelbauer/Projects/gitea/tests/gitea-lfs-meta --- FAIL: TestPackageMavenConcurrent/Concurrent_Upload (0.04s) testlogger.go:60: 2024/04/02 13:25:12 ...packages/packages.go:138:createPackageAndVersion() [E] Error inserting package: pq: duplicate key value violates unique constraint "UQE_package_s" testlogger.go:60: 2024/04/02 13:25:12 ...kages/maven/maven.go:50:apiError() [E] pq: duplicate key value violates unique constraint "UQE_package_s" testlogger.go:60: 2024/04/02 13:25:12 ...kages/maven/maven.go:53:func17() [E] testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/2.jar for test-mock:12345, 500 Internal Server Error in 7.0ms @ maven/maven.go:229(maven.UploadPackageFile) api_packages_fw_test.go:29: Error Trace: /Users/fwinkelbauer/Projects/gitea/tests/integration/integration_test.go:402 /Users/fwinkelbauer/Projects/gitea/tests/integration/api_packages_fw_test.go:29 /usr/local/Cellar/go/1.22.1/libexec/src/runtime/asm_amd64.s:1695 Error: Not equal: expected: 201 actual : 500 Test: TestPackageMavenConcurrent/Concurrent_Upload Messages: Request: PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/2.jar testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/5.jar for test-mock:12345, 201 Created in 13.3ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/1.jar for test-mock:12345, 201 Created in 13.5ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/8.jar for test-mock:12345, 201 Created in 16.5ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/4.jar for test-mock:12345, 201 Created in 16.6ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/7.jar for test-mock:12345, 201 Created in 18.1ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/0.jar for test-mock:12345, 201 Created in 18.3ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/3.jar for test-mock:12345, 201 Created in 18.3ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/6.jar for test-mock:12345, 201 Created in 18.3ms @ maven/maven.go:229(maven.UploadPackageFile) testlogger.go:60: 2024/04/02 13:25:12 ...eb/routing/logger.go:102:func1() [I] router: completed PUT /api/packages/user2/maven/com/gitea/test-project/1.0.1/9.jar for test-mock:12345, 201 Created in 18.9ms @ maven/maven.go:229(maven.UploadPackageFile) FAIL make: *** [test-pgsql#TestPackageMavenConcurrent] Error 1

Hope this helps!

lunny commented 5 months ago

Created a PR #30335 and put your tests there.

FFock commented 5 months ago

We observed the same or similar bug when migrating several TB of Maven artifacts to a Gitea Maven repository. We now have many packages that cannot be deleted, because the corresponding (S3?) BLOB cannot be found. So our question is: Even if this bug is hopefully fixed very soon, how can damaged packages be removed best from an existing repository?

proxity commented 5 months ago

I've created the new issue https://github.com/go-gitea/gitea/issues/30446 for what @FFock has mentioned above.

tlusser commented 5 months ago

Hello! I created a Hotfix example which only allows to upload one package per type like maven at a time. This fix needs to be adapted, when using an other version than 1.21.10. I hope this helps someone:

https://github.com/tlusser/gitea/commit/02999c36180993de822250280413c4a96fe1c583

BR, Thorsten

PS: I also had to fix the test since it could have uploaded the same package twice, but it expected it to upload only once. (go routine variable capture issue)

tlusser commented 4 months ago

@lunny : Do you think you code works with the updated test?

lunny commented 4 months ago

My code doesn't work for now because I think the table package_property needs some changes.

proxity commented 4 months ago

@tlusser your proposed hotfix helped, but I found out another problem that was not yet solved when pushing a high amount of submodules with a single mvn deploy.

Maven does uploads in parallel. When a POM and a JAR are uploaded for the same new version at the same point in time, it can happen that the package_version contains the string 'null' in metadata_json. This makes the package_version unusable and returns error 500 for queries of the API or UI.

gitea=# select * from package_version where id=343;
 id  | package_id | creator_id |       version       |    lower_version    | created_unix | is_internal | metadata_json | download_count 
-----+------------+------------+---------------------+---------------------+--------------+-------------+---------------+----------------
 343 |        121 |         84 | 23.07.1234-SNAPSHOT | 23.07.1234-snapshot |   1713453387 | f           | null          |              0
(1 row)

gitea=# select * from package_file where version_id=343;
  id  | version_id | blob_id |                            name                             |                         lower_name                          | composite_key | is_lead | created_unix 
------+------------+---------+-------------------------------------------------------------+-------------------------------------------------------------+---------------+---------+--------------
 2518 |        343 |    1770 | pwc-commons-kernel-23.07.1234-20240418.151520-1.jar         | pwc-commons-kernel-23.07.1234-20240418.151520-1.jar         |               | f       |   1713453387
 2521 |        343 |    1773 | pwc-commons-kernel-23.07.1234-20240418.151520-1.pom         | pwc-commons-kernel-23.07.1234-20240418.151520-1.pom         |               | t       |   1713453387
 2522 |        343 |    1774 | pwc-commons-kernel-23.07.1234-20240418.151520-1-sources.jar | pwc-commons-kernel-23.07.1234-20240418.151520-1-sources.jar |               | f       |   1713453387
(3 rows)

As you can see, created_unix is the same.

I could identify the problem and extend your hotfix to cover it.

There's a race condition in UploadPackageFile. The assumption of the original author is that mavendata_json is set during a POM file upload for an existing version or if the POM file is the first upload to the new version at createPackageAndVersion. But if the POM and JAR are uploaded simultaneously, both can enter CreatePackageOrAddFileToExisting . At the end, the metadata from the pom is lost, because the JAR file is a nano second ahead. I could fix it by pulling up your mutex and place before https://github.com/go-gitea/gitea/blob/main/routers/api/packages/maven/maven.go#L266, so basically before any calls to GetVersionByNameAndVersion.

This fixed the issue for us.

tlusser commented 4 months ago

I think that just a global mutex on maven package upload function solves this issue. It would only allow one upload at a time (slower uploads), but ensure that packages are uploaded without race conditions. @proxity , @lunny : What do you think about?

lunny commented 4 months ago

I think that just a global mutex on maven package upload function solves this issue. It would only allow one upload at a time (slower uploads), but ensure that packages are uploaded without race conditions. @proxity , @lunny : What do you think about?

As a quick temporary fix, I think it's OK.

tlusser commented 4 months ago

For everybody who needs this quick fix please see https://github.com/tlusser/gitea/commit/fb925d159c5c25396311643341e610df0dc100fc

proxity commented 4 weeks ago

So we don't have a fix included in 1.22.1? My fix could be improved to base the mutex on the package version that is going to be created in the db. That would not slow down other maven uploads which are creating different package versions.

lunny commented 3 weeks ago

I think we can send a PR to have a lock based on StatusTable lock so that it will only lock the same package. But this should be a short-term resolution.

tlusser commented 3 weeks ago

Thank you for providing a fix in next bugfix release. Will you also include checks in doctor command to clean invalid data out of the database?

lunny commented 3 weeks ago

Thank you for providing a fix in next bugfix release. Will you also include checks in doctor command to clean invalid data out of the database?

As I know, only the table package_property may have dirty data because it has no unique index. For other package related tables, duplicated data should not be inserted because of unique index. Do you find any other invalid data?

tlusser commented 3 weeks ago

Hello, AFAIK there was also tables with missing references to other tables or missing blobs - but it is hard to remember, because i fixed it once in out postgres database and build my own version with a lock which was described in above comment. Afterwards everything seemed to be ok.

Are you sure, that there can not be any issues?

lunny commented 1 week ago

Fixed by #31954