go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Tag syncing fails silently due to buffer overflow #31934

Open matera-bs opened 2 months ago

matera-bs commented 2 months ago

Description

While mirroring some legacy repositories hosted on Bitbucket, I came across an issue in the process that syncs the Git repository tags to the database.

When the contents (i.e. the commit message) of a tag are very large (this particular repository has some tags whose commit messages are larger than 100 KB), the internal buffer of the Scanner used to parse the 'git for-each-ref' output overflows. Sadly, there is no message in the log that gives a clue as to what is happening. I'm not particularly experienced with Go, but it seems that the struct returned by the function NewParser (parse.go:30) always returns a nil error no matter what happens.
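For readers less familiar with Go, here is a minimal sketch of the failure mode described above. It is not Gitea's parser: `countTokens` is a hypothetical helper, and it splits on lines rather than on the 'git for-each-ref' delimiters. It only shows that a default `bufio.Scanner` stops at a token larger than `bufio.MaxScanTokenSize` (64 KiB), and that the problem is visible only if `Err()` is checked after the loop.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// countTokens (hypothetical helper) scans input line by line with a default
// bufio.Scanner and returns how many lines were read, plus any error the
// scanner recorded.
func countTokens(input string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for scanner.Scan() {
		n++
	}
	// Scan returns false both at EOF and on failure; only Err tells them apart.
	// Dropping this value is what makes the failure silent.
	return n, scanner.Err()
}

func main() {
	// One short line, then a line longer than the default 64 KiB token limit.
	input := "short\n" + strings.Repeat("x", 100*1024) + "\nafter\n"

	n, err := countTokens(input)
	// prints: 1 true
	fmt.Println(n, errors.Is(err, bufio.ErrTooLong))
}
```

Only the first line is counted; the oversized line and everything after it are never returned, which would match tags going missing without any log output.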

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.46.0

Operating System

Ubuntu

How are you running Gitea?

I ran into the issue in production (Docker inside k8s). Nonetheless, I was able to reproduce the issue inside Visual Studio Code.

Database

PostgreSQL

lunny commented 2 months ago

I think it may be because the Note column of the release table is only a 16K varchar.

bsofiato commented 2 months ago

I think it may be because the Note column of the release table is only a 16K varchar.

Not quite @lunny :(

At least on PostgreSQL, the type of the Note column is text (see the attached screenshots: the first shows the xorm mapping of the release entity, the second shows the generated database schema).

image

image

I was able to process the offending tags by adding the following code to the parser.go file. However, it feels like it only sweeps the real problem under the rug (if there is a tag whose message is longer than 1 MB, it will still fail). Moreover, it would increase the memory footprint when syncing the tags :(

image

P.S. According to the docs, we could create a smaller default buffer and allow it to grow up to a certain size. If you think it is worthwhile, I can open a PR to allow the buffer to grow to a larger size.
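A rough sketch of that idea, using only the standard `bufio.Scanner.Buffer` method. The function name `scanLineLengths`, the 4 KiB starting capacity, and the 4 MiB cap are assumptions for illustration, not values taken from Gitea. Again, it splits on lines rather than on the real 'git for-each-ref' delimiters.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// initialBufSize is a hypothetical starting capacity for the scan buffer.
const initialBufSize = 4 * 1024

// scanLineLengths (hypothetical helper) reads every line of input with a
// scanner whose buffer starts small and may grow up to maxTokenSize bytes.
// It returns the length of each line, and surfaces any scanner error instead
// of dropping it.
func scanLineLengths(input string, maxTokenSize int) ([]int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Start with a small buffer; the scanner reallocates on demand, but never
	// beyond maxTokenSize.
	scanner.Buffer(make([]byte, 0, initialBufSize), maxTokenSize)

	var lengths []int
	for scanner.Scan() {
		lengths = append(lengths, len(scanner.Bytes()))
	}
	if err := scanner.Err(); err != nil {
		return lengths, fmt.Errorf("scanning input: %w", err)
	}
	return lengths, nil
}

func main() {
	input := "short\n" + strings.Repeat("x", 100*1024) + "\nafter\n"

	// With a 4 MiB cap (an assumed value) the 100 KiB line fits.
	lengths, err := scanLineLengths(input, 4*1024*1024)
	fmt.Println(lengths, err)

	// With a 64 KiB cap the same input fails, but the error is reported.
	_, err = scanLineLengths(input, 64*1024)
	fmt.Println(err)
}
```

Memory is only spent on large buffers when a large token actually shows up, and a token above the cap still fails, but with an error that can be logged rather than silently.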

P.S. @matera-bs is my work account, which is why I am the one answering this issue.