go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Migration of big GitHub repo fails #13241

Open davidak opened 4 years ago

davidak commented 4 years ago

Description

When I try to migrate nixpkgs to the Codeberg test server, it always fails. We have increased the migration timeouts by a factor of 100.

When we wait or open https://codeberg-test.org/NixOS/nixpkgs/settings/branches, we get a stack trace with an error at setting_protected_branch.go:34.

It is not a hardware limitation.

Screenshots

Screenshot from 2020-10-21 15-14-34

Screenshot from 2020-10-21 15-23-41

Screenshot from 2020-10-21 19-03-50 Screenshot from 2020-10-21 19-08-58

jolheiser commented 4 years ago

For anyone looking in to it, this is the line referenced.

https://github.com/go-gitea/gitea/blob/ba97c0e98bc97957d6fd9bfd3db5768813c58ff3/routers/repo/setting_protected_branch.go#L34

davidak commented 4 years ago

@jolheiser are you a bot? :smile: (because of the instant reaction)

Thanks for the information!

It might be that not all branches were migrated or that the data is broken, so ctx.Data["Branches"] doesn't exist?
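To illustrate that guess: if a handler reads a value out of a map with an unchecked type assertion, a missing key causes a panic. Below is a minimal sketch, assuming the data is a plain map[string]interface{} and the branch list is a []string. Both are assumptions for illustration, and branchesFrom is a hypothetical helper; the actual code at the referenced line is not reproduced here.

```go
package main

import "fmt"

// branchesFrom is a hypothetical helper, not Gitea code. It reads the
// "Branches" entry from a data map and reports whether the entry was
// present with the expected type, instead of panicking when it is missing.
func branchesFrom(data map[string]interface{}) ([]string, bool) {
	branches, ok := data["Branches"].([]string)
	return branches, ok
}

func main() {
	data := map[string]interface{}{} // "Branches" was never set
	if branches, ok := branchesFrom(data); ok {
		fmt.Println("branches:", branches)
	} else {
		fmt.Println("no branch list available")
	}
}
```

The two-value form of the type assertion is what avoids the panic; a single-value assertion on a missing key would crash the request.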

Codeberg-org commented 4 years ago

To add some more color, this log line (also referenced in the gist) indicates that GitHub throttled API calls, so the fetched data may be truncated or incomplete:

modules/task/task.go:51:handle() [E] Run task failed: GET https://api.github.com/repos/NixOS/nixpkgs: 403 API rate limit of 5000 still exceeded until 2020-10-21 16:33:28 +0200 CEST, not making remote request. [rate reset in 34m49s]

(fwiw there was no log line reporting API request errors or timeouts).

jolheiser commented 4 years ago

> @jolheiser are you a bot?

I really hope not! 🤖

Also, for what it's worth I believe this would benefit from https://github.com/go-gitea/gitea/pull/12244 as well.
Dumping to disk may not help if we are hitting a rate limit anyways.

davidak commented 4 years ago

And nixpkgs is way bigger. It has 100,000 issues and pull requests, probably the second most on GitHub (the first is vscode).

Codeberg-org commented 4 years ago

Even if API limits are hit, migration should not fail but get throttled, too? Ideally with some estimate of ETA?

jolheiser commented 4 years ago

It seems we do have some code for sleeping if we hit a rate limit.

https://github.com/go-gitea/gitea/blob/f0fe5683feb2799e3ca467bc75fd77871b65452b/modules/migrations/github.go#L113-L128

Do you have any logs surrounding that rate limit error that may indicate where in the migration process it was?

davidak commented 4 years ago

> Dumping to disk may not help if we are hitting a rate limit anyways.

We got this trace 3 times without seeing the API limit message, so the rate limit is not related to this issue. We only hit the limit after retrying too often, and then we waited.

davidak commented 4 years ago

> Even if API limits are hit, migration should not fail but get throttled, too? Ideally with some estimate of ETA?

That's a separate issue (https://github.com/go-gitea/gitea/issues/13243).

6543 commented 4 years ago

@davidak I don't know how expensive it is, but if one of the NixOS maintainers buys a GitHub Enterprise account for the period of the migration, it would be faster in general ... (higher API rate limit)

davidak commented 4 years ago

@6543 do you mean GitHub Pro (https://github.com/account/upgrade) or where can you buy it?

Screenshot from 2020-10-21 19-43-40

That seems like useful information that should be part of the documentation.

But you should not have to buy your data free in general.

jolheiser commented 4 years ago

@davidak He's referring to https://github.com/enterprise which is basically self-hosted GitHub.

6543 commented 4 years ago

@jolheiser no, I thought I saw something for GitHub itself to increase the API request limit ... I was searching in the docs but could not find it anymore.

Another trick we could do: allow multiple tokens for the GitHub migrator ... to switch tokens when one gets rate-limited? @davidak what do you think?

6543 commented 4 years ago

@davidak also as suggested on https://stackoverflow.com/questions/16732103/increase-github-api-limit just contact github directly?

davidak commented 4 years ago

Combine your tokens to free the code :fist_raised:

Might not be a good idea to give someone else your token...

Asking GitHub would be an option, but it would be best if it just doesn't fail and respects the API limit.

But this issue here has to be fixed first before we can try again.

Codeberg-org commented 4 years ago

> Do you have any logs surrounding that rate limit error that may indicate where in the migration process it was?

Could not see anything in the log; is some special app.ini config needed to enable this logging?

6543 commented 4 years ago

could be related to #13230

lunny commented 4 years ago

If you are migrating multiple projects at the same time, there is some chance of hitting the rate limit. The rate limit is per GitHub account, I think.

TheFrenchGhosty commented 4 years ago

I can confirm this issue. This is a MASSIVE problem.

I encounter this issue even if I mirror 1 project at a time.

I even encounter this issue with a brand new Github account.

The instances tested are all running 1.12.5. (edit: also an issue on 1.13.0+rc1)

From what I discovered Codeberg doesn't seem to be affected by this, but every instance where I tried (including my own) is affected.

@Codeberg-org do you guys have any specific settings for migrations? I couldn't find anything special in this config file https://codeberg.org/Codeberg/build-deploy-gitea/src/branch/master/etc/gitea/conf/app.ini

billewanick commented 1 year ago

I'm also running into this issue. I'm trying to create a pull mirror of Nixpkgs on my own Gitea instance. The server url is git.ewanick.com. The server configuration is defined here, and the gitea instance is defined in this module.

The details from the Gitea Admin console are:

Gitea version (or commit ref): 1.19.3 built with go1.20.4 : pam, sqlite, sqlite_unlock_notify
Git version: 2.40.1, Wire Protocol Version 2 Enabled
Operating system: NixOS 23.05 "Stoat"
Database: SQLite

The server is running on a 2 Core / 4GB Ram Linode dedicated server. I am able to manually clone the Nixpkgs repo to the gitea repositories folder and recover it as an abandoned repo, but then it won't let me set up the pull mirror, only push.

I have updated the timeouts to be extra long and it still fails:

Migration Timeout - 6000 seconds
Mirror Update Timeout - 6000 seconds
Clone Operation Timeout - 300 seconds
Pull Operation Timeout - 6000 seconds
GC Operation Timeout - 6000 seconds

I am using a GitHub Personal Access Token to try and get around rate-limiting.

Any help with this would be greatly appreciated, including helping me figure out where the logs are.

6543 commented 1 year ago

If it's only about the git data, don't use the GitHub option; just use the plain git option without your token and clone via HTTPS.

And you can just append another 00 to all your timeouts temporarily; that should do it.

billewanick commented 1 year ago

Changed all my timeouts to 60000, and tried the git option with https. Still failed when I tried it several times. In the Firefox console there was one error, a 404 on a GET request. task12does not exist