go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
44.19k stars 5.42k forks source link

Gitea is slow when external avatar services are down #6046

Closed gaah666 closed 4 years ago

gaah666 commented 5 years ago

Console log: [Macaron] 2019-02-12 14:47:11: Started GET / for 10.10.10.1 [Macaron] 2019-02-12 14:48:11: Completed GET / 200 OK in 1m0.064527745s

app.log -file: 404 2019/02/12 14:47:11 [D] Session ID: 541269a9dc47baf3 405 2019/02/12 14:47:11 [D] CSRF Token: -IiHK1CVOgkwyuyXL13P1Gri6Xc6MTU0OTk3NTQ5NzM3NzY5MDYxNg== 406 2019/02/12 14:47:11 [D] Template: user/dashboard/dashboard 407 2019/02/12 14:47:31 [...modules/base/tool.go:216 SizedAvatarLink()] [E] LibravatarService.FromEmail(email=x@localhost): error 407 lookup _avatars-sec._tcp.localhost on 4.2.2.2:53: read udp 10.0.2.15:45383->4.2.2.2:53: i/o timeout 408 2019/02/12 14:47:51 [...modules/base/tool.go:216 SizedAvatarLink()] [E] LibravatarService.FromEmail(email=x@localhost): error 408 lookup _avatars-sec._tcp.localhost on 4.2.2.2:53: read udp 10.0.2.15:58838->4.2.2.2:53: i/o timeout 409 2019/02/12 14:48:11 [...modules/base/tool.go:216 SizedAvatarLink()] [E] LibravatarService.FromEmail(email=x@localhost): error 409 lookup _avatars-sec._tcp.localhost on 4.2.2.2:53: read udp 10.0.2.15:42802->4.2.2.2:53: i/o timeout

Description

When enabling external avatars, when those sites are down or doesn't reply in a timely fashion, I think Gitea isn't behaving in a nice way... As seen from the logs above... Browser makes a request, then exactly 1m after it actually starts loading the page and resources aso... 1 minute per page load, because an external avatar request timed out.

I think this should be a lot shorter cut off time than one minute, or be configurable.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

lunny commented 5 years ago

You can change the config to disable that external avatar service.

gaah666 commented 5 years ago

@lunny Yes I know that. That is not the issue.

The issue is that when you want to use them, but the service is down the gitea pages stop loading, waiting for services that are down for a full minute. Better behavior is that the pages should load anyway (since avatars, although nice to have isn't crucial for gitea usage).

zeripath commented 5 years ago

Agreed. Why don't we provide a per user avatar link which on request redirects to the looked up avatar instead of providing the avatar link directly.

mrsdizzie commented 5 years ago

@zeripath I think that would have the same problem, that each page would request those links trying to display an avatar, which would then need the real link.

I think there are probably multiple problems related to avatars from what I've heard, but after going down a hole I think this is something particular (sorry for long comments):

The error logs above are specific to using federated avatars and looking up the SRV record for the domain name in the users email address, which comes from the go-libavatar library we use here:

strk.kbt.io/projects/go/libravatar

https://gitlab.com/strk/go-libravatar/blob/master/libravatar.go#L191

This exmple seems to have ENABLE_FEDERATED_AVATAR = true and the user email is set to x@localhost

So the library above is doing an SRV query to localhost based on that email address.

The go dns resolver will look in /etc/resolv.conf on a unix machine by default usually has something likeoptions timeout:2 attempts:5

So timeout after 2 seconds and try 5 times. That is per nameserver. So if somebody were using two name servers (say 8.8.8.8, 8.8.4.4) it would do that SRV query 5 times for each nameserver, timing out after 2 seconds. That would be 20 seconds total for each avatar -- as seen in the logs above. If the person above had say, 3 avatars to look up it would time out after exactly 1 minute (20 * 3).

Obviously this problem can be exponentially worse if you are trying to look up 5, 10, 20 of these. Or hard to diagnose if you use federated avatars and of 10 of them, one is a bad email address that adds an extra 20 seconds load.

So there are a couple of things to think about:

1: Don't try to load more than one of the same avatar if the first one has failed. Perhaps this can happen by looking up what needs to be loaded and doing it all at once, failing if the first one times out.

2: Don't rely on even one DNS lookup that can take 20 seconds to fail. This would probably mean changes to the referenced library at strk.kbt.io/projects/go/libravatar

3: Just store a copy of federated/gravatar avatar locally and refresh that copy every x amount of time as a background task. Do the original lookup when an account is set up so it doesn't block normal page loads and then refresh at an interval.

4: Maybe as a temporary fix for some examples like this one, don't lookup an avatar for localhost or other probably not real domain names

I think a good future solution is to do lookup of the avatar and save that image locally, refreshing either on a time basis, or perhaps if somebody visits their profile page as well. Even if the timeouts are made better, I don't think each page load should rely on links and/or connections to outside sites that may or may not be working at the moment. This would fix the "gitea is slow because of another server" problem and also avoid the potential "sometimes my pages loose all avatars" problem too. I believe this is something that popular services that make use of Gravatar do (Github, bigger WordPress sites, etc...)

Alternatively, if the above change isn't likely to happen perhaps dropping support for federated avatars (not gravatar or similar services). It can't ever be reasonable to run a dozen SRV queries when trying to load a page, even with a better timeouts.

techknowlogick commented 5 years ago

As @strk is a member of Gitea maintainer team, sending patches upstream would likely be possible.

We could use github.com/miekg/dns which can leverage DNS timeouts, and also possible cache timeouts (maybe for 1 hour, or something smaller than caching the positive responses).

lunny commented 5 years ago

As @mrsdizzie 's advice, we could cache all external avatars and refresh them every day.

mrsdizzie commented 5 years ago

I think we could/should do both of those: Make the fetching of the avatars better but also cache them on disk for displaying; Ideally Gitea shouldn't rely on connections to any third party servers for content to display on page views if possible.

gaah666 commented 5 years ago

That was really what I wanted to say, as pointed out above. I wrote about avatars because that's how I noticed it, but my problem is really that gitea should function as it should, even if all/any external links/sites are down.

And, IMHO, external services add value (I love avatars) so it would really be unfortunate if it was removed... Just treat it as a best effort, if it works it works but don't let core functionality depend on them.

strk commented 5 years ago

I'm very happy to accept a patch for go-libravatar to tweak timeouts/TTLs.

Lookup could be maybe done by some background process to reduce the pain.

As per local cache, that one could be implemented in Gitea itself, no need to be in the library (for example we might want Gitea to include in that cache avatars that were obtained in a different way)

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

lunny commented 5 years ago

@strk any progress? I think maybe you can move go-libravatar from gitlab to gitea instance if you want some help on that.

strk commented 5 years ago

There's already a mirror on gitlab: https://gitlab.com/strk/go-libravatar/

strk commented 5 years ago

And yes, help is welcome (although I still think cache should be done in Gitea itself)

strk commented 5 years ago

I've a plan: gitea always passes to the browser a url to self (/user/x/avatar/size) and the server takes care of the redirect. How does that sound ?

strk commented 5 years ago

Please give #8245 a try

strk commented 4 years ago

Can this be closed now that #8245 was merged ? Or do we want to wait for the per-commit avatar link first ? (non-registered users don't have a redirect endpoint at the moment, to avoid showing the contributors email to clients, although I'm not sure what's the point of that, given email is visible in git history anyway)