Expose more information on /metrics

lunny commented 3 years ago

I think we should expose more runtime information on the /metrics endpoint.

[ ] Number of running Git processes
[ ] CPU usage
[ ] Memory usage
[ ] Number of running cron tasks
[ ] Number of running migration tasks
[ ] Number of running queue workers

noce97 commented 3 years ago

Thanks for opening the issue @lunny.

As discussed here, it would be nice to also monitor the HTTP responses that the Gitea API endpoints return, in order to catch failures or bad requests when calling a specific endpoint.

As a matter of interest, what type of priority is going to be given to this issue?

Thanks in advance, Gitea is great 😄!!

techknowlogick commented 3 years ago

As a matter of interest, what type of priority is going to be given to this issue?

@noce97 as we are all volunteers, we can't commit to any level of priority.

noce97 commented 3 years ago

Great, no worries. We'll wait for it then.

Thanks for the quick response @techknowlogick !

rremer commented 2 weeks ago

I was looking at how I would instrument a few places for extra visibility, and I see a general extensibility problem with the existing modules/metrics/collector.go: it essentially implements a scrape/pull model inside Gitea. I've run into performance problems with solutions like this in other languages, and the same problem has likely surfaced in Gitea as well, given that there are feature flags for enabling certain metrics. I suspect those flags exist to protect the performance of the /metrics endpoint, since some of those metrics are not held in memory and have to be collected in real time.

There's an alternative approach that I've had success with: have the individual services/classes register their metrics as they generate the data for them. This avoids any 'surprise' overhead when collecting metrics, and each metric is updated exactly once each time its data is generated rather than every time /metrics is scraped. To explain this in more detail, let me use a concrete example:

I would like to instrument modules/git/command.go with Gauges and/or Histograms for execution durations, with labels for the directory, exit status code, and command run. This way I could track down repositories that consume a disproportionate share of CPU resources, and help target optimizations in future pull requests. To do this in the existing pattern, I would register the metric in collector.go, and then call a function in command.go to emit the metrics to the channel for collecting. For that command.go function to emit metrics, I'd have to store prior command executions in that class. Alternatively, I could just register the metric directly from command.go when Run() is called.

This is a deviation in collection method and code organization. Would the maintainers be open to this style of distributed/realtime metrics registration?

Note also that these two issues can probably be merged into this one: https://github.com/go-gitea/gitea/issues/28906 https://github.com/go-gitea/gitea/issues/17653

lunny commented 2 weeks ago

Merged from #28906

Feature Description

I think it would be great if Gitea had a metric for the number of unread system notices (web UI path /admin/notices, model https://github.com/go-gitea/gitea/blob/main/models/system/notice.go).

# HELP gitea_system_notices Number of system notices
# TYPE gitea_system_notices gauge
gitea_system_notices 524

It would be even better if it were possible to distinguish between success and error notices (see the NOTICE_ON_SUCCESS settings in https://docs.gitea.com/administration/config-cheat-sheet), e.g.:

# HELP gitea_system_notices Number of system notices
# TYPE gitea_system_notices gauge
gitea_system_notices{type="success"} 123
gitea_system_notices{type="error"} 0
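
As an illustration of how the labelled output above could be produced, here is a minimal Go sketch using only the standard library. The function name renderNoticeGauge and its map input are assumptions made for this example; how Gitea would actually count notices, and whether it would go through the Prometheus client instead, is left open.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderNoticeGauge formats per-type notice counts in the Prometheus text
// exposition format. The input map (notice type -> count) is hypothetical.
func renderNoticeGauge(counts map[string]int) string {
	// Sort the label values so the output is stable between scrapes.
	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, t)
	}
	sort.Strings(types)

	var b strings.Builder
	b.WriteString("# HELP gitea_system_notices Number of system notices\n")
	b.WriteString("# TYPE gitea_system_notices gauge\n")
	for _, t := range types {
		// %q quotes the label value; adequate for simple values like these.
		fmt.Fprintf(&b, "gitea_system_notices{type=%q} %d\n", t, counts[t])
	}
	return b.String()
}
```

Calling it with `map[string]int{"success": 123, "error": 0}` yields the two sample lines above, with the label values in alphabetical order.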

I would really like to contribute and can create a PR if you consider this useful.

lunny commented 2 weeks ago

Merged from #17653

Feature Description

It's great that people are aware of Prometheus, but what is the usefulness of the metrics added in #678? Realistically these can't be used for alerting.

How about we add some actually useful metrics:

Successful/failed SSH login attempts
Successful/failed login attempts via web
Currently open TCP connections (recently our Gitea gets hammered by DDoS attacks)
HTTP response latency histogram
Errors encountered while interacting with Git repos
Failed/successful webhook calls

These are just a few examples. Some of them can be handled by the haproxy that sits in front of Gitea, which is of course the case for us...

go-gitea / gitea

Expose more information on /metrics #14724
