algorand / go-algorand

Algorand's official implementation in Go.
https://developer.algorand.org/
Other
1.33k stars 459 forks source link

Replace Logrus #5505

Open winder opened 1 year ago

winder commented 1 year ago

Problem and Background

"we found our logrus logger is responsible for large REST API slowdown every time new blocks arrives or accounts are written"

Logrus is well known to be slow compared to other modern logging libraries, and is always included in benchmarks which show it to be trailing in performance:

It is not disputable that we would see some performance gain by switching logging libraries, but it is not clear how much of that gain translates to networking and consensus improvements.

Solution

Pick a new logging library. It should be fast and provide the necessary abstractions we need for things like sending errors to telemetry and whatever else our logging library does.

Unknowns

How disruptive would it be if the logging format changes?

Are there other fancy things like sending errors to telemetry that the logging library needs to accomplish?

Tasks

  1. Write a test that measure performance with and without any logging, to understand the impact more clearly.
  2. Select a new logging library (zerolog).
  3. Configure logging output to match the logrus output as closely as possible.
  4. Implement the logging interface using the new library, updating interface as necessary.
  5. Ensure errors are forwarded to telemetry
cce commented 1 year ago

@excalq has a POC branch that replaces logrus with a zerolog implementation behind our logging.Logger interface (plus a new telemetry reporter), but none of the log calls themselves were changed: https://github.com/algorand/go-algorand/compare/master...excalq:go-algorand:excalq/logging-refactor-zerolog

cce commented 1 year ago

For logging inside of performance-critical code (e.g. inside per-transaction or per-opcode iteration) you could build on an approach like this to modify them to access zerolog's typed logging methods and get low/zero-allocation overhead. But there are a lot of places we use logging that are so infrequent, it wouldn't provide any performance benefit to update them from fmt-style to typed calls.

excalq commented 1 year ago

More historical context exists in the internal epic #2646. As we're replacing Elasticsearch, we'll need to replace the elogrus client, so these efforts go hand-in-hand. Zerolog is what I ran with in Q4, and was a larger refactor due to its events being immutable. Logrus has mutable events (Entry objects) which elogrus modifies as Hooks. Testing may also need some changes, as our perf-test pipelines use telemetry (from ES) to calculate final numbers.