Quansight / ragna

RAG orchestration framework ⛵️
https://ragna.chat
BSD 3-Clause "New" or "Revised" License
177 stars 22 forks source link

Make Ragna API logging format customizable with sensible default #377

Open peachkeel opened 6 months ago

peachkeel commented 6 months ago

Feature description

Oftentimes, I'll start up Ragna like so to create a log:

ragna api 2>&1 | tee ragna.log

I can then grep the log to find errors or exceptions:

grep -i -P -B1 "ERROR:" ragna.log

The results might look something like this:

INFO:     127.0.0.1:56789 - "POST /chats/88d89303-3360-41b4-b405-3531ea884ff0/answer HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application

It would be better, though, if they looked something like this:

[Tue Mar 26 08:28:57 2024] INFO:     127.0.0.1:56789 - "POST /chats/88d89303-3360-41b4-b405-3531ea884ff0/answer HTTP/1.1" 500 Internal Server Error
[Tue Mar 26 08:28:58 2024] ERROR:    Exception in ASGI application

Value and/or benefit

Frequently, I'm looking for HTTP 429 Too Many Requests or similar errors having to do with rate limiting. Knowing when they occurred can be beneficial in rescheduling API calls, especially if a daily limit has been breached.

Yes, one could maybe look at the time the log file was last modified; however, that's a pretty fragile solution because any intentional or unintentional update to the log would result in the loss of that information.

Anything else?

cc @Tengal-Teemo

Tengal-Teemo commented 6 months ago

@pmeier @peachkeel It shall be done

Tengal-Teemo commented 6 months ago

This feature has been somewhat implemented in this branch on @peachkeel ragna fork. It is currently trying to mimic Common Logging Format. Example Log: ('http', '127.0.0.1', 31476) - INFO [28/Mar/2024:16:39:54 +1300] - Uvicorn running on http://127.0.0.1:31476 (Press CTRL+C to quit)

pmeier commented 6 months ago

The important lines are

log_config = uvicorn.config.LOGGING_CONFIG
log_config["formatters"]["access"]["fmt"] = "%(args)s - %(levelname)s [%(asctime)s] %(message)s"
log_config["formatters"]["default"]["fmt"] = "%(args)s - %(levelname)s [%(asctime)s] %(message)s"
log_config["formatters"]["access"]["datefmt"] = "%d/%b/%Y:%H:%M:%S %z"
log_config["formatters"]["default"]["datefmt"] = "%d/%b/%Y:%H:%M:%S %z"
uvicorn.run(
    api_app(
        config=config, ignore_unavailable_components=ignore_unavailable_components
    ),
    host=config.api.hostname,
    port=config.api.port,
    log_config=log_config
)

A couple of questions:

  1. Do we need the %(args)s in there? Isn't that information encoded in the startup message as well as the configuration file used? That seems like quite some noise to log with every message
  2. What is the difference between the "access" and "default" formatters? "access" makes sense to me. Is "default" for all other messages?
  3. I'm not familiar with the common logging format, but it seems the example message above is missing a few parts of the spec. Should we maybe just add the timestamp as originally requested rather than calling this "common logging format"?
  4. I see the benefit of adding the timestamp by default. However, should we make this configurable? Maybe even make the format configurable?
Tengal-Teemo commented 5 months ago
  1. You are correct, args should be removed. It looks terrible and doesn't really add much, the only reason it was there was because I couldn't find the ip anywhere else in the record when I was trying to recreate CLF.
  2. I am not sure what is the difference between the access and default level formatters, I added both because I was matching what several people did online to edit the uvicorn logs.
  3. It's technically only missing bytes, as I believe request and status are replaced by message.

For now I will remove args and just add the timestamp like planned, but I'll see if I can find a way to add the host that doesn't look terrible.

Tengal-Teemo commented 5 months ago

@pmeier if you check the most recent commit, I've implemented the CLF standard more or less properly. I now understand the difference between access and default, one is for the api calls and one is for anything else. image

Tengal-Teemo commented 5 months ago

The only difference is that we don't really have an ident parameter, so I've left it as "-", and I've added the logging level and replaced the bytes with the http status

Tengal-Teemo commented 5 months ago

@pmeier Are you satisfied with this style of logging? I'm happy to make whatever changes to it.

pmeier commented 5 months ago

I know you and @peachkeel are closely connected, but the original feature request was for adding timestamps. And I'm totally aboard with that.

Regarding the CLF, I'm not sure. I'm certainly not very experienced with observability yet, but so far I haven't seen any application use it. That is not to say that it isn't used.

Meaning, I would be more comfortable if we only went with just the timestamps for now and maybe tackle a user defined timestamp and logging format as proposed in 4. of https://github.com/Quansight/ragna/issues/377#issuecomment-2024627533 later. Thoughts?

peachkeel commented 5 months ago

Hi, @pmeier. I'm a dinosaur, and I'm the one who suggested (offline) that maybe @Tengal-Teemo go with CLF since it has been recognized as a de facto standard:

https://www.rfc-editor.org/rfc/rfc6872.html

Such standardization brings benefits, especially in terms of reusing existing analytics tooling.

Of course, we don't have the bytes field as found in CLF, but that could be left empty with a "-" demarking it.

Anyway, I would support full customization with CLF as the default.

pmeier commented 5 months ago

I'm a dinosaur

That could go at least two routes:

  1. "This is what we used back in my days. Not sure what is happening today"
  2. "Trust me, I've seen enough systems to confirm that this something that is generally used and understood".

From the phrasing of the rest of the comment, I assume you mean something along the lines of 2., but I want to make sure here.

Anyway, I would support full customization with CLF as the default.

Happy to. Let's pivot in that direction.

pmeier commented 5 months ago

One thing that I want to avoid, is to come up with a customization scheme that does not support common cases. One thing that I have seen a lot in the recent years is structured logging. I just did a quick search and found this blog that seems to suggest that it is not possible to do without actually injecting middleware into the application. @Tengal-Teemo could you make sure that this is the case?

If yes, we should be fine with just exposing the fmt and datefmt from https://github.com/Quansight/ragna/issues/377#issuecomment-2024627533 to the user. Otherwise, this might get more complicated.

Tengal-Teemo commented 5 months ago

@pmeier After looking into it, I believe that it depends what you mean by middleware. If you mean the BaseHTTPMiddleware they use, then no. Technically, you can just create a JsonFormatter as the blog does and attach that to the logging root before you call uvicorn_run(). The extra stuff they add can simply be obtained from the 'args' field of the record, and you could just check the record.name to see if you're looking at a record generated by uvicorn.access or not. However to do something like this yes you do need to override the default formatter used by uvicorn.

Tengal-Teemo commented 5 months ago

If we do want to expose the logs to the user, it might be better to expose a log_config.yaml file that allows you to specify the fmt and datefmt, like below which I amended from this thread. However, this isn't the most workable framework for things like structured logging, as you're pretty much required to work within the formatters used by uvicorn if you use this method.

version: 1
disable_existing_loggers: False
formatters:
  default:
    "()": uvicorn.logging.DefaultFormatter
    format: 'localhost:31477 - %(name)s [%(asctime)s] %(levelprefix)s %(message)s'
    datefmt: '%d/%b/%Y:%H:%M:%S %z'
  access:
    "()": uvicorn.logging.AccessFormatter
    format: '%(client_addr)s - %(name)s [%(asctime)s] %(levelprefix)s %(request_line)s %(status_code)s'
    datefmt: '%d/%b/%Y:%H:%M:%S %z'
handlers:
  default:
    formatter: default
    class: logging.StreamHandler
    stream: ext://sys.stderr
  access:
    formatter: access
    class: logging.StreamHandler
    stream: ext://sys.stdout
loggers:
  uvicorn.error:
    level: INFO
    handlers:
      - default
    propagate: no
  uvicorn.access:
    level: INFO
    handlers:
      - access
    propagate: no
pmeier commented 4 months ago

Ok, let's just roll with CLF for now and worry about this later. I prefer to make this configurable by the user, but having a more reasonable by default is higher prio now.