DIG-1719: First pass at standardizing logging

daisieh commented 1 month ago

I've created a standardized library for Python logging at https://github.com/CanDIG/candigv2-logging that unifies the formatting and configuration for most of our Python-based modules. Once the Python-based modules all emit the same format, we can use fluentd's configuration to process the messages into a more standardized JSON format.
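To illustrate the idea of one shared format across services, here is a minimal sketch using only the standard library. The format string, `build_logging_config`, and `get_service_logger` are assumptions made up for this example; they are not taken from candigv2-logging, whose actual format and API may differ.

```python
import logging
import logging.config

# Hypothetical shared format; the real candigv2-logging format may differ.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def build_logging_config(level="INFO"):
    """Return a dictConfig-style dict that every service could share."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"standard": {"format": LOG_FORMAT}},
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "standard",
                # Containers usually log to stdout so a collector can pick it up.
                "stream": "ext://sys.stdout",
            }
        },
        "root": {"handlers": ["console"], "level": level},
    }


def get_service_logger(service_name, level="INFO"):
    """Apply the shared configuration and return a logger named after the service."""
    logging.config.dictConfig(build_logging_config(level))
    return logging.getLogger(service_name)
```

A service would call `get_service_logger("katsu")` once at startup and log through the returned logger, so every module produces lines with the same layout.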

For now, I've written separate filters to process Flask and Django logs, Tyk logs, and OPA logs, but we can write more as needed. Alternatively, we may want to write a custom parser plugin (https://docs.fluentd.org/plugin-development/api-plugin-parser) to handle the parsing of our custom log format more thoroughly.
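The actual filters live in the fluentd configuration, but the transformation they perform can be sketched in Python for discussion. This assumes a hypothetical line layout of `<timestamp> | <LEVEL> | <logger> | <message>`; the real format, the field names, and the `parse_line` helper are illustrative only.

```python
import json
import re

# Hypothetical line layout: "<date> <time> | <LEVEL> | <logger> | <message>".
LINE_PATTERN = re.compile(
    r"^(?P<time>\S+ \S+) \| (?P<level>[A-Z]+) \| (?P<logger>[^|]+?) \| (?P<message>.*)$"
)


def parse_line(line, source):
    """Turn one raw log line into a dict; keep unparsed lines instead of dropping them."""
    text = line.strip()
    match = LINE_PATTERN.match(text)
    if match is None:
        # No parser matched: pass the raw message through, flagged as unparsed.
        return {"source": source, "message": text, "parsed": False}
    record = match.groupdict()
    record["source"] = source
    record["parsed"] = True
    return record


def to_json(line, source):
    """Serialize the parsed record as a single JSON object."""
    return json.dumps(parse_line(line, source), sort_keys=True)
```

Passing unmatched lines through with `parsed` set to false mirrors the situation described here, where some log messages do not have parsers yet but should still end up in the output.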

To test: check out this branch and make sure that the submodules are up to date with the commits in this branch. Then if you do a clean-all/build-all, you should start seeing logs accumulating in a buffer log file in tmp/logging, and they should roll over every day to a new file.

daisieh commented 1 month ago

Please feel free to continue pushing updates to this PR. There are definitely log messages that I haven't written parsers for yet.

SonQBChau commented 1 month ago

I took a quick look, and it seems we’ll need to modify Katsu so it can run without the stack. Currently, the logging module is required even for local use. I can write the code when we make the PR for Katsu, but I wanted to bring this up for clarity.
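One possible shape for that change, as a rough sketch only: try to import the shared logging module and fall back to the standard library when it is missing. The module name `candigv2_logging` and the factory name `get_logger` are hypothetical placeholders, not the library's confirmed API.

```python
import importlib
import logging


def get_logger(name, module_name="candigv2_logging", factory_name="get_logger"):
    """Return a logger from the shared module if available, else a plain stdlib logger.

    module_name and factory_name are hypothetical; adjust them to the real package.
    """
    try:
        shared = importlib.import_module(module_name)
        factory = getattr(shared, factory_name)
    except (ImportError, AttributeError):
        # Running outside the stack: use default stdlib logging instead.
        logging.basicConfig(level=logging.INFO)
        return logging.getLogger(name)
    return factory(name)
```

With something like this, a local run without the stack would still start, and the standardized format would apply whenever the shared module is installed.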

kcranston commented 1 month ago

My approval is for the general approach. Assume technical issues identified by @OrdiNeu will be fixed before merge.

daisieh commented 1 month ago

2024-07-29 17:42:59 +0000 [info]: using configuration file:
<source>
  @type forward
  @id input1
  @label @mainstream
  port 24224
  bind "0.0.0.0"
</source>
<filter **>
  @type stdout
</filter>
<label @mainstream>
  <match docker.**>
    @type file
    @id output_docker1
    path "/fluentd/log/docker.*.log"
    symlink_path "/fluentd/log/docker.log"
    append true
    time_slice_format %Y%m%d
    time_slice_wait 1m
    time_format %Y-%m-%dT%H:%M:%S.%N%z
    <buffer time>
      timekey_wait 1m
      timekey 86400
      path /fluentd/log/docker.*.log
    </buffer>
    <inject>
      time_format %Y-%m-%dT%H:%M:%S.%N%z
    </inject>
  </match>
  <match **>
    @type elasticsearch
    @id output_fluentd
    host "elasticsearch"
    port 9200
    logstash_format true
    logstash_prefix "fluentd"
    logstash_dateformat "%Y.%m.%d"
    time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
    utc_index false
    include_tag_key true
    tag_key "@log_name"
    flush_interval 1s
    template_overwrite true
    <buffer>
      flush_interval 1s
    </buffer>
  </match>
</label>

This looks like the old version of fluentd.conf. Can you make sure that you've deleted any relevant volumes and images, and then check whether that triggers it to pick up the new version?

CanDIG / CanDIGv2

DIG-1719: First pass at standardizing logging #680