tlvu opened this issue 4 years ago.
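Managed to get Fluentd to parse Nginx logs.

1. Build a Fluentd image with the Prometheus plugin installed:

```dockerfile
FROM fluent/fluentd:edge
USER root
# Install fluent-plugin-prometheus, then remove the build dependencies and caches
RUN apk add --no-cache --update --virtual .build-deps \
        sudo build-base ruby-dev \
 && sudo gem install fluent-plugin-prometheus \
 && sudo gem sources --clear-all \
 && apk del .build-deps \
 && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem
COPY entrypoint.sh /bin/
# Drop back to the unprivileged user at runtime
USER fluent
```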
2. Configure `fluent.conf`:
3. Launch the container. It monitors the logs and parses every new line appended to them. The `pos_file` stores the position of the last line read, so parsing resumes from there after a restart.
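The `pos_file` idea can be illustrated with a small Python sketch: remember a byte offset, read only the complete lines appended after it, then save the new offset. This is a simplified model, not Fluentd's implementation (Fluentd's pos file also records the file's inode so it can follow log rotation); the `read_new_lines` function and the single-number position file are hypothetical.

```python
import os
from typing import List


def read_new_lines(log_path: str, pos_path: str) -> List[str]:
    """Return complete lines appended to log_path since the offset saved in pos_path.

    Hypothetical sketch: the position file holds only a byte offset,
    unlike Fluentd's real pos_file format.
    """
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)

    # If the log shrank (truncated or replaced), start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    # Persist the offset so the next call (or a restart) resumes here.
    with open(pos_path, "w") as f:
        f.write(str(offset))
    return lines
```

Calling it periodically on the access log returns only what was written since the previous call, which is the behaviour the `pos_file` provides across container restarts.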
For the record, installing the service was straightforward, but configuring it was painful. The documentation is confusing (the same names mean different things in different contexts), and the regexps use Ruby syntax, which differs slightly from Python syntax. The error messages, however, are clear enough to fix issues as they arise.
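One concrete Ruby/Python difference is named capture groups: Fluentd regexps (Ruby) write them as `(?<name>...)`, while Python's `re` module only accepts `(?P<name>...)`. A minimal sketch of a converter, assuming the patterns only differ in that respect; the `ruby_to_python_regex` helper is hypothetical, and other Ruby-only constructs are not handled.

```python
import re

# Match "(?<" that opens a named group, but not an escaped "\(" and not the
# lookbehinds "(?<=" and "(?<!", which are written the same in both languages.
_RUBY_NAMED_GROUP = re.compile(r"(?<!\\)\(\?<(?![=!])")


def ruby_to_python_regex(pattern: str) -> str:
    """Rewrite Ruby-style named groups (?<name>...) into Python's (?P<name>...)."""
    return _RUBY_NAMED_GROUP.sub("(?P<", pattern)
```

This is handy for testing a Fluentd `expression` against sample log lines in Python before putting it in `fluent.conf`.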
@fmigneault I have a vague memory that canarie-api parses `nginx/access_file.log`. Would there be any problem if the format of that file changed to JSON? JSON is easier to parse when we need to ingest the logs to extract metrics.
@tlvu Yes. It parses the logs using the regex at https://github.com/Ouranosinc/CanarieAPI/blob/master/canarieapi/logparser.py#L53. We would need a new option and a small code change to handle the JSON format instead.
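To make "an option and a small code change" concrete, here is a minimal Python sketch of a parser that accepts either format. It assumes the existing file uses Nginx's default `combined` log format; the regex is illustrative and is not the one in CanarieAPI's `logparser.py`, and both `parse_access_line` and its `log_format` parameter are hypothetical.

```python
import json
import re
from typing import Optional

# Assumption: Nginx's default "combined" log_format. Not CanarieAPI's regex.
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line: str, log_format: str = "combined") -> Optional[dict]:
    """Parse one access-log line into a dict, or return None if it does not parse."""
    if log_format == "json":
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            return None
        # A JSON log line should be an object; anything else is rejected.
        return entry if isinstance(entry, dict) else None
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None
```

With JSON, the field names come from whatever the Nginx `log_format` declares, so no regex has to be kept in sync; declaring that format with `escape=json` makes Nginx escape quotes and control characters inside the values.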
@fmigneault In the end, I think Nginx can write the same log to several files, so I'll keep the existing log file intact, write the JSON format to a separate log file, and parse that other file instead. This is more flexible and backward compatible. It will probably result in a new optional-component so that other organizations deploying PAVICS can reuse the solution.
This looks promising: https://github.com/martin-helmich/prometheus-nginxlog-exporter
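Both `fluent-plugin-prometheus` (installed in the Dockerfile above) and log exporters such as the one linked above ultimately turn log lines into Prometheus metrics. A minimal Python sketch of that idea, assuming JSON access-log lines with a `status` field: count requests per status and render the counts in the Prometheus text exposition format. The metric name `nginx_http_requests_total` and both function names are hypothetical, not what either tool actually exposes.

```python
import json
from collections import Counter
from typing import Iterable


def count_statuses(json_lines: Iterable[str]) -> Counter:
    """Count requests per HTTP status; malformed lines are skipped."""
    counts: Counter = Counter()
    for line in json_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and "status" in entry:
            counts[str(entry["status"])] += 1
    return counts


def to_prometheus_text(counts: Counter, metric: str = "nginx_http_requests_total") -> str:
    """Render the counts as one counter with a "status" label per value."""
    lines = [f"# TYPE {metric} counter"]
    for status in sorted(counts):
        lines.append(f'{metric}{{status="{status}"}} {counts[status]}')
    return "\n".join(lines) + "\n"
```

A real exporter keeps these counters in memory and serves the text on an HTTP endpoint for Prometheus to scrape, rather than recomputing them from the whole file.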
Migrated from old PAVICS https://github.com/Ouranosinc/PAVICS/issues/140
Automated deployment was triggered but not performed on `boreas` because of:

We need a system to monitor the logs and send a notification when errors occur. This log-file error monitoring and notification could later be generalized to watch any system, so that each system does not have to reimplement monitoring and notification.
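A minimal Python sketch of the generalization idea, under the assumption that "watching a system" means matching its log lines against an error regex: each system registers only a name and a pattern, while matching and notification live in one shared place. `ErrorWatcher` and the notifier callback are hypothetical; a real notifier might send an email or a chat message.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical notifier signature: (source name, offending log line) -> None
Notifier = Callable[[str, str], None]


class ErrorWatcher:
    """One watcher shared by all systems; each system only supplies an error regex."""

    def __init__(self, notify: Notifier) -> None:
        self.notify = notify
        self.patterns: Dict[str, re.Pattern] = {}

    def register(self, source: str, error_regex: str) -> None:
        self.patterns[source] = re.compile(error_regex)

    def scan(self, source: str, lines: Iterable[str]) -> int:
        """Notify once per matching line and return how many notifications were sent."""
        pattern = self.patterns[source]
        sent = 0
        for line in lines:
            if pattern.search(line):
                self.notify(source, line)
                sent += 1
        return sent
```

For a quick test, `ErrorWatcher(notify=print)` simply prints each offending line with its source name; swapping in a different notifier does not require touching the per-system patterns.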
This problem triggered this issue: https://github.com/Ouranosinc/PAVICS/issues/176
There are basically 4 types of monitoring that I think we need:
1. Monitor system-wide resource usage (CPU, RAM, disk, I/O, processes, ...): we already have this one.
2. Monitor per-container resource usage (CPU, RAM, disk, I/O, processes, ...): we already have this one.
3. Monitor application logs for errors and unauthorized access: we do not have this one. It would be useful for proactively catching errors instead of waiting for users to report bugs.
4. Monitor the end-to-end workflow of all deployed applications to ensure they work properly together (no configuration errors): we partially have this one, with the tutorial notebooks tested daily by Jenkins. Unfortunately, not all apps have associated notebooks, and some existing notebooks have problems running non-interactively under Jenkins.
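For the third type of monitoring, a minimal Python sketch of what "errors and unauthorized access" could mean on an HTTP access log: count 5xx responses, and flag clients that accumulate many 401/403 responses. It assumes entries are dicts with `status` and `remote_addr` keys (for example, parsed from the JSON access log discussed earlier); the function name and the threshold rule are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def scan_access_entries(
    entries: Iterable[Mapping[str, object]], denied_threshold: int = 5
) -> Tuple[int, Dict[str, int]]:
    """Return (number of 5xx responses, clients with >= denied_threshold 401/403 responses)."""
    server_errors = 0
    denied: Counter = Counter()
    for entry in entries:
        # Status may be a string or an int depending on the log format.
        status = str(entry.get("status", ""))
        if status.startswith("5"):
            server_errors += 1
        elif status in ("401", "403"):
            denied[str(entry.get("remote_addr", "unknown"))] += 1
    suspicious = {addr: n for addr, n in denied.items() if n >= denied_threshold}
    return server_errors, suspicious
```

A non-zero error count or a non-empty `suspicious` dict is the kind of condition that would trigger a notification, instead of waiting for a user to report the problem.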