Context

This is about the epic ARCH-260 Next Generation Logging and Tracing. The overall aim is to improve logging and debugging. To get there, we came up with the following steps:

  1. Standardise logs
  2. Introduce a tool to visualize logs
  3. Distributed tracing

I will try to go through each step briefly.

Standardise logs

The first step we settled on is to standardise logs.

Solution

1. Updating logging object

   Here is the log object shape that I think would be helpful.

   Response Error Object

   ```ts
   {
     application: string,
     statusCode: number,
     message: string,
     level: 'info' | 'error' | 'warn',
     environment: 'prod',
     request: {
       url: string,
       headers: {}
     },
     response: {
       statusCode: number,
       message: string,
       headers: {}
     },
     responseTime: string
   }
   ```

   Info log

   ```ts
   {
     application: string,
     message: string,
     level: 'info',
     environment: 'prod'
   }
   ```

   Error log

   ```ts
   {
     application: string,
     message: string,
     level: 'error',
     environment: 'prod'
   }
   ```

   Here's an example of what the change could look like (before vs. after): unauthorized_error

   We looked into replacing `bunyan` with `pinojs` for this. You can check out this PR: https://github.com/comtravo/ct-backend/pull/11487

   We picked pinojs because it is faster and has an ecosystem of logging tools around it, so we would be able to keep the logging format similar.
2. Adding environment to logs

   Currently we index logs by environment, which means we have a different source for each environment. This also makes it difficult to switch sources. If we add the environment to the logs and index them together, we could easily switch between environments from the logs themselves.

   ![image](https://user-images.githubusercontent.com/75316673/127350174-3a631dd1-2a8b-4eb4-8145-3ff96adf70b4.png)
3. Source for logs

   For alignment and ease, we could keep using Elasticsearch as the source for the different tools.
4. Getting rid of the debug lib

   Currently we use a package called `debug`. Setting the env variable `DEBUG: *` enables its logs, which are used for debugging lambdas and services. We could instead make use of [pino-debug](https://github.com/pinojs/pino-debug). This would allow us to use the same logging library everywhere, with the same calls such as `logger.debug()`.
5. Getting rid of ct_inspector

   Currently we use `ct_inspector` to log and also to track the amount of time spent on third-party calls. It is also used in this [flight search dashboard](https://grafana.prod.comtravo.com/d/4xPM7fGr8/flight-search-api-details?orgId=1&refresh=5m).

   ![image](https://user-images.githubusercontent.com/75316673/127357499-f42e2e51-f196-4834-8627-223428d40635.png)

   An idea from Puneeth was to use a time-series DB for this.
6. Redacting customer info before using a 3rd party tool

   Currently, when swagger validation fails, it logs the entire object that failed. This includes fields like `booking.guest_travelers` and `booking.booker`, which contain personal info such as email and phone number. So we should redact these before actually using a 3rd party tool for visualization. This becomes really easy with pinojs:

   ```ts
   redact: {
     paths: [
       'req.headers.authorization',
       'request_data.body.booker',
       'request_data.body.guest_travelers'
     ],
     remove: true
   },
   ```
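
To make points 1, 2 and 6 a bit more concrete, here is a minimal sketch in plain TypeScript of what a standardised logger could do: stamp `application` and `environment` on every entry and drop the redacted paths before anything is written. It does not use pinojs and is not how pinojs works internally. The names `makeLogger`, `removePaths`, `BaseFields`, `LogEntry` and the `example-service` application name are all made up for illustration.

```ts
// Illustrative sketch only: every name here is hypothetical, not an existing API.
type Level = 'info' | 'warn' | 'error';

interface BaseFields {
  application: string;
  environment: string;
}

interface LogEntry extends BaseFields {
  level: Level;
  message: string;
  [key: string]: unknown;
}

// Return a deep copy of `value` with each dotted path removed.
// The JSON round-trip keeps the caller's object untouched, but it also
// drops non-JSON values (functions, undefined), which is acceptable for logs.
function removePaths(value: Record<string, unknown>, paths: string[]): Record<string, unknown> {
  const copy = JSON.parse(JSON.stringify(value)) as Record<string, unknown>;
  for (const path of paths) {
    const keys = path.split('.');
    const last = keys.pop();
    if (last === undefined) continue;
    let node: unknown = copy;
    for (const key of keys) {
      if (node === null || typeof node !== 'object') break;
      node = (node as Record<string, unknown>)[key];
    }
    // Paths that do not exist in this entry are skipped silently.
    if (node !== null && typeof node === 'object') {
      delete (node as Record<string, unknown>)[last];
    }
  }
  return copy;
}

// Build a logger that adds the base fields to every entry and writes one JSON line per call.
function makeLogger(base: BaseFields, redactPaths: string[], write: (line: string) => void) {
  const log = (level: Level, message: string, extra: Record<string, unknown> = {}): void => {
    // Base fields come last so extra data cannot overwrite application or environment.
    const entry: LogEntry = { ...removePaths(extra, redactPaths), ...base, level, message };
    write(JSON.stringify(entry));
  };
  return {
    info: (message: string, extra?: Record<string, unknown>) => log('info', message, extra),
    warn: (message: string, extra?: Record<string, unknown>) => log('warn', message, extra),
    error: (message: string, extra?: Record<string, unknown>) => log('error', message, extra),
  };
}

// Usage: collect lines in memory here; a real service would write to stdout.
const lines: string[] = [];
const logger = makeLogger(
  { application: 'example-service', environment: 'prod' },
  ['req.headers.authorization', 'request_data.body.booker', 'request_data.body.guest_travelers'],
  (line) => lines.push(line),
);

const payload = {
  request_data: {
    body: { booker: { email: 'booker@example.com' }, guest_travelers: [{ phone: '000' }], total: 100 },
  },
};

logger.info('Service started');
logger.error('Swagger validation failed', payload);
```

Because every entry carries `environment`, logs from all environments could go into one index and be filtered by that field. The redaction step runs on a copy, so the request object used by the rest of the handler is not changed.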

Introduce a tool to visualise logs

Tools currently under consideration

  1. Sentry
  2. Jaeger Tracing

Things to consider when choosing a tool