Add actionable logging to patching runs

melange396 commented 2 months ago

Patch runs are very similar to regular indicator runs, but they serve different purposes and are not run on a schedule. We should include information in our logs that marks when these runs are happening. Monitoring and alerting systems can then use this information to distinguish normal activity from patching activity, so we can tell when an aberration is caused by patching.

The format of the logging additions is yet to be determined (new, additional log messages? New parameters on existing log messages? Both? Something else?), but it should be easy to integrate into elastic and similar tools.
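As a rough illustration of one of the options listed above (a new parameter on existing log messages), here is a minimal sketch. It uses Python's standard `logging` module rather than the project's own structured logger, and the `run_type` field, its `"patch"`/`"regular"` values, and the helper names are all hypothetical. Emitting one JSON object per line keeps such a field easy to filter on once the logs reach elastic.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            # Hypothetical field; records logged without it count as "regular".
            "run_type": getattr(record, "run_type", "regular"),
        }
        return json.dumps(payload)


def make_run_logger(name: str, stream, is_patch: bool) -> logging.LoggerAdapter:
    """Return a logger that tags every record it emits with a run_type."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]
    # LoggerAdapter passes this dict as `extra`, so each record gets .run_type.
    return logging.LoggerAdapter(logger, {"run_type": "patch" if is_patch else "regular"})


# Usage: a patch run tags the same message an ordinary run would emit.
buf = io.StringIO()
log = make_run_logger("acquisition_demo", buf, is_patch=True)
log.info("processing csv files from issue")
print(buf.getvalue(), end="")
# {"event": "processing csv files from issue", "level": "info", "run_type": "patch"}
```

With a field like this, a dashboard or alert could separate patch activity from scheduled runs with a single filter, instead of relying on which message happened to be emitted.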

melange396 commented 2 months ago

The "acquisition" step of patching runs can potentially be detected with the log message: logger.info(event='processing csv files from issue'... found at https://github.com/cmu-delphi/delphi-epidata/blob/8746ff2ef7a936bb93628bc1358471d7c6c4f5f8/src/acquisition/covidcast/csv_importer.py#L128

This works because patching runs need to put CSV files in a specific directory structure to specify the "issue" date for import (instead of the default "today"), and that is where that log message is emitted. This will likely only work until the following ticket is addressed, after which all indicators will supply acquisition with specific "issue" dates:
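Until that ticket lands, a consumer of the acquisition logs could flag patch runs by matching on that event string. The sketch below assumes the logs are one JSON object per line with an `"event"` key (an assumption about the log format, not confirmed here); the sample lines and function name are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Event text from the logger.info call in csv_importer.py (the call's other
# arguments are elided in this issue, so only the event string is matched).
PATCH_ACQUISITION_EVENT = "processing csv files from issue"


def patch_acquisition_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield log records that look like the acquisition step of a patch run.

    Assumes one JSON object per line with an "event" key; anything else is skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("event") == PATCH_ACQUISITION_EVENT:
            yield record


# Hypothetical log lines; real records carry more fields than shown here.
sample = [
    '{"event": "processing csv files from issue"}',
    '{"event": "some other message"}',
    "not a json line",
]
print(len(list(patch_acquisition_records(sample))))  # 1
```

As noted above, this heuristic stops being a reliable patch signal once every indicator supplies an explicit issue date.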

https://github.com/cmu-delphi/covidcast-indicators/issues/1907

minhkhul commented 2 months ago

Here's the plan to add just the patch acquisition logs to elastic:

Currently, our normal indicator acquisition jobs write their logs to /var/log/epidata/csv_upload_{acq_ind_name}.log. That log content is then picked up by filebeat, as configured here, and made available on elastic through this ingest pipeline.

Right now, patch acquisition logs are written to /var/log/filebeat-pickup/epidata.acquisition.covidcast.csv_to_database_batch-issue-upload-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log

Therefore, all we need to do to get patch acquisition logs into elastic is change the Acquisition cronicle job so that patch acquisition logs to /var/log/epidata/csv_upload_patch.log, and rely on the existing processes to pick up the logs as usual.
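To make the difference between the two paths concrete, here is a small sketch that builds both. The old path embeds a UTC timestamp (the Python format string mirrors the `date -u` expansion), so every patch run writes a new file in a separate directory, while the regular jobs write to a fixed name under /var/log/epidata/. The helper names and the example run time are hypothetical.

```python
from datetime import datetime, timezone


def old_patch_log_path(now: datetime) -> str:
    """Timestamped path used before the change (one new file per run)."""
    # Same format as the shell expansion: date -u +"%Y-%m-%dT%H_%M_%SZ"
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H_%M_%SZ")
    return ("/var/log/filebeat-pickup/epidata.acquisition.covidcast."
            f"csv_to_database_batch-issue-upload-{stamp}.log")


def indicator_log_path(acq_ind_name: str) -> str:
    """Fixed per-job path that regular acquisition jobs already log to."""
    return f"/var/log/epidata/csv_upload_{acq_ind_name}.log"


# Example run time (arbitrary; pass a timezone-aware datetime).
run_time = datetime(2024, 7, 1, 12, 30, 5, tzinfo=timezone.utc)
print(old_patch_log_path(run_time))
# /var/log/filebeat-pickup/epidata.acquisition.covidcast.csv_to_database_batch-issue-upload-2024-07-01T12_30_05Z.log
print(indicator_log_path("patch"))
# /var/log/epidata/csv_upload_patch.log
```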

To test this (and potentially other stuff later), I'm gonna set up patch acquisition log pickup to elastic on staging:

1. Uncomment this.
2. Adjust the current dashboards that only care about prod data so they ignore staging.

Then check how things go on staging with some fake patch data and these jobs.

melange396 commented 2 months ago

If you want to be sure to keep things out of other dashboards for testing purposes, then instead of just uncommenting the pipeline in the staging filebeat config, change its name to filebeat-epidata-pipeline-staging and create a matching ingest pipeline with a new target_field, such as "epidata_data__test".
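A minimal sketch of what that matching ingest pipeline's request body might look like, built in Python so it can be printed or sent with any HTTP client. The pipeline ID and target field come from the comment above; the single `json` processor reading from `message` is an assumption, since the existing epidata pipeline definition is not shown here and may contain other processors that should be copied over.

```python
import json

# Names from the comment above; the processor list is an assumption, since the
# existing epidata pipeline definition is not shown in this issue.
PIPELINE_ID = "filebeat-epidata-pipeline-staging"
TEST_TARGET_FIELD = "epidata_data__test"


def staging_pipeline_body(target_field: str = TEST_TARGET_FIELD) -> dict:
    """Request body for an ingest pipeline that parses JSON log lines."""
    return {
        "description": "Staging-only copy of the epidata pipeline",
        "processors": [
            # The json processor parses the string in `field` and stores the
            # resulting object under `target_field` instead of the prod field.
            {"json": {"field": "message", "target_field": target_field}},
        ],
    }


print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(staging_pipeline_body(), indent=2))
```

Because existing dashboards query the prod field, documents parsed into a separate target field should stay out of them until the test is done.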

minhkhul commented 1 month ago

Switching patch logging to write to /var/log/epidata/batch_issue_upload.log instead of /var/log/filebeat-pickup/epidata.acquisition.covidcast.csv_to_database_batch-issue-upload-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log. This lets patch acquisition logs be processed by the same elastic pipeline as normal acquisition logs, which makes patch acquisition info easier to see on dashboards. I tested the change on staging and the logs showed up on elastic as expected, so I'm applying it to prod.
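The move works because /var/log/epidata/ is presumably the directory the existing filebeat pickup reads for acquisition logs. A quick way to sanity-check a log path against an input glob is sketched below; the glob /var/log/epidata/*.log is a hypothetical stand-in for the real filebeat input configuration (not shown here), and `PurePosixPath.match` only approximates filebeat's own glob handling.

```python
from pathlib import PurePosixPath

# Hypothetical filebeat input glob; the real config is not shown in this issue.
EPIDATA_INPUT_GLOB = "/var/log/epidata/*.log"


def picked_up_by_epidata_input(log_path: str, glob: str = EPIDATA_INPUT_GLOB) -> bool:
    """Approximate check of whether a log file falls under the input glob."""
    # With an absolute pattern, PurePath.match must match the whole path, and
    # "*" does not cross "/" boundaries, similar to a shell glob.
    return PurePosixPath(log_path).match(glob)


old = ("/var/log/filebeat-pickup/epidata.acquisition.covidcast."
       "csv_to_database_batch-issue-upload-2024-07-01T12_30_05Z.log")
new = "/var/log/epidata/batch_issue_upload.log"
print(picked_up_by_epidata_input(old), picked_up_by_epidata_input(new))  # False True
```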

Next steps: address #1907.

cmu-delphi / covidcast-indicators

Add actionable logging to patching runs #2009