edx / edx-arch-experiments

A plugin to include applications under development by the architecture team at edx
GNU Affero General Public License v3.0
0 stars 3 forks source link

Fix duplicate logs for edxapp #730

Open robrap opened 4 months ago

robrap commented 4 months ago

The edxapp logs seem to be duplicated between edx.log and the supervisor logs in Datadog. This causes confusion and increases costs.

AC:

See example log message.

Also see https://2u-internal.atlassian.net/browse/GSRE-1543?focusedCommentId=4921284 for some related problems.

timmc-edx commented 4 months ago

Noting some examples of mistagged log files and the configs that captured them:

timmc-edx commented 4 months ago

Could this just be another effect of https://github.com/edx/edx-arch-experiments/issues/724?

robrap commented 4 months ago

I'm not sure if the mistagging is related, but the duplicate log issue is old. Can we concentrate on resolving that issue first? The tagging issue may need to be its own ticket.

timmc-edx commented 4 months ago

I've been having difficulty figuring out if there is duplicate logging. But I think I have an example of that now:

/edx/var/log/lms/edx.log tagged env:prod service:edx-edxapp-cms-workers service:edx-edxapp-lms:

Jul 17 19:13:12 ip-10-2-71-115 [service_variant=lms][celery.app.trace][env:prod-edx-edxapp] INFO [ip-10-2-71-115  2425] [user None] [ip None] [trace.py:128] - Task lms.djangoapps.gating.tasks.task_evaluate_subsection_completion_milestones[...UUID...] succeeded in 0.02717464299985295s: None

/edx/var/log/supervisor/lms_default_9-stderr.log tagged env:prod service:edx-edxapp-cms-workers service:edx-edxapp-lms-workers:

2024-07-17 19:13:12,544 INFO 2425 [celery.app.trace] [user None] [ip None] trace.py:128 - Task lms.djangoapps.gating.tasks.task_evaluate_subsection_completion_milestones[...UUID...] succeeded in 0.02717464299985295s: None

So we have duplication between the service and supervisor logs, here.

timmc-edx commented 4 months ago

A search for "Task lms.djangoapps.gating.tasks.task_evaluate_subsection_completion_milestones" "succeeded" faceted by dirname and filename shows about equal numbers of results for /edx/var/log/supervisor/lms_default_<N>-stderr.log and /edx/var/log/lms/edx.log. However, looking at (dirname:/edx/var/log/supervisor filename:lms_default_*-stderr.log) OR (dirname:/edx/var/log/lms filename:edx.log*) celery.app.trace for various periods shows varying ratios of results, with very roughly 1.5x as many in the supervisor logs. I'm really not clear on what's getting logged in there and why there's a discrepancy in the counts for this subset of logs.

timmc-edx commented 4 months ago

From discussion:

Ticket title and description updated to match findings.