The purpose of restructuring is to ensure pipeline failures do not
escalate up the call chain:
A site failure should not impact any other site.
A course failure should not cause site processing to fail, with the
caveat that the failed course's data would not be reflected in the
site's aggregated data.
An enrollment failure should not cause course processing to fail, with
the caveat that the failed enrollment's data would not be reflected in
the course's aggregated data.
See the module docstring in tests/tasks/test_daily_metrics.py for more
details.
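The isolation rules above can be sketched roughly as follows. This is a
minimal illustration, not the actual figures.tasks code: the function
names, the dict-based course and enrollment data, and the log messages
are all hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def process_enrollment(enrollment):
    """Hypothetical per-enrollment step; raises on bad data."""
    if enrollment.get('broken'):
        raise ValueError('bad enrollment data')
    return enrollment['progress']


def process_course(site, course):
    """Aggregate enrollment results, skipping enrollments that fail."""
    results = []
    for enrollment in course['enrollments']:
        try:
            results.append(process_enrollment(enrollment))
        except Exception:
            # Log and continue: one enrollment must not fail the course.
            # Its data is simply missing from the course aggregate.
            logger.exception('enrollment failed: site=%s course_id=%s',
                             site, course['id'])
    return sum(results)


def process_site(site, courses):
    """Collect per-course totals, skipping courses that fail."""
    totals = {}
    for course in courses:
        try:
            totals[course['id']] = process_course(site, course)
        except Exception:
            # Log and continue: one course must not fail the site.
            logger.exception('course failed: site=%s course_id=%s',
                             site, course['id'])
    return totals


courses = [
    {'id': 'c1', 'enrollments': [{'progress': 1}, {'broken': True},
                                 {'progress': 2}]},
    {'id': 'c2'},  # malformed on purpose: no 'enrollments' key
]
site_totals = process_site('example.com', courses)
```

Here the broken enrollment is left out of the c1 total, and the
malformed course c2 is left out of the site totals, while the rest of
the processing still completes.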
Logging is improved. Log messages now have prefixes to make it easier to
grep through them for details. The messages also identify the site, the
date the data is for, and the course id for both processing and
failures.
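As a rough sketch of what a grep-friendly message could look like: the
prefix string, the format_log helper, and the key=value layout below are
assumptions for illustration and are not taken from the figures code.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical prefix, chosen for illustration only.
PREFIX = 'FIGURES:PIPELINE:DAILY'


def format_log(prefix, site, date_for, course_id, message):
    """Build a single-line message that is easy to filter with grep."""
    return '{} site={} date_for={} course_id={} {}'.format(
        prefix, site, date_for, course_id, message)


msg = format_log(PREFIX, 'example.com', '2020-09-01',
                 'course-v1:Org+C101+2020', 'FAIL course metrics')
logger.error(msg)
```

With a layout like this, all lines from one pipeline can be pulled out
by grepping for the prefix, then narrowed by site or course id.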
Logging to figures.models.PipelineError has been removed, as it does not
provide benefit in its current form. Follow-on work is adding
post-pipeline reporting, which should provide better visibility into
pipeline data health.
Restructured the figures.tasks tests. These were in 'tests/test_tasks.py'.
Now there are process-specific modules in 'tests/tasks':
test_daily_metrics.py
test_monthly_metrics.py
test_mau_tasks.py - This may go away, as it was never actually used in
production. MAU is captured monthly in figures.models.SiteMonthlyMetrics,
and we hope MAU 2G (second generation) will finally be implemented soon.
Minor fix - pipeline course daily metrics now casts 'course_id' to a
string when working with figures.models.CourseDailyMetrics. This is to
prevent errors if a CourseKey is used instead of the string
representation of the course id. (redacted rant on CourseKey)
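A minimal sketch of the cast, assuming Python 3. FakeCourseKey is a
stand-in written for this example, not the real CourseKey class, and
normalize_course_id is a hypothetical helper name.

```python
class FakeCourseKey:
    """Stand-in for a CourseKey-like object (illustration only)."""

    def __init__(self, org, course, run):
        self.org = org
        self.course = course
        self.run = run

    def __str__(self):
        return 'course-v1:{}+{}+{}'.format(self.org, self.course, self.run)


def normalize_course_id(course_id):
    """Return the string form for either a key object or a string."""
    return str(course_id)


from_key = normalize_course_id(FakeCourseKey('Org', 'C101', '2020'))
from_str = normalize_course_id('course-v1:Org+C101+2020')
```

Both calls produce the same string, so a lookup on a string 'course_id'
field behaves the same whichever form the caller passes in.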
Added comments to backfill.backfill_enrollment_data_for_site. These are
notes on how to improve updating enrollment data.
0.4 Tasks Error Handling and Logging Improvements