appsembler / figures

Reporting and data retrieval app for Open edX
MIT License
44 stars 37 forks source link

Pipeline improvement prerequisites #427

Closed johnbaldwin closed 2 years ago

johnbaldwin commented 2 years ago

This PR contains updates to Figures as prerequisites to adding the second switchable workflow

Each PR has it's own description and context. So please read the changes in calendar order so you see the details to the story.

Why this PR exists and high level story

  1. We need to improve Figures daily metrics collection job (the "daily pipeline") performance for one of our standalone deployments
  2. The existing daily job is working on Tahoe and we don't want to break it
  3. So we add a second daily job workflow
  4. We enable the deployment to choose which workflow to run on a setting in Figures server-vars
  5. After we prove the robustness of the second workflow, we will deprecate and then remove the original workflow

What's the scope of this PR

  1. This PR contains code refactoring and changes needed to support the new workflow
  2. This PR does NOT contain the actual second workflow code
  3. These are broken up into two PRs for readability

What's IN this PR?

  1. Added a Figures ENV_TOKENS flag to choose an override task to kick of the daily Celery top level task or use the default one: https://github.com/appsembler/figures/commit/926cd7c5bb22dd1670ee221de473c071e4aad1d8
  2. Minor code cleanup found along the way that is opportune to include in this PR: https://github.com/appsembler/figures/commit/83687ca54b77d90dfaafaeb2a2301ea352ece5f6
  3. Because it can be confusing to get the specific calendar day (24 hour period starting at 00:00 UTC) for "yesterdays" data collection, added a convenience function to figures.helpers: https://github.com/appsembler/figures/commit/42ce02d575ba9cc6004d4dde91699252daf9f7f0
  4. Found a bug in Figures devsite settings when I needed to create a migration, so fixed that here: https://github.com/appsembler/figures/commit/5a5ff31e03ed6a6ca18e9502effcfb589cd78b13
  5. And the most important part of this PR, updated metrics collection handling to encapsulate and abstract "per enrollment" progress collection here: https://github.com/appsembler/figures/commit/c6693fe12d7873dd94017c438c5ac32cd427f953

It's the last PR that is most important (I repeat myself because it is the most important. There, I've done it again). Anyway, with the new workflow, we will invert the relationship of LearnerCourseGradeMetrics (LCGM) and EnrollmentData where we write first to EnrollmentData (let's call him "ed") and use ed as the primary for data retrieval in the API (we started that already when we introduced ed as a model in figures). LCGM gets pushed back to be a historical record to track the time series per-enrollment performance

But we're not there yet, lots of refactoring to do to get there. What no. 5 commit above does is abstract this relationship from the rest of Figures and isolate the progress retrieval to make Figures safer, faster and more fun to program on

OmarIthawi commented 2 years ago

Thanks John! I'll review it today.