Open stacimc opened 1 year ago
Removing this from the milestone so it does not block completion of the project. Since the refreshes completed in 9 days, significantly faster than our desired monthly timeline, we can keep this issue open for potential further optimization, but it is not necessary to meet the project goals.
Suggested Improvement
Once the popularity refresh DAGs have completed an initial run, we will have a better understanding of how long a refresh takes. We should look into ideas for speeding up the process. Some ideas were proposed at the IP stage. More context can be found in this thread, but in short:
[a, b, c, … n] such that there are about 10k records whose updated_on date is between each interval (a, b), (b, c), and so on.
Benefit
If the image popularity refresh can be completed comfortably within a month, this should be considered low priority. If it takes much longer, we should increase the priority of this issue.
We would like a popularity refresh (the process of recalculating the popularity constants and updating every record's popularity score using the new constants) to be able to run on a monthly basis. If the process as implemented takes longer than that, it would be worth optimizing the refresh so it can run more frequently, which would keep popularity scores more up to date.
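The interval-splitting idea under Suggested Improvement can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation. `compute_interval_bounds`, `interval_pairs` and `count_in_interval` are hypothetical helpers. The sketch sorts all `updated_on` values in memory, whereas a real refresh would presumably derive the boundaries from a database query. It also uses half-open intervals [a, b) as an assumed design choice, so that a record sitting exactly on a boundary is counted in only one batch.

```python
from datetime import datetime, timedelta


def compute_interval_bounds(updated_on_values, batch_size=10_000):
    """Return boundaries [a, b, c, ..., n] such that each half-open interval
    [a, b), [b, c), ... holds roughly batch_size records.

    Hypothetical helper: assumes all updated_on values fit in memory.
    """
    ordered = sorted(updated_on_values)
    if not ordered:
        return []
    # Every batch_size-th timestamp starts a new interval. Deduplicate in case
    # many records share one timestamp (such intervals end up larger).
    bounds = sorted(set(ordered[::batch_size]))
    # Final upper bound sits just past the newest record so it is included.
    bounds.append(ordered[-1] + timedelta(microseconds=1))
    return bounds


def interval_pairs(bounds):
    """Pair consecutive boundaries: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(bounds, bounds[1:]))


def count_in_interval(values, start, end):
    """Count records whose timestamp falls in the half-open interval [start, end)."""
    return sum(1 for v in values if start <= v < end)


if __name__ == "__main__":
    origin = datetime(2023, 1, 1)
    # 25,000 synthetic records, one update per second.
    updated_on = [origin + timedelta(seconds=i) for i in range(25_000)]
    bounds = compute_interval_bounds(updated_on)
    for lo, hi in interval_pairs(bounds):
        print(lo, hi, count_in_interval(updated_on, lo, hi))
```

For the 25,000 synthetic records, this yields three intervals holding 10,000, 10,000 and 5,000 records. Each (a, b) pair could then be handed to a separate batch-update task, so the batches can run independently of one another.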