cambiatus / backend

Cambiatus GraphQL API
GNU Affero General Public License v3.0
21 stars 18 forks source link

Refactor monthly digest email worker #294

Closed MatheusBuss closed 2 years ago

MatheusBuss commented 2 years ago

When we implemented generalized workers for email on #276 we left behind a bug.

We spawn a worker for an entire community. Which means that, if a email for a single user in that community fails, the entire worker fails. This also can cause some users to receive the same message multiple times, since the worker will retry sending emails to the entire community up to 3 times.

We should change this behavior to spawn a worker for every member, instead of a worker for every community.

MatheusBuss commented 2 years ago

@lucca65 looking into it I realized it may not be as simple as I thought. Since we use the Cron plugin from Oban to spawn the worker I don't see how we could make a loop before spawning the worker. So I thought of two ways to resolve this issue.

  1. We rename the current worker to MonthlyDigestSpawnerWorker and make this worker loop over communities and members and then spawn individual workers for each member
  2. Instead of having a worker spawn other workers we could simply schedule a function to trigger every month and achieve the same result. But for this we'd need another dependency to do the scheduling.

I'm leaning towards the second one, since this trigger does not need to be a worker, but this way we add another dependency to our project and another potential point of failure.

What do you think @lucca65, which do you think we should go for? Or do you see something I don't?

lucca65 commented 2 years ago

When thinking about this, i've also envisioned doing the second route.

Monthly Digest Worker actually spans other workers that do the email sending individually. The schedule itself is trivial, simply creating workers on a new queue. Oban will handle the rest, making the necessary intervals internally

lucca65 commented 2 years ago

I'm not sure if i follow this dependency problem you mentioned. can you explain in more detail?

MatheusBuss commented 2 years ago

It's just that it's something else that we need to get working correctly and maintaining, instead of using something that's already working for us.

Shouldn't be a big problem though.

lucca65 commented 2 years ago

hm, not sure what you mean, its definitely not working 😅😅

I mean we did sent 30% of our monthly quota in a single night. Having lots of workers is no trouble, in fact we use workers to have control over large volumes of data processing and still have it be manageable.

With this strategy this problem would still happening but by querying the params we've sent to the worker, it's status and error we could fix the one user with a problem, instead of having thousands of wrong emails.

I think it's a good route to have more controlled errors! even if it means more queues and workers

Check out this demo with very little data that obans website uses https://getoban.pro/oban

MatheusBuss commented 2 years ago

I think I didn't get my message across. I meant that the worker scheduling is working. So it would be ok to use this scheduling to create a worker that creates workers.

The other alternative I thought was to schedule a function (without a worker) that creates workers. For this one I believe we would need another dependency.

In both cases we'd have to refactor creating workers for every user on the mailing list.