freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

feat(alert): Refines send recap alerts command #4672

Closed · ERosendo closed this 1 week ago

ERosendo commented 1 week ago

This PR addresses issues described in https://github.com/freelawproject/courtlistener/issues/4646#issuecomment-2466985751.

Key changes:

To ensure a clean slate for the next execution and prevent issues from previous failed attempts, we should run the following Python script in production to remove any residual keys from Redis:

from cl.lib.redis_utils import get_redis_interface

r = get_redis_interface("CACHE")

# Remove leftover sweep-state keys from prior failed runs so the next
# execution starts from a clean slate.
r.delete("alert_sweep:main_re_index_completed")
r.delete("alert_sweep:rd_re_index_completed")
r.delete("alert_sweep:query_date")
r.delete("alert_sweep:task_id")
mlissner commented 1 week ago

it seems that the root cause of this issue is that Elasticsearch is taking too long to complete the re-indexing process:

You might have missed it, Alberto, but we had some issues with the indexes while you were out, and they were a bit degraded. It's possible that could be part of the cause here. If so, hopefully that's not something we'd need to deal with all the time.

Why was it indexing 3M objects? Isn't that kind of weird? I thought it'd be more like 100k/day?

albertisfu commented 1 week ago

Why was it indexing 3M objects? Isn't that kind of weird? I thought it'd be more like 100k/day?

Yes. Even though we only receive around 100K dockets or RDs per day, we also need to re-index related documents so that cross-object alerts, which can involve a docket and any of its RECAPDocuments, can match.

So we re-index:

I believe the query that causes the number of documents requiring re-indexing to grow significantly is the one for RECAPDocuments whose parent dockets were added or modified on the same day.

In this case, it might be matching dockets with thousands of documents that also need to be indexed.
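
As an illustration, a rough sketch of that selection in the Django ORM might look like the following. This is not the actual command code; the import path, the date_modified field, and the docket_entry__docket lookup follow CourtListener conventions but are assumptions here:

from datetime import date

from cl.search.models import Docket, RECAPDocument

day = date.today()

# Dockets added or modified during the day being swept.
dockets = Docket.objects.filter(date_modified__date=day)

# RECAPDocuments whose parent docket was added or modified that day; a single
# matching docket can pull thousands of documents into the re-index set.
rds = RECAPDocument.objects.filter(docket_entry__docket__date_modified__date=day)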

I wanted to replicate the queries using the dates shown in the logs to confirm those numbers, but unfortunately the URL Ramiro shared for accessing Kibana or the Elasticsearch cluster endpoint is timing out.

mlissner commented 1 week ago

With the tweaks in this PR, we’ll see if the command can complete in less than a day. But to prevent problems, it might also be worth including an additional Redis key that indicates whether another instance of the command is already running. That way, if a second instance is invoked, it can terminate early and avoid conflicts.

Seems like a good idea. I guess we can monitor and see if we need this.
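
A minimal sketch of such a guard, reusing the existing Redis helper; the alert_sweep:running key name and the 24-hour TTL are illustrative assumptions, not part of this PR:

from cl.lib.redis_utils import get_redis_interface

r = get_redis_interface("CACHE")

# Set the flag only if it does not already exist (nx=True); expire it after
# 24 hours so a crashed run cannot block future executions forever.
if not r.set("alert_sweep:running", "1", nx=True, ex=60 * 60 * 24):
    # Another instance holds the flag; exit early.
    raise SystemExit("Another send_recap_alerts run is in progress.")

try:
    # ... run the sweep ...
    pass
finally:
    r.delete("alert_sweep:running")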

Merging, thank you!

mlissner commented 1 week ago

Ran the Redis commands. We should get a deployment shortly. Do we just need to check on the cronjob tomorrow, then, I assume?

albertisfu commented 1 week ago

Do we just need to check on the cronjob tomorrow, then, I assume?

Yes. We could also run the ES query below to confirm that the re-indexing task is running:

GET _tasks?detailed=true&actions=*reindex
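
The line above is Elasticsearch's Tasks API in Kibana Dev Tools syntax. The same check from Python might look roughly like this; the client setup below is an assumption for illustration, not CourtListener's actual ES helper:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster endpoint

# List any in-flight reindex tasks with per-task detail.
tasks = es.tasks.list(detailed=True, actions="*reindex")
print(tasks)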