albertisfu opened 8 months ago
To ensure a smooth transition of Opinions to ES, we should follow these steps:
I am working on a PR that adjusts cl_send_alerts to query ES, and I'll confirm whether any changes are required in the templates. It will include a waffle switch, allowing us to control the timing of the transition.
To confirm that all the alerts are syntactically valid, we need to run the alert query validation command from #3890. I've confirmed that the stat_ filter having different values at this stage won't affect the syntax; internally, it just sets the default status for the filter. So we can run it with:
manage.py clean_up_search_alerts --action validate-queries --validation-wait 1
Based on the results from the previous command, we will determine if additional actions are needed to fix alerts.
Once we confirm the alerts are in good shape, it's time to clean up the stat_ filter using:
manage.py clean_up_search_alerts --action clean-up
This should be done when the cl_send_alerts cron job is not about to run. I don't recall how often the alerts cron job runs; if it runs very frequently, we might consider stopping it temporarily.
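The stat_ clean-up itself is done by the existing clean_up_search_alerts command, whose internals aren't described here. Purely as an illustration, assuming an alert's query is stored as a URL query string and that the status filter appears as parameters whose names start with stat_ (the stat_Precedential=on value below is an assumed example), stripping those parameters could look like this hypothetical helper:

```python
from urllib.parse import parse_qsl, urlencode


def strip_stat_filters(query: str) -> str:
    """Return the alert query string without any stat_* parameters.

    Hypothetical helper: assumes the alert query is a URL-encoded query
    string and that the status filter uses parameter names prefixed with
    "stat_". The real clean_up_search_alerts command may work differently.
    """
    # keep_blank_values=True keeps parameters such as "q=" instead of
    # silently dropping them.
    params = parse_qsl(query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if not key.startswith("stat_")]
    return urlencode(kept)


if __name__ == "__main__":
    original = "q=foo&type=o&stat_Precedential=on&stat_Errata=on&order_by=score+desc"
    print(strip_stat_filters(original))  # -> q=foo&type=o&order_by=score+desc
```

Note that urlencode re-encodes every value, so characters such as ":" inside q come back percent-encoded; a real clean-up might prefer to rebuild the string more conservatively.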
After cleaning up the alerts, we are ready to make the ES Opinions search public, and we should enable the Opinions waffles for everyone, including those for search and alerts.
If the cl_send_alerts cron was stopped, we should re-enable it.
@mlissner While tweaking the cl_send_alerts command to work with ES for sending Search Alerts for opinions, I found some issues with the Search Alerts webhook payload, which are also related to the current Search API V3.
The current Solr approach for sending Search Alerts and webhooks mimics the search results on the frontend. Then a "grouping" is applied (in fact, it's a collapse): if multiple Opinions for the same cluster_id match, they are ordered by type asc, and the first Opinion is chosen as the group representative. Consequently, the alert email and webhook payload contain only one result per cluster.
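To make the grouping concrete, here is a small Python emulation of the collapse described above: among matching opinions, keep one per cluster_id, choosing the one whose type sorts first in ascending order. It is a sketch over plain dicts, not the Solr or cl_send_alerts code. The hit shape (id, cluster_id, type keys) mirrors fields in the webhook payload; the tie-break on id and keeping clusters in the order they first appear in the ranked hits are assumptions made here for determinism.

```python
from typing import Any


def collapse_by_cluster(hits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one opinion per cluster_id, emulating the Solr collapse.

    Within each cluster, the representative is the opinion whose "type"
    sorts first ascending (ties broken by "id", an assumption). Clusters
    keep the order in which they first appear in the ranked hits.
    """
    best: dict[int, dict[str, Any]] = {}
    order: list[int] = []
    for hit in hits:
        cluster_id = hit["cluster_id"]
        current = best.get(cluster_id)
        if current is None:
            order.append(cluster_id)
            best[cluster_id] = hit
        elif (hit["type"], hit["id"]) < (current["type"], current["id"]):
            best[cluster_id] = hit
    return [best[cluster_id] for cluster_id in order]


# Sample data only: two sibling opinions in cluster 92, one in cluster 101.
hits = [
    {"id": 98, "cluster_id": 92, "type": "040dissent"},
    {"id": 97, "cluster_id": 92, "type": "035concurrenceinpart"},
    {"id": 120, "cluster_id": 101, "type": "010combined"},
]
print([hit["id"] for hit in collapse_by_cluster(hits)])  # -> [97, 120]
```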
In ES, we can show only one result per cluster in the alert email because we're using a has_child query, which returns OpinionCluster documents instead of Opinions.
However, the webhook payload is based on the Search API, so it displays Opinions instead of OpinionClusters.
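The difference between the two ES approaches can be sketched as request bodies built in Python. The first returns parent (cluster) documents through has_child, as the alert email does; the second matches opinions directly and uses the ES collapse parameter on cluster_id, with inner_hits sorted by type ascending, as one way to keep a single opinion per cluster. Assumptions, not taken from the actual index definitions: the child relation is named "opinion", cluster_id and type are keyword/numeric fields with doc values, and the alert text goes through a query_string query. A filter restricting the second query to opinion documents is left out because it depends on the mapping.

```python
import json
from typing import Any

CHILD_RELATION = "opinion"  # assumption: child relation name in the join field


def cluster_alert_query(alert_query: str) -> dict[str, Any]:
    """Alert email: return parent clusters that have a matching child opinion."""
    return {
        "query": {
            "has_child": {
                "type": CHILD_RELATION,
                "score_mode": "max",  # a cluster scores as its best-matching opinion
                "query": {"query_string": {"query": alert_query}},
            }
        }
    }


def opinion_webhook_query(alert_query: str) -> dict[str, Any]:
    """Webhook: match opinions directly but keep one hit per cluster_id.

    Hits stay ordered by relevance; the per-cluster representative (lowest
    "type" value, mirroring the Solr collapse) is read from inner_hits.
    """
    return {
        "query": {"query_string": {"query": alert_query}},
        "collapse": {
            "field": "cluster_id",  # must be a keyword/numeric field with doc values
            "inner_hits": {
                "name": "representative",
                "size": 1,
                "sort": [{"type": "asc"}],  # assumes "type" is a keyword field
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(opinion_webhook_query("pollution"), indent=2))
```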
To make Search Alerts webhooks backward compatible, we can match OpinionDocuments directly (once we move to the Search API V4, we can consider using the new nested approach for webhooks too).
However, matching OpinionDocuments directly returns all the matching opinions in a cluster, so we have two options:
- Use the ES collapse parameter, so we can select only the first opinion for each cluster_id and show only one opinion per cluster in the webhook payload.
- Return all matching opinions, so the payload can contain multiple opinions with the same cluster_id, which can lead to confusion or errors if users are only expecting one Opinion per cluster.

Regardless of what we decide on the previous issue, there is a second issue related to the fields displayed in the Search Alerts webhook payload.
The current payload is based on the Search API serializer, which displays Opinions instead of clusters and looks as follows:
"results":[
{
"id":97,
"type":"035concurrenceinpart",
"cites":"None",
"court":"Appeals court for the Dirty Dishes",
"judge":"",
"source":"M",
"status":"Non-Precedential",
"scdb_id":"",
"snippet":"\n\n\n Table crime oil side rule about TV perform. Physical threat ball green name. Dog life poor.\nAction seven hard phone. Executive worker operation beyond trip rate. Task practice program travel generation blood growth itself.\nEven those more. Sound movie perform environmental develop history. Forget specific right nothing.\nCity pay president suggest skill machine green. Store box move represent field source actually. Field first box tonight.\nNecessary office let take point direction producti",
"attorney":"",
"caseName":"Moss-Logan v. Gibbs",
"citation":"None",
"court_id":"wxpmd",
"pagerank":"None",
"author_id":97,
"citeCount":0,
"dateFiled":"2024-03-14T00:00:00-07:00",
"docket_id":123,
"lexisCite":"None",
"panel_ids":"None",
"timestamp":"2024-03-20T13:00:09.749000-07:00",
"cluster_id":92,
"dateArgued":"1992-01-01T00:00:00-08:00",
"local_path":"",
"per_curiam":"None",
"suitNature":"",
"court_exact":"wxpmd",
"neutralCite":"None",
"sibling_ids":[
97,
98
],
"absolute_url":"/opinion/92/moss-logan-v-gibbs/",
"dateReargued":"None",
"docketNumber":"4:22-bk-118624",
"download_url":"None",
"status_exact":"Non-Precedential",
"caseNameShort":"Moss-Logan",
"joined_by_ids":"None",
"dateReargumentDenied":"None",
"court_citation_string":"",
"non_participating_judge_ids":"None"
}
]
There are some compatibility issues with fields that we need to address:
Fields that are no longer indexed:
- caseNameShort: We decided to stop indexing this field for Opinions.
- pagerank: Not available in ES.
- status_exact: Removed since it was a duplicate of status.

I think we have some options here:
- caseNameShort: we can get it from the DB.
- pagerank: we can set it as blank.
- status_exact: we can just get the value from the DB.

Fields indexed only in the parent document:
- court_exact
- non_participating_judge_ids
- source
These fields are only indexed in the OpinionCluster document, so they're not available when matching Opinions directly. We decided to index them only in the parent document since they're not searchable. So we have two options: move them to OpinionBaseDocument so they are indexed in each child document (which requires a full reindex of opinions), or just retrieve these fields from the DB.
Fields whose values changed:
- status and type: we can map the values retrieved from ES to the old values we had in Solr.
- snippet: this field will now only display the text of the opinion as a snippet, which I don't think is an issue.

Let me know what you think about these changes.
OK, per our conversation just now, we need to write some code to convert webhook search alerts v1 and search API v3 to use Elastic. We can't make it perfectly like the current API, but we can get it close and provide some time for folks to adjust. I'll write the full plan in a separate issue, but as far as webhook v1 and API v3 go:
- caseNameShort, pagerank, and status_exact: what you proposed is fine.
- court_exact can be copied from the court field.
- non_participating_judges can be removed.
- source can be removed.
- status and type can have a little mapping so that their values are unchanged when we switch to Elastic-backed v3.

Hopefully, with some notice to our customers, this will be fine.
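The plan above amounts to a small adapter applied to each ES opinion hit before serialization. This is a hypothetical sketch: the function name, the hit shape, the db_values argument (fields fetched from the DB, such as caseNameShort), and especially the entries in the two mapping tables are illustrative assumptions; the real ES-to-Solr value pairs would have to be filled in from the index. One deliberate deviation: in the sample payload, court_exact holds the same value as court_id ("wxpmd"), not the court name, so the sketch copies it from court_id.

```python
from typing import Any

# Hypothetical example entries; the real pairs must come from the ES index
# values and the old Solr values.
STATUS_ES_TO_SOLR = {"Unpublished": "Non-Precedential"}
TYPE_ES_TO_SOLR = {"concurrence-in-part": "035concurrenceinpart"}

# Fields the plan drops from the Elastic-backed payload.
REMOVED_FIELDS = ("non_participating_judge_ids", "source")


def to_legacy_payload(es_hit: dict[str, Any], db_values: dict[str, Any]) -> dict[str, Any]:
    """Reshape an ES opinion hit into the v1 webhook / v3 API result shape."""
    result = dict(es_hit)
    # Map renamed values back to Solr values; unknown values pass through.
    status = STATUS_ES_TO_SOLR.get(es_hit.get("status"), es_hit.get("status"))
    result["status"] = status
    # status_exact was a duplicate of status, so reuse the mapped value.
    result["status_exact"] = status
    result["type"] = TYPE_ES_TO_SOLR.get(es_hit.get("type"), es_hit.get("type"))
    # In the sample payload, court_exact equals court_id.
    result["court_exact"] = es_hit.get("court_id")
    # No longer indexed: caseNameShort comes from the DB, pagerank is blank.
    result["caseNameShort"] = db_values.get("caseNameShort", "")
    result["pagerank"] = None
    for field in REMOVED_FIELDS:
        result.pop(field, None)
    return result


hit = {"id": 97, "court_id": "wxpmd", "status": "Unpublished", "type": "concurrence-in-part"}
print(to_legacy_payload(hit, {"caseNameShort": "Moss-Logan"}))
```

Keeping the mapping tables as plain dicts makes it easy to review them against the old Solr values before the switch.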
Hey @albertisfu. Please let me know if you need anything from my end. Thank you!
Sure! I'll let you know once we are ready to perform this action. We just need to complete the documentation for version 4 of the Search API before it can be released.
Before launching the ES Opinion Search live, we should clean up the current Opinion Search Alerts queries.
In the new ES Opinions index, we have renamed the values of the status and type fields and removed caseNameShort. Therefore, we need to fix any alerts that use these fields.

Additionally, we need to modify the cl_send_alerts command to query the search alerts using ES. This will serve as a temporary solution until Opinion alerts are moved to the Percolator.
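For the alert fix-up, a first pass could simply flag alerts whose query text references the affected fields so they can be reviewed. A minimal sketch, assuming the alert query is a URL query string with the free-text query in a q parameter and fielded terms written as field:value; the function name is hypothetical, and any actual rewrite of status or type values would still need the real old-to-new value mapping.

```python
import re
from urllib.parse import parse_qs

# Fields whose values were renamed (status, type) or that were removed
# (caseNameShort) in the ES Opinions index.
AFFECTED_FIELDS = ("status", "type", "caseNameShort")
# \b avoids matching inside longer names such as "status_exact:" or "court_type:".
FIELD_PATTERN = re.compile(r"\b(" + "|".join(AFFECTED_FIELDS) + r"):")


def affected_fields(alert_query: str) -> set[str]:
    """Return which affected fields a saved alert's q parameter references."""
    params = parse_qs(alert_query)
    found: set[str] = set()
    # Only the free-text q is inspected; a top-level "type=o" parameter
    # selects the search type and is not a fielded query term.
    for q in params.get("q", []):
        found.update(FIELD_PATTERN.findall(q))
    return found


print(sorted(affected_fields("q=caseNameShort%3AMoss+AND+type%3A010combined&type=o")))
# -> ['caseNameShort', 'type']
```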