freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com
Other

Clean up for Opinion Search Alerts. #3632

Open albertisfu opened 8 months ago

albertisfu commented 8 months ago

Before launching ES Opinion Search, we should clean up the existing Opinion Search Alert queries.

In the new ES Opinions index, we have renamed the values of the `status` and `type` fields and removed `caseNameShort`. Any alerts that use these fields therefore need to be fixed.
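To find the alerts that need fixing, a simple scan of each alert's query text for the affected field names could work. The following is a minimal sketch. It assumes the alert query text has already been extracted and URL-decoded, and it uses a simplified regex rather than a real query-syntax parser. `affected_fields` and `alerts_needing_fix` are hypothetical helper names.

```python
import re

# Fields affected by the new ES Opinions index, per this issue:
# the values of `status` and `type` were renamed; `caseNameShort` was removed.
AFFECTED_FIELDS = ("status", "type", "caseNameShort")

# Matches a fielded term such as `status:Precedential` or `caseNameShort:(Foo)`.
# `\b` keeps names like `doctype:` from matching; `status_exact:` is not matched.
FIELD_PATTERN = re.compile(r"\b(" + "|".join(AFFECTED_FIELDS) + r")\s*:")


def affected_fields(query: str) -> set[str]:
    """Return the affected field names referenced in one alert query."""
    return {match.group(1) for match in FIELD_PATTERN.finditer(query)}


def alerts_needing_fix(alerts: dict[int, str]) -> dict[int, set[str]]:
    """Map alert id -> affected fields, keeping only alerts that use any."""
    return {
        alert_id: fields
        for alert_id, query in alerts.items()
        if (fields := affected_fields(query))
    }
```

For example, `alerts_needing_fix({1: "status:Precedential AND court_id:scotus", 2: "caseNameShort:Gibbs", 3: "negligence"})` returns `{1: {'status'}, 2: {'caseNameShort'}}`, so alert 3 can be left alone.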

Additionally, we need to modify the cl_send_alerts command to run the search alert queries against ES. This will be a temporary solution until Opinion alerts are moved to the Percolator.
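As a rough illustration of what "run the alert query against ES" could mean, the sketch below turns a stored alert query into an ES request body. It assumes (hypothetically) that an alert stores a URL-encoded GET string with its free-text search in a `q` parameter, and it ignores every other filter. `alert_to_es_body` is an illustrative name, and the AND default operator is an assumption, not a confirmed setting.

```python
from urllib.parse import parse_qs


def alert_to_es_body(alert_query: str) -> dict:
    """Build an ES request body from a stored alert query string.

    Hypothetical: assumes a URL-encoded GET string whose free-text search is
    in the `q` parameter; other filters (court, dates, etc.) are ignored here.
    """
    params = parse_qs(alert_query)  # blank values such as "q=" are dropped
    text = params.get("q", [""])[0].strip()
    if not text:
        # No free-text search: match everything (filters would go here).
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "query_string": {
                "query": text,
                "default_operator": "AND",  # assumption: mirror an AND default
            }
        }
    }
```

With this sketch, `alert_to_es_body("q=foo+bar&type=o")` yields a `query_string` query for `foo bar`; a real implementation would also need to translate the remaining parameters into ES filters.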

albertisfu commented 6 months ago

To ensure a smooth transition of Opinions to ES, we should follow these steps:

albertisfu commented 6 months ago

@mlissner While adapting the cl_send_alerts command to send opinion Search Alerts through ES, I found some issues with the Search Alert webhook payload that also affect the current Search API v3.

Collapsed Opinion Documents in the Current Webhook Payload

The current Solr approach for sending Search Alerts and webhooks mimics the frontend search results and then applies a "grouping", which is really a collapse: if multiple Opinions with the same cluster_id match, they are ordered by type ascending and the first one is chosen as the group representative. Consequently, the alert email and webhook payload contain only one result per cluster.
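The collapse rule described above can be sketched in plain Python to make the expected output concrete. This is an illustration of the behavior, not the actual Solr or CourtListener code. `collapse_by_cluster` is a hypothetical name, and the `type` values other than `035concurrenceinpart` (taken from the payload below) are illustrative.

```python
def collapse_by_cluster(results: list[dict]) -> list[dict]:
    """Keep one opinion per cluster_id: the one whose `type` sorts first.

    Clusters keep the position where they first appear in `results`;
    on a tie, the earlier opinion wins.
    """
    best: dict[int, dict] = {}
    for opinion in results:
        cluster_id = opinion["cluster_id"]
        current = best.get(cluster_id)
        # Replacing a dict value does not change the key's insertion order.
        if current is None or opinion["type"] < current["type"]:
            best[cluster_id] = opinion
    return list(best.values())


results = [
    {"id": 98, "cluster_id": 92, "type": "040dissent"},
    {"id": 97, "cluster_id": 92, "type": "035concurrenceinpart"},
    {"id": 5, "cluster_id": 7, "type": "010combined"},
]
print([r["id"] for r in collapse_by_cluster(results)])  # [97, 5]
```

Plain string ordering works here because the Solr type codes start with a numeric prefix.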

In ES, we can show a single result per cluster in the alert email because we use a has_child query, which returns OpinionCluster documents instead of Opinions.
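For reference, the general shape of such a `has_child` request body is sketched below. The child relation name `opinion` is an assumption about the index's join mapping, and `cluster_alert_query` is a hypothetical helper, not CourtListener's actual query builder.

```python
def cluster_alert_query(text: str, child_type: str = "opinion") -> dict:
    """Return parent (cluster) documents that have at least one matching child.

    `child_type` is the join-field relation name; "opinion" is hypothetical.
    """
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"query_string": {"query": text}},
                # Score each cluster by its best-matching opinion.
                "score_mode": "max",
            }
        }
    }
```

Because the hits are the parent documents, each cluster appears at most once, no matter how many of its opinions match.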

However, the webhook payload is based on the Search API, so it lists Opinions instead of OpinionClusters.

To keep Search Alert webhooks backward compatible, we can match OpinionDocuments directly (once we move to Search API v4, we can consider using the new nested approach for webhooks too).

However, matching OpinionDocuments directly returns every matching opinion in a cluster. So, we have two options:

  1. I checked, and ES also supports a collapse parameter, so we can select only the first opinion for each cluster_id and include only one opinion per cluster in the webhook payload.
  2. We could skip collapsing and send every matched opinion in the webhook payload. However, that means sending multiple Opinions per cluster_id, which can cause confusion or errors for users who expect only one Opinion per cluster.
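Option 1 could look roughly like the request body below. This is a sketch under several assumptions: the query already matches only opinion (child) documents, `cluster_id` is a keyword or numeric field with doc values (required for collapsing), and `type` is sortable. ES picks the top-scoring opinion per cluster for the main hit, so `inner_hits` is used to pick the representative by `type` ascending, mirroring the Solr behavior. Because the `type` values were renamed in the new index, they may not sort in the same order as the Solr codes (for example `035concurrenceinpart`), in which case a different sort key would be needed. `opinion_webhook_query` and the `first_opinion` name are hypothetical.

```python
def opinion_webhook_query(text: str) -> dict:
    """Match opinions but return one hit per cluster_id.

    Assumes the query targets only opinion (child) documents.
    """
    return {
        "query": {"query_string": {"query": text}},
        "collapse": {
            "field": "cluster_id",  # must be keyword/numeric with doc_values
            "inner_hits": {
                "name": "first_opinion",
                "size": 1,
                # Representative opinion: lowest `type`, like the Solr grouping.
                "sort": [{"type": "asc"}],
            },
        },
    }
```

The webhook serializer would then read each cluster's representative from the `first_opinion` inner hits rather than from the top-level hit.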

Search Alerts Webhooks API Fields

Regardless of how we resolve the previous issue, I found a second issue related to the fields displayed in the Search Alert webhook payload.

The current payload is based on the Search API serializer, which displays Opinions instead of clusters, and looks like this:

"results":[
         {
            "id":97,
            "type":"035concurrenceinpart",
            "cites":"None",
            "court":"Appeals court for the Dirty Dishes",
            "judge":"",
            "source":"M",
            "status":"Non-Precedential",
            "scdb_id":"",
            "snippet":"\n\n\n    Table crime oil side rule about TV perform. Physical threat ball green name. Dog life poor.\nAction seven hard phone. Executive worker operation beyond trip rate. Task practice program travel generation blood growth itself.\nEven those more. Sound movie perform environmental develop history. Forget specific right nothing.\nCity pay president suggest skill machine green. Store box move represent field source actually. Field first box tonight.\nNecessary office let take point direction producti",
            "attorney":"",
            "caseName":"Moss-Logan v. Gibbs",
            "citation":"None",
            "court_id":"wxpmd",
            "pagerank":"None",
            "author_id":97,
            "citeCount":0,
            "dateFiled":"2024-03-14T00:00:00-07:00",
            "docket_id":123,
            "lexisCite":"None",
            "panel_ids":"None",
            "timestamp":"2024-03-20T13:00:09.749000-07:00",
            "cluster_id":92,
            "dateArgued":"1992-01-01T00:00:00-08:00",
            "local_path":"",
            "per_curiam":"None",
            "suitNature":"",
            "court_exact":"wxpmd",
            "neutralCite":"None",
            "sibling_ids":[
               97,
               98
            ],
            "absolute_url":"/opinion/92/moss-logan-v-gibbs/",
            "dateReargued":"None",
            "docketNumber":"4:22-bk-118624",
            "download_url":"None",
            "status_exact":"Non-Precedential",
            "caseNameShort":"Moss-Logan",
            "joined_by_ids":"None",
            "dateReargumentDenied":"None",
            "court_citation_string":"",
            "non_participating_judge_ids":"None"
         }
      ]

There are some compatibility issues with fields that we need to address:

Not indexed fields:

I think we have some options here:

Indexed only in the parent document:

Values changes:

Let me know what you think about these changes.

mlissner commented 6 months ago

OK, per our conversation just now, we need to write some code to convert webhook search alerts v1 and search API v3 to use Elastic. We can't make it perfectly like the current API, but we can get it close and provide some time for folks to adjust. I'll write the full plan in a separate issue, but as far as webhook v1 and API v3 go:

Hopefully with some notice to our customers, this will be fine.

blancoramiro commented 2 months ago

Hey @albertisfu. Please let me know if you need anything from my end. Thank you!

albertisfu commented 2 months ago

Sure! I'll let you know once we are ready to perform this action. We just need to complete the documentation for version 4 of the Search API before it can be released.