freelawproject / foresight

Where we discuss and prioritize new features

Add Financial disclosures to search API using Elasticsearch #40

Open albertisfu opened 1 year ago

albertisfu commented 1 year ago

@mlissner Some questions about this one:

mlissner commented 1 year ago
  1. Yes.

  2. Searching by the judge's name isn't super exciting. This will be another one where figuring out nesting will be important. In the front end, we want to be able to get back Financial Disclosure documents as results, but grouping by judge ID would be cool (we could even show their photo and name next to a group of documents, for example). At the same time, each document is a series of two-dimensional tables, so you wind up with a structure like:

    Table 1-8, Rows 1-n are FK'ed to... Financial Disclosure document 23, which is FK'ed to... Person 52

    I'm imagining the front end would show a judge's photo on the left, their name as the title, and a series of highlighted data grouped by document year, below.

    In the API, I guess the ideal would be to return document-oriented results, with nested JSON for each table, and a field for the judge's ID. Perhaps:

    {
      "judge_id": 1,
      "gifts": [{ row 1-n }],
      "investments": [{rows 1-n}],
      ...
    }

    I could also be persuaded that the triple nesting above makes more sense, with the judge as the top level result object, with a group of documents nested below and a group of results for those documents below that. Hmm...this might be trickier than I imagined at first. If so, perhaps we do skip this until later....

    @flooie may have some thoughts here too.

  3. See above, but the general idea is that we'd want to have all of the data searchable and available via the API.
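
As a rough illustration of the document-oriented shape described in item 2, here is a minimal Python sketch that groups flat table rows under their parent disclosure document. The input field names (`id`, `person_id`, `disclosure_id`, `table`, `data`) and the extra `disclosure_id` output key are assumptions made for the example, not the real model fields.

```python
from collections import defaultdict


def build_document_results(disclosures, rows):
    """Group flat table rows under their parent disclosure document.

    disclosures: iterable of dicts with "id" and "person_id" (hypothetical names).
    rows: iterable of dicts with "disclosure_id", "table" and "data" (hypothetical).
    """
    # disclosure id -> table name -> list of row payloads
    rows_by_doc = defaultdict(lambda: defaultdict(list))
    for row in rows:
        rows_by_doc[row["disclosure_id"]][row["table"]].append(row["data"])

    results = []
    for doc in disclosures:
        result = {"judge_id": doc["person_id"], "disclosure_id": doc["id"]}
        # One key per table (gifts, investments, ...), each holding its rows.
        result.update(rows_by_doc.get(doc["id"], {}))
        results.append(result)
    return results
```

Each result stays document-oriented: one object per disclosure, a `judge_id` field for grouping, and one list per table.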

albertisfu commented 1 year ago

Thanks, let's see if I understood this properly.

We'd want to index objects in a triple-nested hierarchy in Elasticsearch.

We'd index objects as follows:

Positions entries (grouped by Financial disclosure document): Position, Organization

Spousal Income entries (grouped by Financial disclosure document): Date, Source

Gifts entries (grouped by Financial disclosure document): Source, Value, Description

Investments entries (grouped by Financial disclosure document): Description, Gross Val. Code, Gross Val. Method, Income Code, Income Type, Trans. Type, Trans. Date, Trans. Value

All of these fields should be searchable, right?

Financial disclosure documents (grouped by Judge): There aren't many searchable fields here, maybe: Addendum content raw, Report type (filter), Is amended (filter), Year (filter).
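
To make the field list above concrete, here is a minimal sketch of one possible Elasticsearch mapping, written as a Python dict. It assumes one indexed document per financial disclosure, with each entry table stored as a `nested` field. All field names are hypothetical, and every entry field is typed as `text` only to keep the sketch short; a parent-child `join` field would be another way to model the judge level.

```python
# Hypothetical entry tables and their searchable fields, taken from the list above.
ENTRY_TABLES = {
    "positions": ["position", "organization"],
    "spousal_income": ["date", "source"],
    "gifts": ["source", "value", "description"],
    "investments": [
        "description",
        "gross_value_code",
        "gross_value_method",
        "income_code",
        "income_type",
        "transaction_type",
        "transaction_date",
        "transaction_value",
    ],
}


def build_disclosure_mapping():
    """Return an index body with document-level fields plus nested entry tables."""
    properties = {
        "judge_id": {"type": "integer"},
        "year": {"type": "integer"},          # filter
        "report_type": {"type": "keyword"},   # filter
        "is_amended": {"type": "boolean"},    # filter
        "addendum_content_raw": {"type": "text"},
    }
    for table, fields in ENTRY_TABLES.items():
        # "nested" keeps each row's fields together so a match stays within one row.
        properties[table] = {
            "type": "nested",
            "properties": {name: {"type": "text"} for name in fields},
        }
    return {"mappings": {"properties": properties}}
```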

And the search results should display rows with entries that match the search. Front end:

Judge name (Court)
   2020 Financial Disclosure, Report type: Annual Report...
      Gifts:
         Source, Value, Description
      Positions:
         Position, Organization

A search should match any indexed field and show results accordingly.
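
One way to get back only the rows that matched is a `nested` query with `inner_hits`. The sketch below builds such a query body as a plain Python dict. It assumes the entry tables are indexed as `nested` fields, and the function and field names are hypothetical.

```python
def build_search_query(text, document_fields, nested_fields):
    """Build a query that matches document-level fields or any nested entry row.

    document_fields: list of top-level field names (hypothetical).
    nested_fields: dict mapping a nested path to its field names (hypothetical).
    """
    should = [{"multi_match": {"query": text, "fields": document_fields}}]
    for path, fields in nested_fields.items():
        should.append(
            {
                "nested": {
                    "path": path,
                    "query": {
                        "multi_match": {
                            "query": text,
                            "fields": [f"{path}.{name}" for name in fields],
                        }
                    },
                    # Ask Elasticsearch to return the matching rows for this path.
                    "inner_hits": {},
                }
            }
        )
    # A disclosure is a hit if any one clause matches.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

Tables with no matching rows return no inner hits, so the front end can skip those sections.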

For example, if someone searches using the judge's name and it only matches with the financial disclosure documents, we'll display something like:

Judge name (Court)
   2020 Financial Disclosure, Report type: Annual Report...
   2019 Financial Disclosure, Report type: Annual Report...
   2018 Financial Disclosure, Report type: Annual Report...

In this case, if there are no matches for Positions, Gifts, or other types of entries, they won't be displayed.

In the API, considering the previous example, we won't be able to display the results if we follow the given structure, since it has no place for disclosure documents that match without any matching rows:

{
  "judge_id": 1,
  "gifts": [{ row with matched terms }],
  "investments": [{row with matched terms }],
  ...
}

As you suggested, it might be beneficial to also use a three-level nested structure for the API, something along these lines:

{
   "judge_id":1,
   "financial_disclosures":[
      {
         "32237":{ # I'm not sure whether the ID or some other field would be best to identify the disclosure here
            "year":"2020",
            "report_type":"Annual Report",
            "gifts":[
               {
                  "row with matched terms"
               }
            ],
            "positions":[
               {
                  "row with matched terms"
               }
            ]
         }
      }
   ]
}

Does it seem right?
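
A minimal Python sketch of how flat, per-disclosure hits could be folded into that three-level shape. The hit format (`judge_id`, `disclosure_id`, `year`, `report_type`, `matched_rows`) is an assumption, and the disclosure ID is emitted as an `id` field rather than as a key, which is just one option for the open question above.

```python
def group_hits_by_judge(hits):
    """Fold per-disclosure hits into judge -> disclosures -> matched rows.

    hits: list of dicts with "judge_id", "disclosure_id", "year", "report_type"
    and an optional "matched_rows" dict of table name -> rows (all hypothetical).
    """
    judges = {}
    for hit in hits:
        judge = judges.setdefault(
            hit["judge_id"],
            {"judge_id": hit["judge_id"], "financial_disclosures": []},
        )
        disclosure = {
            "id": hit["disclosure_id"],
            "year": hit["year"],
            "report_type": hit["report_type"],
        }
        # Only include tables that have matching rows; empty ones are omitted.
        for table, rows in hit.get("matched_rows", {}).items():
            if rows:
                disclosure[table] = rows
        judge["financial_disclosures"].append(disclosure)
    return list(judges.values())
```

A disclosure that matched only on document-level fields still appears, just without any table keys.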

mlissner commented 1 year ago

Yep, that all seems great.

flooie commented 1 year ago

I do have some thoughts on the API: currently it's too big and unusable in many circumstances, and we haven't really implemented a good way to search for lots of things without jumping into the server.

mlissner commented 1 year ago

Well, the idea here is to add the disclosures to the current search API, so that should help with the searching question. When you say it's too big, what do you mean?