CouncilDataProject / cdptools_v2

Tools you can use to interact with and run Council Data Project instances.

Idea: Store positional context of `indexed_{item}_term` in `indexed_{item}_term_context` table #73

Closed evamaxfield closed 3 years ago

evamaxfield commented 5 years ago

To show how a term was used in an event or in bill text when returning plain-text search results, we could store the text surrounding each indexed term in an additional table. A further benefit is that the same table could be used to optimize the transcript search on the event page.

Currently, the `indexed_event_term` table looks as follows:

{
   "685a7cf3-43e4-4386-b109-4df803c5382b": {
      "event_id": "76ae327e-55bd-4a5d-b652-f0db1286291b",
      "term": "hello",
      "value": 0.1234,
      "updated": "..."
   },
   "0ae9b20c-65f8-4163-971b-014f6864959b": {
      "event_id": "00a90b14-0922-44b1-912d-08a40afd2fa4",
      "term": "world",
      "value": 9.8765,
      "updated": "..."
   }
}

The proposed `indexed_event_term_context` table (and, similarly, `indexed_minutes_item_term_context`) would look as follows:

{
   "2ae89d0b-92a4-4168-823e-dfdf5fe1d501": {
      "indexed_event_term_id": "685a7cf3-43e4-4386-b109-4df803c5382b",
      "data_block_index": 1,
      "context": "hello and welcome to the august 23 meeting of the full council...",
      "updated": "..."
   },
   "fec59705-6131-4807-8df4-2cde34115f29": {
      "indexed_event_term_id": "685a7cf3-43e4-4386-b109-4df803c5382b",
      "data_block_index": 42,
      "context": "... we wish to welcome our presenter, so hello to mr. bob boberson...",
      "updated": "..."
   }
}
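
As a rough sketch of how the two tables could be joined at query time: the snippet below assumes both tables are loaded as plain Python dicts shaped like the JSON above, and that `data_block_index` orders contexts by their position in the transcript. The function name `contexts_for_term` is hypothetical and not part of cdptools.

```python
from collections import defaultdict


def contexts_for_term(term, indexed_event_term, indexed_event_term_context):
    """Return {event_id: [context, ...]} for every index row matching `term`.

    Hypothetical helper: both tables are assumed to be dicts keyed by row ID,
    shaped like the JSON samples in this issue.
    """
    # Map each matching index row ID to the event it belongs to.
    row_to_event = {
        row_id: row["event_id"]
        for row_id, row in indexed_event_term.items()
        if row["term"] == term
    }

    grouped = defaultdict(list)
    # Walk context rows in data_block_index order (assumed transcript order).
    ordered = sorted(
        indexed_event_term_context.values(),
        key=lambda ctx: ctx["data_block_index"],
    )
    for ctx in ordered:
        event_id = row_to_event.get(ctx["indexed_event_term_id"])
        if event_id is not None:
            grouped[event_id].append(ctx["context"])
    return dict(grouped)
```

Grouping by `event_id` keeps all snippets for one meeting together, which matches the per-event layout of the search results.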

This would mean that when we return results from a search against an indexed table, we could return results that look like the following:

search terms: green new deal

City Council
08-12-2019 14:08:00

... I'm here in support of the green new deal as expressed by acacio Cortez at Marquee ...
... do more not just in terms of the green new deal, but also for with regard to the conversation what we can do more ...

---

Sustainability and Transportation Committee
08-06-2019 14:08:00

It can be for us all the green New Deal will not be free and we must be willing to in.
... helped bring the question of the green new deal forward ...

evamaxfield commented 5 years ago

The other option, which is much farther out, is to develop a text summarization model that will run across a transcript to give the highlights or summarization of an entire meeting.

evamaxfield commented 5 years ago

On further thought, however, as much as this would optimize the transcript search, it would more than duplicate the cost of storing the transcript in the database. I'm unsure how beneficial that would be. If we go this route, we may need to limit which terms get this contextual information by filtering on indexed value.
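
A minimal sketch of that filtering, assuming a higher `value` means a more relevant term and that rows are dicts shaped like the `indexed_event_term` sample above. The function name and the `min_value` cutoff are hypothetical; the right threshold would need to be tuned against real data.

```python
def rows_worth_context(indexed_event_term, min_value):
    """Return IDs of index rows whose `value` is at least `min_value`.

    Hypothetical helper: only these rows would get context rows stored,
    which bounds how much transcript text is duplicated in the database.
    """
    return [
        row_id
        for row_id, row in indexed_event_term.items()
        if row["value"] >= min_value
    ]
```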

nniiicc commented 5 years ago

Is the general idea to offer more context for an n-gram or keyword ... So instead of sugar tax alone as a search result you see something like 're-appropriating revenue from the sugar tax is not just against the spirit of the tax, but violates the written legislation...' ?

evamaxfield commented 5 years ago

> Is the general idea to offer more context for an n-gram or keyword ... So instead of sugar tax alone as a search result you see something like 're-appropriating revenue from the sugar tax is not just against the spirit of the tax, but violates the written legislation...' ?

Sorta kinda. Until we have some way of generating summarization snippets (potentially a capstone project), we could store the n (likely 10) words surrounding each indexed term, so that when a term is queried for, we can at least provide a bit of context on why that event is being returned.
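
Something like the following could extract those windows. It is only a sketch: it reads "n words surrounding" as n words on each side of the term, splits on whitespace, and ignores case and trailing punctuation when matching. The name `context_windows` is hypothetical.

```python
def context_windows(text, term, n=10):
    """Return one snippet per occurrence of `term`, with up to `n` words
    of context on each side (an assumed reading of "n words surrounding").
    """
    words = text.split()
    term_words = term.lower().split()
    k = len(term_words)
    if k == 0:
        return []

    snippets = []
    for i in range(len(words) - k + 1):
        # Compare case-insensitively, ignoring punctuation around each word.
        candidate = [w.lower().strip(".,;:!?\"'") for w in words[i:i + k]]
        if candidate != term_words:
            continue
        start = max(0, i - n)
        end = min(len(words), i + k + n)
        snippet = " ".join(words[start:end])
        # Mark truncation the same way the example results do.
        if start > 0:
            snippet = "... " + snippet
        if end < len(words):
            snippet += " ..."
        snippets.append(snippet)
    return snippets
```

Multi-word terms such as "green new deal" are matched as a consecutive run of words, so each occurrence yields one snippet.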

nniiicc commented 5 years ago

Right, that's what I was thinking.