646e62 / case-brief

Generates a FIRAC-style case brief from a reported decision
GNU General Public License v3.0

Pronoun confusion #25

Closed by 646e62 1 year ago

646e62 commented 1 year ago

When dealing with small text blocks, the local summarizer, and subsequently GPT-3, can lose track of what pronouns such as "they" and "it" refer to. GPT-3 tends to fill in the blanks that the local summarizer creates. Specific prompts could address this, but it may also be possible to handle it locally through a rules-based approach.

646e62 commented 1 year ago

One approach to this problem may be to train a small NLP model to detect procedural history. This would ideally be able to classify a decision as one of the following types:

  1. Supreme Court
  2. Jurisdictional appeal court
  3. Jurisdictional superior court
  4. Tribunal superior court
  5. First instance

If the function can detect procedural history elements within an appellate decision (e.g., a paragraph that says "On appeal to x Court of Appeal, ..."), it should be able to determine the procedural history using a rules-based approach. Having this history may help contextualize who "they" and "it" are in certain instances.
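A minimal sketch of the rules-based detection described above, assuming procedural history is signalled by phrases like "On appeal to/from the X Court, ...". The regex, the function names (`find_procedural_history`, `is_appellate`) and the sample paragraphs are hypothetical and not taken from the project code:

```python
import re

# Hypothetical cue: "On appeal to/from (the) <court name>" up to the next
# comma, period or semicolon. Real decisions will need more patterns.
APPEAL_CUE = re.compile(
    r"\bOn appeal (?P<direction>to|from) (?:the )?(?P<court>[^,.;]+)",
    re.IGNORECASE,
)


def find_procedural_history(paragraphs):
    """Return (paragraph_index, direction, court) for each appeal cue found."""
    hits = []
    for index, paragraph in enumerate(paragraphs):
        for match in APPEAL_CUE.finditer(paragraph):
            court = match.group("court").strip()
            hits.append((index, match.group("direction").lower(), court))
    return hits


def is_appellate(paragraphs):
    """Treat a decision as appellate if any appeal cue is present."""
    return bool(find_procedural_history(paragraphs))


paragraphs = [
    "The accused was convicted at trial.",
    "On appeal to the Alberta Court of Appeal, the conviction was upheld.",
]
history = find_procedural_history(paragraphs)
```

The extracted court names could then be attached to the relevant text blocks so that a later "it" or "they" can be read against a known list of courts and parties. Mapping each court name onto the five categories listed above is left out here, since that mapping depends on the jurisdiction.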

646e62 commented 1 year ago

The problem seems largely attributable to small key values. For now, this is resolved through a dynamic approach to summarization: where a key's value is too small, the summarizer removes little or nothing from it before sending it to GPT.
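The dynamic approach could look roughly like the sketch below. It is an illustration only: `dynamic_summarize`, `truncate_to_words`, and the `min_words` and `target_ratio` defaults are assumptions, and the truncating function merely stands in for whatever the local summarizer actually does.

```python
def truncate_to_words(text, max_words):
    """Stand-in summarizer (hypothetical): keep only the first max_words words."""
    return " ".join(text.split()[:max_words])


def dynamic_summarize(text, summarize=truncate_to_words,
                      min_words=60, target_ratio=0.5):
    """Shrink long values, but leave short ones untouched.

    min_words and target_ratio are illustrative defaults, not project settings.
    """
    n_words = len(text.split())
    if n_words <= min_words:
        # Too small to cut safely: pass the value through unchanged.
        return text
    # Never shrink a value below the minimum size.
    budget = max(min_words, int(n_words * target_ratio))
    return summarize(text, budget)


short_value = "They appealed the decision."
long_value = " ".join(["word"] * 200)

short_result = dynamic_summarize(short_value)
long_result = dynamic_summarize(long_value)
```

Keeping a floor on the word budget means a value just over the threshold loses only a little text, which matches the goal of preserving enough context for pronouns to stay resolvable.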

For automatic analysis through the local categorizer, the functions will need further fine-tuning to err on the side of adding more text to keys rather than less. Because GPT-3.5 is still very inexpensive to run through the API, keeping key values small is less of a priority than it was when the functions had to use davinci.