freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Harvard import error (division by zero) #3182

Open sentry-io[bot] opened 1 year ago

sentry-io[bot] commented 1 year ago

ZeroDivisionError: division by zero

Sentry Issue: COURTLISTENER-4YF

ZeroDivisionError: division by zero
(6 additional frame(s) were not displayed)
...
  File "cl/corpus_importer/management/commands/harvard_merge.py", line 1004, in handle
    merge_opinion_clusters(
  File "cl/corpus_importer/management/commands/harvard_merge.py", line 661, in merge_opinion_clusters
    map_and_merge_opinions(opinion_cluster, harvard_data)
  File "cl/corpus_importer/management/commands/harvard_merge.py", line 882, in map_and_merge_opinions
    matches = match_lists(
  File "cl/corpus_importer/utils.py", line 105, in match_lists
    percent_match = compare_documents(
  File "cl/corpus_importer/management/commands/harvard_opinions.py", line 1112, in compare_documents
    100 * (count / min([len(harvard_characters), len(cl_characters)]))

flooie commented 1 year ago

Just to add some context here:

This error is not an import error; it's a data error on the Harvard data side.

Take this opinion that failed: law.free.cap.se2d.645/295.12638526.json. This is a Fastcase opinion that has a number of incorrect tags. The full opinion looks like this: <opinion type=\"majority\"/>. It has no text at all, and it has no corresponding fix in opinionated. I suspect it is just part of the group that didn't get retagged and added, for whatever reason?
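
For illustration, here is a minimal sketch of how an empty opinion leads to the error in the traceback, and one possible guard. The function name `percent_match` is hypothetical and is not the project's `compare_documents`; only the ratio expression is taken from the traceback. Returning 0.0 for an empty side is an assumption about the desired behavior, not a confirmed fix.

```python
from typing import Sized


def percent_match(count: int, harvard_characters: Sized, cl_characters: Sized) -> float:
    """Hypothetical guarded version of the ratio shown in the traceback."""
    shortest = min(len(harvard_characters), len(cl_characters))
    if shortest == 0:
        # An empty tag such as <opinion type="majority"/> yields no characters,
        # so the unguarded expression would divide by zero here.
        # Assumption: treat an empty document as "no match".
        return 0.0
    return 100 * (count / shortest)
```

With this guard, an empty Harvard opinion would simply fail to match any CourtListener opinion instead of aborting the merge, though the underlying data would still need fixing.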

flooie commented 1 year ago

I would think we just need to track them and fix the tags by hand if it is just a few ... so far I think it's fewer than 5.
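
One way to track these could be to flag opinions whose markup contains no text before attempting the merge. The sketch below uses only the standard library; the helper name `is_empty_opinion` is hypothetical, and it assumes each opinion is available as a well-formed XML fragment like the one quoted above.

```python
import xml.etree.ElementTree as ET


def is_empty_opinion(opinion_xml: str) -> bool:
    """Return True if the opinion element contains no text at all."""
    element = ET.fromstring(opinion_xml)
    # itertext() walks the element and all of its descendants,
    # so nested tags holding only whitespace still count as empty.
    return "".join(element.itertext()).strip() == ""


# The empty tag from the failing file is flagged; a tagged opinion with text is not.
print(is_empty_opinion('<opinion type="majority"/>'))
print(is_empty_opinion('<opinion type="majority"><p>Affirmed.</p></opinion>'))
```

Files flagged this way could be written to a list for manual retagging.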

mlissner commented 5 days ago

@flooie is there more to do here?