freelawproject / courts-db

A database of courts, tests and other experiments
BSD 2-Clause "Simplified" License

Handle citation strings #62

Open anseljh opened 1 year ago

anseljh commented 1 year ago

Closes #61

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

mlissner commented 1 year ago

Bill, I think it's our fault that there are conflicts here, since we didn't review it when it was clean. When you get this prioritized, can you plan to fix those for poor @anseljh, who's just trying to be a good contributor and doesn't deserve merge conflicts (not that any of us do, but him less than the rest of us)?

anseljh commented 1 year ago

I think there may be a little more to it than that with the failing tests, unfortunately. @flooie, might be good to sync up some time this week.

flooie commented 1 year ago

I’m happy to fix it. I wasn’t sure from the last comment whether to hold off or not.

anseljh commented 1 year ago

Thanks, Bill. Here's what I have failing locally now. Ping me if you'd like to talk after poking around a bit.

% python setup.py test
======================================================================
FAIL: test_all_example (tests.DataTest)
Can we extract the correct court id from string and date?
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/anseljh/Code/courts-db/tests.py", line 57, in test_all_example
    self.assertIn(
AssertionError: 'akd' not found in ['akb'] : Failure to find akd in D. Alaska

======================================================================
FAIL: test_all_non_bankruptcy_examples (tests.ExamplesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/anseljh/Code/courts-db/tests.py", line 137, in test_all_non_bankruptcy_examples
    self.assertIn(court["id"], results, msg=f"Failed {example}")
AssertionError: 'akd' not found in [] : Failed D. Alaska

======================================================================
FAIL: test_bankruptcy_examples (tests.ExamplesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/anseljh/Code/courts-db/tests.py", line 148, in test_bankruptcy_examples
    self.assertIn(court["id"], results, msg=f"Failed {example}")
AssertionError: 'akb' not found in [] : Failed Bankr. D. Alaska

----------------------------------------------------------------------
Ran 12 tests in 12.674s

FAILED (failures=3)
Test failed: <unittest.runner.TextTestResult run=12 errors=0 failures=3>
error: Test failed: <unittest.runner.TextTestResult run=12 errors=0 failures=3>

flooie commented 1 year ago

This looks like it's caused by overlapping bankruptcy issues. I bet it’ll be easy to fix.

flooie commented 1 year ago

@anseljh - I'm reviewing this PR at the moment.

I've noticed what appear to be extra \.'s at the end of many regexes. For example:

Del\.\s?Super\.\s?Ct\.\s?\. -> for Del. Super. Ct. (with an extra whitespace match as well)

Or, more commonly, something like this for Alaska, Iowa, or courts ending in numbers:

"S\\.\\s?D\\.\\s?Iowa\\." -> "S.D. Iowa"
"Ky\\.\\s?Circ\\.\\s?Ct\\.\\s?, 2\\."

These are at least contributing to the failing tests, and I'm not sure whether they were intentional. I've found about 20 or so. I assume they aren't, but I wanted to double check.
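To make the effect of a trailing escaped period concrete, here is a minimal sketch using only Python's standard `re` module. The two "with trailing dot" patterns are the ones quoted in this comment (with the JSON double-backslashes decoded once). The "trimmed" variants are a hypothetical fix, and `re.fullmatch` is an assumption made for illustration; how courts-db actually applies its regexes is not shown in this thread.

```python
import re

# Patterns as quoted above, each ending in an extra escaped period.
iowa_with_dot = r"S\.\s?D\.\s?Iowa\."
del_with_dot = r"Del\.\s?Super\.\s?Ct\.\s?\."

# Hypothetical trimmed versions (an assumption, not the PR's actual fix).
iowa_trimmed = r"S\.\s?D\.\s?Iowa"
del_trimmed = r"Del\.\s?Super\.\s?Ct\."


def matches(pattern: str, citation: str) -> bool:
    """Return True if the whole citation string matches the pattern."""
    return re.fullmatch(pattern, citation) is not None


# The trailing \. demands a literal period that the citation string lacks.
iowa_before = matches(iowa_with_dot, "S.D. Iowa")
iowa_after = matches(iowa_trimmed, "S.D. Iowa")
del_before = matches(del_with_dot, "Del. Super. Ct.")
del_after = matches(del_trimmed, "Del. Super. Ct.")

print(iowa_before, iowa_after)
print(del_before, del_after)
```

In both cases the original pattern fails on the plain citation string because the final `\.` requires one more period than the string contains, while the trimmed pattern matches.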

anseljh commented 1 year ago

Yeah, I agree, those don't look right, @flooie.
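One way to hunt for the remaining cases might look like the sketch below: given a regex and an example string it is supposed to match, strip trailing `\.` or `\s?` tokens until the pattern matches. The function name `suggest_trimmed_pattern` and the use of `re.fullmatch` are hypothetical choices for illustration, not part of courts-db, and the result would still need a human review.

```python
import re
from typing import Optional


def suggest_trimmed_pattern(pattern: str, example: str) -> Optional[str]:
    """Suggest a shorter pattern that matches `example`.

    Returns None if the pattern already matches, or if stripping trailing
    escaped periods / optional-whitespace tokens does not help.
    """
    if re.fullmatch(pattern, example):
        return None  # nothing to fix

    trimmed = pattern
    while True:
        if trimmed.endswith(r"\."):
            trimmed = trimmed[:-2]  # drop a trailing escaped period
        elif trimmed.endswith(r"\s?"):
            trimmed = trimmed[:-3]  # drop a trailing optional whitespace
        else:
            return None  # no more trailing tokens to strip
        try:
            if re.fullmatch(trimmed, example):
                return trimmed
        except re.error:
            return None  # stripping produced an invalid regex; give up


iowa_suggestion = suggest_trimmed_pattern(r"S\.\s?D\.\s?Iowa\.", "S.D. Iowa")
print(iowa_suggestion)
```

The sketch only removes tokens from the end of a pattern, so it would not catch a regex that is wrong in the middle; it is meant as a quick filter for the trailing-period cases discussed here.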

flooie commented 1 year ago

So, there are a couple of issues, but they all seem fairly minor. Thanks for getting this rolling, @anseljh.