JustUtahCoders / utahexpungements.org

The frontend code for utahexpungements.org
MIT License

(Explore): Can we parse out whether a case has a fine/fee, and whether it was paid? #98

Closed joeldenning closed 4 years ago

joeldenning commented 4 years ago

Some docket PDFs list the fines and fees that were required to be paid, along with the payment dates.

This GitHub issue is to explore whether we can reliably parse that information out of the PDFs.

jamesschlader commented 4 years ago

Based on the contents of one PDF that is thick with accounts, it seems that we could parse out each account to be paid, with special attention to ones that look like this: "REVENUE DETAIL - TYPE: FINE". We could search for "REVENUE DETAIL - TYPE:" to get all of the relevant accounts. This would, hopefully, distinguish fee/fine/sentence accounts from bail or refund accounts. At any rate, the plan would be to section out those parts, similarly to how the entire docket is sectioned out. Then, those sections could be walked looking for the line with "Balance:". The amount on this line would tell us what the payment status is.
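A minimal sketch of that sectioning idea, in TypeScript. Only the strings "REVENUE DETAIL - TYPE:" and "Balance:" come from the dockets described above; the `RevenueAccount` shape, the function name, and the assumption that each balance sits on its own line as plain digits are hypothetical.

```typescript
// Hypothetical shape; the field names are assumptions for illustration.
interface RevenueAccount {
  type: string;           // text after "REVENUE DETAIL - TYPE:", e.g. "FINE"
  balance: number | null; // null when the section has no "Balance:" line
}

const SECTION_MARKER = "REVENUE DETAIL - TYPE:";

function parseRevenueAccounts(docketText: string): RevenueAccount[] {
  const accounts: RevenueAccount[] = [];
  let current: RevenueAccount | null = null;

  for (const rawLine of docketText.split(/\r?\n/)) {
    const line = rawLine.trim();

    // A marker line starts a new account section.
    const markerIndex = line.indexOf(SECTION_MARKER);
    if (markerIndex !== -1) {
      current = {
        type: line.slice(markerIndex + SECTION_MARKER.length).trim(),
        balance: null,
      };
      accounts.push(current);
      continue;
    }

    // Record only the first "Balance:" line found inside the current section.
    const match = /Balance:\s*\$?([\d,]+(?:\.\d+)?)/.exec(line);
    if (current !== null && current.balance === null && match) {
      current.balance = Number(match[1].replace(/,/g, ""));
    }
  }
  return accounts;
}
```

A section here runs until the next marker line, so a real parser would also need to know where an account section ends; otherwise a later "Balance:" line from an unrelated account could be picked up.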

I wonder what this parsing function ought to return. I see three ways to go:

  (1) Parse every account section and return an object with all the relevant detail: name of account and balance seem like obvious field candidates.
  (2) Like (1), except only return objects for "TYPE:" accounts.
  (3) Create a field called "accountsPaid", or something like that, which is set to true. Whilst parsing, look for "TYPE:" sections and return false if the balance is > 0.00.
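To compare the three return shapes, here is a rough TypeScript sketch. The `Account` interface and the `revenueType` field are hypothetical; only the name `accountsPaid` and the "balance > 0.00" rule come from the options above. Option (1) is simply the full `Account[]` array, so only (2) and (3) need a function.

```typescript
// Hypothetical record for one parsed account section.
interface Account {
  name: string;
  revenueType: string | null; // e.g. "FINE"; null for accounts with no "TYPE:" header
  balance: number;
}

// Option (2): keep only the accounts that had a "TYPE:" header.
function typedAccounts(accounts: Account[]): Account[] {
  return accounts.filter((account) => account.revenueType !== null);
}

// Option (3): a single flag that stays true unless some "TYPE:" account
// still has a balance greater than 0.00.
function accountsPaid(accounts: Account[]): boolean {
  return typedAccounts(accounts).every((account) => account.balance <= 0);
}
```

Because (3) can be derived from (1) or (2), returning the richer data first keeps the boolean available later without committing to it during parsing.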

Also, there is a TOTAL REVENUE section. If the Balance field here is the one that the BCI uses to make their determination, then we would just need to target the first Balance line after that and we'd be done. That section always appears, based on the cases we have available, so that would be the quickest path to victory. My suspicion is that this section is not the salient one, so I don't think this is a viable option. I mention it here in case I'm wrong about that.
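For comparison, the TOTAL REVENUE shortcut could be sketched as below, in TypeScript. The function name and the assumed line layout are hypothetical, and the comment above already doubts that this balance is the salient one, so this only illustrates how little code the shortcut would need.

```typescript
// Returns the first balance found at or after the "TOTAL REVENUE" heading,
// or null when the heading or a "Balance:" line is missing.
function totalRevenueBalance(docketText: string): number | null {
  const lines = docketText.split(/\r?\n/);
  const start = lines.findIndex((line) => line.includes("TOTAL REVENUE"));
  if (start === -1) {
    return null;
  }
  // The heading line itself is included in case the balance shares that line.
  for (const line of lines.slice(start)) {
    const match = /Balance:\s*\$?([\d,]+(?:\.\d+)?)/.exec(line);
    if (match) {
      return Number(match[1].replace(/,/g, ""));
    }
  }
  return null;
}
```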

Considerations:

  1. Maybe we want to parse first, test for qualification later. In that case, (1) or (2) but not (3) would be viable options. If we don't mind mixing those tasks, then (3) might be a good option.
  2. If the listing of relevant fines and fees does not always appear in the "REVENUE DETAIL - TYPE:" pattern, then we'll need to use method (1).
  3. Nothing appears in the ACCOUNT SUMMARY section unless it has been mentioned in the PROCEEDINGS section. It seems highly likely that the PROCEEDINGS section will need to be fully parsed anyway, so perhaps we can find a way to parse out fee/fine/sentence detail, including final account balance info, whilst parsing the PROCEEDINGS section.

tuckersamuelsen commented 4 years ago

My advice:

1) The fine/fee section at the top, believe it or not, is NOT helpful. Many accounts with unpaid fines are simply sent to the state debt collection agency, and will show a 0 balance at the top.

2) If there is a balance near the top, then there is an unpaid fine. If the balance is 0, there still might be an unpaid fine, so a zero balance isn't dispositive.

3) There may be a minute entry somewhere in the docket with the words "Office of State Debt Collection". That means the defendant has/had an open fine amount there, and may still need to satisfy it. Those exact words will be very helpful for determining eligibility.
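A small TypeScript sketch of the phrase search from point 3. The function name is hypothetical. Matching case-insensitively and collapsing whitespace (so the phrase still matches if PDF text extraction wraps it across lines) are assumptions, not something confirmed against real dockets.

```typescript
const DEBT_COLLECTION_PHRASE = "Office of State Debt Collection";

// True when the docket text mentions the Office of State Debt Collection.
function mentionsDebtCollection(docketText: string): boolean {
  // Collapse runs of whitespace, including line breaks, into single spaces.
  const normalized = docketText.replace(/\s+/g, " ").toLowerCase();
  return normalized.includes(DEBT_COLLECTION_PHRASE.toLowerCase());
}
```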

joeldenning commented 4 years ago

Thanks for the info, Tucker. It sounds like the ACCOUNT SUMMARY section doesn't always have everything we need.

> Maybe we want to parse first, test for qualification later

@jamesschlader yeah, I think this is a good approach. Let's get as much as we can parsed, and then apply rules for eligibility qualification in future phases.
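One way to picture the split between parsing and qualification is the TypeScript sketch below. The `ParsedDocket` fields, the `FineStatus` labels, and the function name are all hypothetical; the rule itself only restates the advice earlier in this thread (a nonzero balance means an unpaid fine, a zero balance is not dispositive, and a debt collection mention means a fine may still be open). It is not a real eligibility rule.

```typescript
// Hypothetical output of the parsing phase: plain facts, no judgments.
interface ParsedDocket {
  summaryBalance: number;          // balance shown near the top of the docket
  mentionsDebtCollection: boolean; // "Office of State Debt Collection" appears
}

type FineStatus = "unpaid" | "possibly-unpaid" | "undetermined";

// A later phase can apply rules to the parsed facts without re-reading the PDF.
function fineStatus(docket: ParsedDocket): FineStatus {
  if (docket.summaryBalance > 0) {
    return "unpaid";
  }
  if (docket.mentionsDebtCollection) {
    return "possibly-unpaid";
  }
  // A zero balance alone does not show that every fine was paid.
  return "undetermined";
}
```

Keeping the rule in its own function means it can change as the eligibility requirements become clearer, while the parsed data stays the same.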