Closed by irishryoon 3 years ago
I can take a stab at fixing this! I've been reading through/refactoring funcs_parse.py anyway.
I'm not able to reproduce this error on your example pdf. Could you try fetching the newest version of the master branch and seeing if this error is still occurring?
FYI I did find another error though, which is that if charges appeared on multiple pages, only the last page of charges would be returned. I will fix this also.
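To illustrate the multi-page problem, here is a minimal sketch of the kind of fix I have in mind. It is not the actual code in funcs_parse.py: the function name `collect_charges` and the "list of per-page charge lists" input are assumptions made up for this example. The idea is simply to accumulate charges across pages instead of overwriting the result on each page.

```python
from typing import Iterable, List


def collect_charges(pages: Iterable[List[str]]) -> List[str]:
    """Combine the charges found on every page into one list.

    `pages` is a hypothetical structure: one list of charge strings per
    page of the docket. The real parser may represent pages differently.
    """
    charges: List[str] = []
    for page_charges in pages:
        # Extend the running list. Assigning `charges = page_charges`
        # here would keep only the last page, which is the bug described.
        charges.extend(page_charges)
    return charges


if __name__ == "__main__":
    example_pages = [["Murder"], ["Criminal Attempt - Murder", "Conspiracy"]]
    print(collect_charges(example_pages))
```

With this approach the order of charges follows the page order, and a docket with a single page of charges behaves the same as before.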
I just tried again with the latest master branch, and I still get the same output. I'm attaching the output CSV file here. You can see that under the 'offense' and 'statute' columns, some (but not all) items are concatenated.
For example, the third item under 'offense' currently appears as 'Criminal Attempt - Murder Conspiracy'. This is supposed to be two separate items: 'Criminal Attempt - Murder' and 'Conspiracy'.
Similarly, the second item under 'statute' currently appears as '18 § 901 §§ A 18 § 901 §§ A'. This is supposed to be two separate items: '18 § 901 §§ A' and '18 § 901 §§ A'.
Let me know if you're able to reproduce the result.
Huh, you're right: it occurs on the master branch, but not in the version I'm working with, which is a few commits ahead. So I apparently fixed the issue, even though I didn't think I'd worked on anything that would address it (and I'm still not quite sure where the issue comes from). I'll open a PR and merge the fix today.
In some dockets, distinct offenses and statutes are concatenated into one string.
For example, when I run 'parse_docket.py' on the attached docket file, it returns the following offenses: ['Murder', 'Criminal Attempt - Murder Criminal Attempt - Murder', 'Criminal Attempt - Murder Conspiracy', 'Conspiracy Conspiracy', 'Conspiracy Conspiracy', ... ] Note that some (but not all) distinct offenses appear in the same string, such as 'Criminal Attempt - Murder Conspiracy'.
Similarly, when I run 'parse_docket.py' on the same docket file, it returns the following list of statutes: ['18 § 2502', '18 § 901 §§ A 18 § 901 §§ A', ... ] Again, some statutes have been concatenated into one string, for example '18 § 901 §§ A 18 § 901 §§ A'.
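In case it helps with debugging, here is a rough sketch of a check that flags (and splits) concatenated statute strings. It is not part of parse_docket.py, and the helper name `split_statutes` is made up for this example. It assumes every statute starts with a title number followed by a single '§', as in the examples above. It does not help with the 'offense' column, since offense names have no comparable marker to split on.

```python
import re
from typing import List

# Whitespace that is immediately followed by the start of a new statute,
# i.e. digits, then a single '§', then a space (e.g. "18 § 901").
# The trailing \s keeps the subsection marker '§§' from matching.
STATUTE_BOUNDARY = re.compile(r"(?<=\S)\s+(?=\d+\s*§\s)")


def split_statutes(value: str) -> List[str]:
    """Split a string that may hold several concatenated statutes."""
    return [part.strip() for part in STATUTE_BOUNDARY.split(value) if part.strip()]


if __name__ == "__main__":
    statutes = ["18 § 2502", "18 § 901 §§ A 18 § 901 §§ A"]
    for entry in statutes:
        parts = split_statutes(entry)
        if len(parts) > 1:
            print(f"concatenated entry: {entry!r} -> {parts}")
```

An entry that splits into more than one part is a sign that two rows were merged during parsing, so this works as a quick sanity check on the output even if the real fix belongs earlier in the parser.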
13270.pdf