Open jmesserschmidt1 opened 1 year ago
Thanks for sending this along. I think this would be pretty easy to fix, but our code parsers aren't particularly advanced compared to our opinions parsers.
Do you want to take a stab at it?
Sure. I'm not very familiar with the code, but I suspect it might need a variation on the law_section regex, similar to the one that exists for page or volume, like here. This comes up with CFR cites as well (e.g., 17 CFR § 240.10b-5 is currently parsed as 17 CFR 240). So something like (?P<section>\\d+(?:[\\-.:]\\d+){,3}[a-zA-Z]{0,4})
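To make the idea concrete, here is a minimal standalone sketch using only Python's `re` module. It is not the library's actual law_section regex: the `SECTION` pattern and the `extract_section` helper are hypothetical names made up for illustration. One assumption beyond the suggestion above is that letters are allowed after every dot/hyphen/colon-separated segment, not only at the very end, so that a cite like 240.10b-5 is captured whole.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration only (not the library's real regex).
# Each segment is digits followed by up to four letters; up to three further
# segments may follow, each introduced by a hyphen, dot, or colon.
SECTION = r"(?P<section>\d+[a-zA-Z]{0,4}(?:[\-.:]\d+[a-zA-Z]{0,4}){0,3})"

# Anchor on the section sign so the volume number is not picked up instead.
SECTION_AFTER_SIGN = re.compile(r"§\s*" + SECTION)


def extract_section(cite: str) -> Optional[str]:
    """Return the section number following the § sign, or None if none is found."""
    match = SECTION_AFTER_SIGN.search(cite)
    return match.group("section") if match else None


if __name__ == "__main__":
    for cite in (
        "18 U.S.C. § 1028",
        "18 U.S.C. § 1028(a)",
        "18 U.S.C. § 1028A",
        "17 CFR § 240.10b-5",
    ):
        print(cite, "->", extract_section(cite))
```

A subsection such as "(a)" is deliberately left outside the `section` group here; how the real parser treats it is not something this sketch tries to reproduce. Note also that a trailing-letters group can swallow the start of a following word when there is no space after the section number, so any real PR would want tests for that case.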
I don't know that part of the code very well either, but if you want to do a PR with tests that fixes this, I think we'd probably merge it (and release a new version, if desired).
U.S. Code statutes with letters in the section number appear to be unrecognized. So "18 U.S.C. § 1028" and "18 U.S.C. § 1028(a)" are parsed, but "18 U.S.C. § 1028A" is not. I've tried some variations, but the behavior seems to be consistent.