bbernicker closed this 2 years ago.
This looks basically fine to me. One thing I wonder is whether we should pull out the regex for sections and apply it elsewhere in the file, like we do for some parts of reporter citations.
I also wonder if this should wait on the work @flooie is doing on https://github.com/freelawproject/eyecite/issues/117, so we can see the difference. (OTOH, @flooie, I don't suppose your work is going to be run automatically against this repo, is it? Maybe it should be? Maybe that's easy?)
Waiting on @flooie's work makes sense to me. I would also be happy to pull out that regex and apply it elsewhere in the file if you think that makes sense. I am sure that courts use the symbol, the word, and the abbreviation somewhat interchangeably with other sources, but I am not sure whether applying the pattern to every regex that searches for the section symbol would create too many false positives.
Just to be clear, would the plan be to imitate our treatment of the paragraph symbol?
In regexes.json we have "paragraph_marker": "(?:P|¶|para?\\.)",
We could do something similar, like "section_marker": "(?:§|Sec|sec|Section|section)[§s]?\\.?", which would match both singular and plural forms (e.g., secs., §§, Sections).
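One way to sanity-check a marker like this is to run it against the forms listed above. The following is a minimal Python sketch using only the standard `re` module; `is_section_marker` is a hypothetical helper, not part of reporters-db. It writes the optional suffix as `[§s]`: inside a character class `|` is a literal pipe, so `[§|s]` would also accept "§|". In regexes.json the backslash would need JSON escaping (`\\.`), as in the existing `paragraph_marker`; the raw Python string below uses a single `\.`.

```python
import re

# Proposed section marker (Python raw string; JSON would write \\. instead of \.).
# [§s]? allows plural forms such as "§§", "secs.", and "Sections".
SECTION_MARKER = r"(?:§|Sec|sec|Section|section)[§s]?\.?"

section_marker_re = re.compile(SECTION_MARKER)


def is_section_marker(token: str) -> bool:
    """Return True if the entire token is a singular or plural section marker.

    fullmatch() forces the regex to consume the whole token, so it backtracks
    from the short "Sec"/"sec" alternatives to "Section"/"section" when needed.
    """
    return section_marker_re.fullmatch(token) is not None


for token in ["§", "§§", "sec.", "Sec.", "secs.", "section", "Sections", "para."]:
    print(f"{token!r}: {is_section_marker(token)}")
```

Every token in the list except "para." should be reported as a marker. A standalone `re.search` on "Section" would stop at "Sec", which is harmless once the marker is embedded in a longer pattern that must also match what follows it.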
@mlissner No, not automatically, but maybe... we can actually trigger the same report here. I have it set up to run locally and in a different private repo (for testing purposes), and we could show its effects on eyecite.
Seems like an important idea, right? Like, before you release an update to reporters-db that might have a buggy regex, run the benchmark suite.
I'm going to run this branch against our new tests.
I think we want to remove the hidden macOS files (.DS_Store, etc.) from the PR, @bbernicker.
@flooie Before I make these edits, how do you feel about @mlissner's suggestion that we pull out the regex for sections and apply it elsewhere in the file?
Instead of using the new regex we wrote in the U.S. Code section, I would imitate our treatment of the paragraph marker and put "section_marker": "(?P
\d+) $reporter (?:§|Ss?(tion)?)[§s]?\\.? $law_section" in regexes.json.
We should certainly do that.
Great. I will make that change and resubmit this afternoon. Thanks for the thorough review, and sorry for the messy code.
@bbernicker please, no apologies. This is great.
Loving the rapid progress here! Thanks all!
OK, this should be ready to go. The only issue is that the first test won't run; the error says "Error: Parameter token or opts.auth is required."
Yes, clearly we need to do something to enable more people to use the Free Law bot token.
This is a security thing. It used to be better, but GitHub tightened this maybe a year ago. The worry was that somebody could open a PR that changes the code to sniff a secret. They'd open that PR, the secret would get sniffed, and nobody would be happy. Soooo... unless somebody is part of our org, there's no way for their PRs to access our secrets. It's SUPER annoying.
OK, I'm going to create a copy of this PR simply to run the code, and then merge this PR (not the duplicate, which I will delete after testing the action).
There were 0 gains and 0 losses.
For posterity
Some courts and statutes use the word "section" or the abbreviation "sec." (and the capitalized forms "Section" and "Sec.") when citing the U.S. Code. I updated the U.S. Code regex to recognize these alternate forms and added an example to the list.
See, e.g., Whistleblower 21276-13W v. Commissioner, 147 T.C. 121, 147 T.C. No. 4 (2016) ("The targeted taxpayer pleaded guilty to a violation of 18 U.S.C. sec. 371."); McCorkle v. Commissioner, 124 T.C. 56, 57 (2005) ("The order specified that the $2 million was subject to criminal forfeiture pursuant to 18 U.S.C. sec. 982 (2000)."); Kentucky Revised Statutes 403.720 ("'Foreign protective order' means any judgment, decree, or order of protection which is entitled to full faith and credit pursuant to 18 U.S.C. sec. 2265 that was issued on the basis of domestic violence and abuse."); Brown v. Johnson, 387 F.3d 1344, 1347 (11th Cir. 2004) ("On July 15, 2003, the district court denied Brown's motion to amend his complaint because Brown's complaint was subject to dismissal under the PLRA, 28 U.S.C. section 1915."); Wood v. United States, 991 F.2d 915 (1st Cir. 1993) ("Upon filing this certificate, the Attorney General can remove the case to federal court (if it started in state court), substitute the United States as defendant, and, effectively, immunize the employee from any personal liability. 28 U.S.C. Section 2679(d)."); Morana v. Hernando County, CASE NO. 8:09-CV-347-T-17EAJ., 2009 BL 216888 (M.D. Fla. Oct. 7, 2009) ("This case was removed from the Hernando County Circuit Court under 28 U.S.C. Sec. 1332, 28 U.S.C. Sec. 1391, 28 U.S.C. Sec. 1446 and 28 U.S.C. Sec. 1453.").
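The quoted examples work well as a quick regression check for the alternate forms. The following is a minimal Python sketch that assumes the U.S. Code pattern is assembled from shared sub-patterns using `$name` placeholders (like the `$reporter` and `$law_section` placeholders in the discussion) and that those placeholders can be expanded with Python's `string.Template`. The `VARIABLES` definitions, `USC_TEMPLATE`, and `find_usc_cites` are hypothetical illustrations. They are not the regex actually merged in this PR, and reporters-db's own expansion code may differ.

```python
import re
from string import Template

# Hypothetical shared sub-patterns, defined once and reused across patterns.
VARIABLES = {
    # Section marker: symbol, word, or abbreviation, singular or plural.
    "section_marker": r"(?:§|Sec|sec|Section|section)[§s]?\.?",
    # Section number with optional parenthetical subsections, e.g. 2679(d).
    "law_section": r"(?P<section>\d+(?:\([a-z0-9]+\))*)",
}

# Hypothetical U.S. Code pattern: "<title> U.S.C. <marker> <section>".
USC_TEMPLATE = Template(r"(?P<title>\d+) U\.S\.C\. $section_marker\s*$law_section")
usc_re = re.compile(USC_TEMPLATE.substitute(VARIABLES))


def find_usc_cites(text: str) -> list[tuple[str, str]]:
    """Return (title, section) pairs for every U.S. Code cite found in text."""
    return [(m.group("title"), m.group("section")) for m in usc_re.finditer(text)]


examples = [
    "a violation of 18 U.S.C. sec. 371.",
    "under the PLRA, 28 U.S.C. section 1915.",
    "personal liability. 28 U.S.C. Section 2679(d).",
    "under 28 U.S.C. Sec. 1332, 28 U.S.C. Sec. 1391",
    "pursuant to 18 U.S.C. § 2265",
]
for text in examples:
    print(find_usc_cites(text))
```

Keeping `section_marker` in one place means a later fix to the marker (for example, adding "§§" handling) reaches every pattern that uses it. The trade-off, as noted in the thread, is a possible increase in false positives wherever the broader marker is substituted.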