freelawproject / reporters-db

A database of court reporters, tests and other experiments
BSD 2-Clause "Simplified" License
93 stars, 34 forks

Recognize U.S. Code citations which use sec. or section instead of § #112

Closed by bbernicker 2 years ago

bbernicker commented 2 years ago

Some courts and statutes use the word "section" or the abbreviation "sec." (and the upper-case Section and Sec.) when citing to the U.S. Code. I updated the regex for the U.S. Code to recognize these alternate forms and also added an example to the list.

See e.g. Whistleblower 21276-13W v. Commissioner, 147 T.C. 121, 147 T.C. No. 4 (2016) ("The targeted taxpayer pleaded guilty to a violation of 18 U.S.C. sec. 371."); McCorkle v. Commissioner, 124 T.C. 56, 57 (2005) ("The order specified that the $2 million was subject to criminal forfeiture pursuant to 18 U.S.C. sec. 982 (2000)."); Kentucky Revised Statutes 403.720 ("'Foreign protective order' means any judgment, decree, or order of protection which is entitled to full faith and credit pursuant to 18 U.S.C. sec. 2265 that was issued on the basis of domestic violence and abuse."); Brown v. Johnson, 387 F.3d 1344, 1347 (11th Cir. 2004) ("On July 15, 2003, the district court denied Brown's motion to amend his complaint because Brown's complaint was subject to dismissal under the PLRA, 28 U.S.C. section 1915."); Wood v. United States, 991 F.2d 915 (1st Cir. 1993) ("Upon filing this certificate, the Attorney General can remove the case to federal court (if it started in state court), substitute the United States as defendant, and, effectively, immunize the employee from any personal liability. 28 U.S.C. Section 2679(d)."); Morana v. Hernando County, CASE NO. 8:09-CV-347-T-17EAJ., 2009 BL 216888 (M.D. Fla. Oct. 7, 2009) ("This case was removed from the Hernando County Circuit Court under 28 U.S.C. Sec. 1332, 28 U.S.C. Sec. 1391, 28 U.S.C. Sec. 1446 and 28 U.S.C. Sec. 1453.").

mlissner commented 2 years ago

This looks basically fine to me. One thing I wonder is whether we should pull out the regex for sections and apply it elsewhere in the file, like we do for some parts of reporter citations.

I also wonder if this should wait on the work @flooie is doing on https://github.com/freelawproject/eyecite/issues/117, so we can see the difference. (OTOH, @flooie, I don't suppose your work is going to be automatically run against this repo, is it? Maybe it should be? Maybe that's easy?)

bbernicker commented 2 years ago

Waiting on @flooie's work makes sense to me. I would also be happy to pull out that regex and apply it elsewhere in the file if you think that makes sense. I am sure that courts use the symbol, the word, and the abbreviation somewhat interchangeably with other sources, but I am not sure whether applying it to every regex search for the section symbol would create too many false positives.

bbernicker commented 2 years ago

Just to be clear, would the plan be to imitate the treatment of the paragraph symbol?

In regexes.json we have "paragraph_marker": "(?:P|¶|para?\\.)". We could do something similar, like "section_marker": "(?:§|Sec|sec|Section|section)[§|s]?\.?". That would match both singular and plural forms (e.g. secs., §§, Sections).
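As a rough check of that idea, here is a minimal Python sketch that tries the proposed section marker against the citation forms quoted earlier in this thread. The names `SECTION_MARKER`, `USC_CITATION`, and `find_sections`, and the simplified U.S.C. pattern wrapped around the marker, are assumptions made for illustration; they are not the entries actually stored in reporters-db.

```python
import re

# Proposed marker from the comment above (hypothetical variable name).
SECTION_MARKER = r"(?:§|Sec|sec|Section|section)[§|s]?\.?"

# Simplified stand-in for a U.S. Code citation pattern (an assumption,
# not the regex used in reporters-db).
USC_CITATION = re.compile(
    r"(?P<title>\d+) U\.S\.C\. " + SECTION_MARKER + r" ?(?P<section>\d+)"
)


def find_sections(text):
    """Return (title, section) for the first U.S.C. citation, or None."""
    match = USC_CITATION.search(text)
    if match is None:
        return None
    return match.group("title"), match.group("section")


samples = [
    "18 U.S.C. sec. 371",
    "28 U.S.C. section 1915",
    "28 U.S.C. Section 2679(d)",
    "28 U.S.C. Sec. 1332",
    "18 U.S.C. § 982",
]

for sample in samples:
    print(sample, "->", find_sections(sample))
```

One thing the sketch makes easy to spot: inside a character class, `|` is a literal character, so `[§|s]` also accepts a stray pipe. `[§s]` would be a tighter way to write the same intent.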

flooie commented 2 years ago

@mlissner no - not automatically, but maybe ... we can actually trigger the same report here. I have it set up to run locally and in a different private repo (for testing purposes) and we could show its effects on eyecite.

mlissner commented 2 years ago

we could show its effects on eyecite.

Seems like an important idea, right? Like, before you release an update to reporters-db that might have a buggy regex, run the benchmark suite.
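To make the idea concrete, a before/after check could be sketched roughly like this in Python. This is not the actual eyecite benchmark; the patterns `OLD` and `NEW`, the helper names, and the tiny corpus are all hypothetical and only illustrate how "gains" and "losses" between two versions of a regex might be computed.

```python
import re


def citations(pattern, texts):
    """Collect (text index, matched string) pairs found by a pattern."""
    found = set()
    for index, text in enumerate(texts):
        for match in re.finditer(pattern, text):
            found.add((index, match.group(0)))
    return found


def gains_and_losses(old_pattern, new_pattern, texts):
    """Return (gains, losses): matches only the new or only the old pattern finds."""
    old = citations(old_pattern, texts)
    new = citations(new_pattern, texts)
    return sorted(new - old), sorted(old - new)


# Hypothetical "before" and "after" patterns for U.S. Code citations.
OLD = r"\d+ U\.S\.C\. § ?\d+"
NEW = r"\d+ U\.S\.C\. (?:§|[Ss]ec(?:tion)?\.?) ?\d+"

corpus = [
    "a violation of 18 U.S.C. sec. 371.",
    "pursuant to 18 U.S.C. § 982 (2000).",
    "under the PLRA, 28 U.S.C. section 1915.",
]

gains, losses = gains_and_losses(OLD, NEW, corpus)
print("gains:", gains)
print("losses:", losses)
```

A non-empty losses list would be the signal that a regex change broke something that used to match.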

flooie commented 2 years ago

I'm going to run this branch against our new tests.

flooie commented 2 years ago

I think we want to remove the invisible Mac files (.DS_Store, etc.) from the PR. @bbernicker

bbernicker commented 2 years ago

@flooie Before I make these edits, how do you feel about @mlissner's suggestion that we pull out the regex for sections and apply it elsewhere in the file?

Instead of using the new regex we wrote in the U.S. Code section, I would imitate our treatment of the paragraph marker and put "section_marker": "(?P<title>\d+) $reporter (?:§|[Ss](ec)?(tion)?)[§|s]?\.? $law_section" in regexes.json.

flooie commented 2 years ago

We should certainly do that.

bbernicker commented 2 years ago

Great. I will make that change and resubmit this afternoon. Thanks for the thorough review and sorry for the messy code.

flooie commented 2 years ago

@bbernicker please, no apologies. This is great.

mlissner commented 2 years ago

Loving the rapid progress here! Thanks all!

bbernicker commented 2 years ago

OK, this should be ready to go. The only issue is that the first test won't run. The error says "Error: Parameter token or opts.auth is required."

flooie commented 2 years ago

Yes, clearly we need to do something to enable more people to use the free law bot token.

mlissner commented 2 years ago

Yes, clearly we need to do something to enable more people to use the free law bot token.

This is a security thing. It used to be better, but GitHub tightened this maybe a year ago. The worry was that somebody could issue a PR that changes the code in a way that sniffs a secret. They'd issue that PR, the secret would get sniffed, and nobody is happy. Soooo...unless somebody is part of our org, there's no way for them to view secrets in their PRs. It's SUPER annoying.

flooie commented 2 years ago

OK, I'm going to create a copy of this PR simply to run this code, and then merge it (this PR, not the duplicate, which I will delete after testing the action).

flooie commented 2 years ago

The Eyecite Report :eye:

Gains and Losses

There were 0 gains and 0 losses.

| id | Gain | Loss |
| ---------- | ------ | ------ |

Time Chart

[Image: time chart, https://raw.githubusercontent.com/freelawproject/reporters-db/artifacts/126/results/chart.png]

Generated Files

Branch 1 Output: https://raw.githubusercontent.com/freelawproject/reporters-db/artifacts/126/results/original.json
Branch 2 Output: https://raw.githubusercontent.com/freelawproject/reporters-db/artifacts/126/results/update.json
Full Output CSV: https://raw.githubusercontent.com/freelawproject/reporters-db/artifacts/126/results/output.csv

For posterity