eregs / regulations-parser

Parser for U.S. federal regulations and other regulatory information
Creative Commons Zero v1.0 Universal

37 CFR 1 - notice amendment interpreted as request to delete entire part/root node of tree #379

gregoryfoster closed 7 years ago

gregoryfoster commented 7 years ago

Dev environment: current master [ b2a4c07 ] + PR #378

Running the pipeline commands up to eregs --debug fill_with_rules 37 1 fails while processing amendment 1 of 77 FR 42149, at regparser/notice/compiler.py:203. @meriouma previously encountered this failure and implemented a workaround here.

From what I can tell, the amendment is requesting an update for the entire part's authority citation.

  1. The authority citation for Authority: 35 U.S.C. 2(b)(2).

It looks like the parser is interpreting this as a DELETE change request for the entire part. So this may be an edge case not currently handled by the parser. What's the best approach here?

cmc333333 commented 7 years ago

Hey @gregoryfoster, I've been trying to replicate this over the past few days but haven't had much luck. Do you have some locally-modified documents that haven't made their way into fr-notices yet?

Currently, I'm running the following (at 8366bcf8e0d18d20cd1b4a10eb9f531c26d3d18c, current master):

eregs clear
rm -rf .eregs_index   # bug, see #382
eregs pipeline 37 1 somewhere

Then I hit an "unorderable types: int() < str()" error when trying to add § 1.109 as part of 05-461. Happy to start debugging, but if you've gotten further, I'd appreciate the boost.
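
For anyone reading along: the message quoted above is how older Python 3 releases (through 3.5) report a TypeError when an int is compared with a str; newer releases word it as "'<' not supported between instances of 'int' and 'str'". The snippet below is only a minimal sketch of how that class of error arises and one generic way to sort around it. It is not taken from the parser; the label values and the sort_key helper are hypothetical.

```python
def sort_key(label):
    """Hypothetical helper: group ints before strs so they are never compared directly."""
    # bool sorts False < True, so ints (False) come first; ties compare like with like.
    return (isinstance(label, str), label)


# Illustrative values only: a mix of numeric and string labels.
mixed_labels = [109, "A", 3, "B"]

try:
    sorted(mixed_labels)  # compares an int with a str
    failed = False
except TypeError as exc:
    failed = True
    message = str(exc)  # wording depends on the Python version

# Sorting with the key avoids the cross-type comparison.
ordered = sorted(mixed_labels, key=sort_key)
```

Whether the real failure comes from sorting labels like this is a guess; the sketch only shows the mechanism behind the message.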

gregoryfoster commented 7 years ago

Hi @cmc333333. On a freshly pulled, clean repository from current master (8366bcf), I am no longer seeing this issue as reported. Instead it looks like I'm making it to the FR 05-461 § 1.109 issue you mentioned in your comment. Shall we close this issue out and open a new one?

I'm guessing that my index of interim results was somehow compromised, as I rarely execute a clean command since it takes a long time to process 37 CFR 1 from scratch. As a longtime developer, what's your process for quickly performing manual testing and validation of issues, and working across issues which may result in conflicting interim results?

cmc333333 commented 7 years ago

Running clean is certainly the scorched-earth strategy, but it's very easy to explain! A more granular approach is to use the lower-level commands (notably fetch_annual_edition and preprocess_notice) to replace source documents. That should invalidate any intermediate values which depend on those sources (directly or transitively), so the next time you run pipeline, you'll only compute the new content. This approach does require you know which document has changed; if the Federal Register were to alter one of the XML sources after you've loaded it, you wouldn't know until you ran a full clean.
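
To make the "directly or transitively" part concrete, here is a minimal sketch of transitive invalidation over a dependency graph. It assumes nothing about how the parser actually stores its index; the DependencyGraph class and the entry names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical index: tracks which entries are derived from which sources."""

    def __init__(self):
        self._dependents = defaultdict(set)  # source -> entries built from it
        self.stale = set()                   # entries that must be recomputed

    def add_dependency(self, target, source):
        """Record that `target` is computed from `source`."""
        self._dependents[source].add(target)

    def invalidate(self, changed):
        """Mark everything that depends on `changed`, directly or transitively."""
        queue = deque([changed])
        seen = {changed}
        while queue:
            node = queue.popleft()
            for dependent in self._dependents.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        seen.discard(changed)  # the replaced source itself is fresh
        self.stale |= seen
        return seen


graph = DependencyGraph()
graph.add_dependency("notice/A", "notice_xml/A")  # parsed notice <- source XML
graph.add_dependency("tree/v2", "notice/A")       # compiled tree <- parsed notice
graph.add_dependency("tree/v2", "tree/v1")
graph.add_dependency("tree/v1", "annual/2012")
graph.add_dependency("notice/B", "notice_xml/B")

# Replacing one source document only dirties what was derived from it.
stale = graph.invalidate("notice_xml/A")
```

In this toy graph, replacing notice_xml/A marks notice/A and tree/v2 as stale, while notice/B and tree/v1 stay cached, which is why the next run only recomputes the new content.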

I'll close this ticket in favor of the new issue around 05-461. My first guess there is that § 1.109 needs to be added to a subpart, but we can begin debugging in a separate issue.