37 CFR 1 - FR correction notice interpreted as modifying non-existent appendix

Hey @gregoryfoster, looks like this is being triggered by the first AMDPAR in that notice (which isn't rendering properly on the Federal Register's site). If you look at the XML, you'll see:

<AMDPAR>
In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), make the following corrections:
</AMDPAR>

We can pop that into the amdparser to see how it's read:

In [1]: from lxml import etree

In [2]: from regparser.notice.amdparser import parse_amdpar

In [3]: parse_amdpar(etree.fromstring('<AMDPAR>In rule FR Doc. E7-16574, August 22, 2007 (72 FR 46899), make the following corrections:</AMDPAR>'), [])
Out[3]:
(<Element EREGS_INSTRUCTIONS at 0x7fc67d62af88>,
 [None, 'Appendix:E7', '16574'])

You'll notice that the second value (the resulting "context") points to section 16574 of appendix E7, which is clearly not correct. If we dig into the amdparser (specifically regparser.grammar.amdpar:appendix_section -> regparser.grammar.unified:appendix_with_section -> regparser.grammar.atomic:appendix_digit) we can see why: under the current rules, E7 could be an appendix and 16574 could be a section within an appendix, so the document number E7-16574 gets read as an appendix reference.

I think the section number is the bit that makes the most sense to tweak here; let's modify appendix_digit to only accept 1-4 character sections. While I've seen 2-digit appendix sections and can imagine 4-digit ones, 5 seems excessive. We can probably get away with a Regex parser that respects word boundaries but only allows 1-4 characters. Let us know if that's not enough to get you started!
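As a rough illustration of the word-boundary idea, here's a minimal sketch using the standard library's re module rather than the parser's actual grammar. The pattern, APPENDIX_SECTION, and find_sections are hypothetical names made up for this example (the real appendix_digit definition isn't shown here), and it assumes sections are purely numeric:

```python
import re

# Hypothetical stand-in for the section part of appendix_digit:
# 1-4 digits, delimited by word boundaries on both sides, so a longer
# run of digits (like an FR document number) is rejected outright
# instead of being partially matched.
APPENDIX_SECTION = re.compile(r"\b\d{1,4}\b")


def find_sections(text):
    """Return every 1-4 digit token in `text` that stands on its own."""
    return APPENDIX_SECTION.findall(text)


# The FR document number no longer yields a "section":
print(find_sections("E7-16574"))       # []
# Short sections still match:
print(find_sections("Appendix A-12"))  # ['12']
```

The trailing \b is what keeps 16574 from matching as 1657; without it the pattern would happily accept the first four digits.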

eregs / regulations-parser #380