eregs / regulations-parser

Parser for U.S. federal regulations and other regulatory information
Creative Commons Zero v1.0 Universal

[WIP] 37 CFR 1 appendix titles are in WHED tags #365

gregoryfoster closed 7 years ago

gregoryfoster commented 7 years ago

@cmc333333's suggested change gets the annual_editions command much farther along. Calling this a WIP based on the intimation that there may be more to fix elsewhere.

coveralls commented 7 years ago


Coverage remained the same at 91.938% when pulling 6cd6fad610cd670656cab939e120d51c7831ff3f on gregoryfoster:37_cfr_1_appendix_title into 3c0e8d2867720f6c8da068c5d2358fcdc2972c61 on eregs:master.

cmc333333 commented 7 years ago

Thanks @gregoryfoster! I tried to fix this with [ad320894db1663850a2b45c12373c4e4d513c17c] earlier, but looks like I didn't get everything. How do you think we should deduplicate this logic? Should we replace the one remaining call to get_app_title with tree_utils.get_node_text(appendix_headers(node)[0]) or should we just simplify get_app_title to that, move it into regparser.tree.gpo_cfr.appendices, and call that logic everywhere?

gregoryfoster commented 7 years ago

Thanks for this, @cmc333333. I had not seen your earlier commit and am now watching the repo for every change. You can see I massacred this PR when merging the new changes from master into my working branch, so I decided to start over to keep the new commit(s) narrowly focused (#367).

You'll see I took the latter approach you recommended, preserving appendix_headers, adding a new function get_appendix_title, and rewriting references to the deprecated/deleted get_app_title. This gets the 37 CFR 1 annual_editions command farther along to the next parsing error.
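For readers following the thread, here is a minimal sketch of the shape such a helper might take. It is not the code from #367: `appendix_headers` and `get_node_text` below are simplified stand-ins written against the standard library's `xml.etree.ElementTree`, and the set of tags treated as title-bearing (`HD` with `SOURCE="HED"`, `WHED`, `RESERVED`) is an assumption based on this PR's title rather than the parser's actual selector.

```python
import xml.etree.ElementTree as ET


def appendix_headers(node):
    """Stand-in: return direct children that may carry an appendix title.

    Assumption: titles live in <HD SOURCE="HED">, <WHED>, or <RESERVED>.
    """
    return [
        child for child in node
        if child.tag in ("WHED", "RESERVED")
        or (child.tag == "HD" and child.get("SOURCE") == "HED")
    ]


def get_node_text(node):
    """Stand-in for tree_utils.get_node_text: join all nested text and
    collapse runs of whitespace."""
    return " ".join("".join(node.itertext()).split())


def get_appendix_title(node):
    """Single place to derive an appendix title: text of the first header."""
    return get_node_text(appendix_headers(node)[0])


# Hypothetical XML fragments, for illustration only.
whed_xml = (
    '<APPENDIX><WHED>Appendix A to Part 1 <E T="03">Fees</E></WHED>'
    '<P>Body text</P></APPENDIX>'
)
hd_xml = '<APPENDIX><HD SOURCE="HED">Appendix B to Part 1</HD></APPENDIX>'

whed_title = get_appendix_title(ET.fromstring(whed_xml))
hd_title = get_appendix_title(ET.fromstring(hd_xml))
```

Routing every caller through one function like this means a newly discovered header tag only has to be added in `appendix_headers`, which is the deduplication discussed above.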