cfpb / regulations-site

(DEPRECATED) Web interface for viewing U.S. federal regulations and other regulatory information

More robust handling of section titles #834

Closed chosak closed 6 years ago

chosak commented 6 years ago

Source regulation RegML seems to contain section titles with inconsistent formatting, for example:

In all of these cases, the section sublabel should be properly extracted as "Something". This change makes the sublabel extraction logic somewhat more robust and adds a few unit tests to verify functionality.

A real example of this inconsistency can be found in this Reg E RegML file.
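As a rough illustration of the kind of tolerant extraction described above, here is a minimal sketch. It is not the code from this change: the function name `extract_sublabel`, the regular expression, and the sample titles are all hypothetical, chosen only to show how variations in the section symbol, spacing, and separators could be normalized to the same sublabel.

```python
import re

# Hypothetical pattern (an assumption, not taken from the actual change):
# optional section symbol, a "part.section" number, an optional stray period,
# an optional dash separator, then the sublabel itself.
_SECTION_TITLE_RE = re.compile(
    r"^\s*(?:§\s*)?\d+\.\d+\.?\s*(?:[-\u2013\u2014]\s*)?(?P<sublabel>.*?)\s*$"
)


def extract_sublabel(title):
    """Return the descriptive part of a section title.

    Falls back to the stripped title when it does not look like a
    numbered section title at all.
    """
    match = _SECTION_TITLE_RE.match(title)
    if match is None:
        return title.strip()
    return match.group("sublabel")


# Made-up title variants, for illustration only.
samples = [
    "§ 1005.2 Something",
    "§1005.2 Something",
    "1005.2 Something",
    "§ 1005.2 - Something",
    "  § 1005.2.   Something  ",
]
sublabels = [extract_sublabel(s) for s in samples]
```

Falling back to the stripped title for unrecognized input is one possible design choice; a stricter version could raise an error instead, so that malformed titles are noticed at parse time.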