Closed vrajmohan closed 6 years ago
@cmc333333 correctly identified the cause as https://github.com/eregs/regulations-site/blob/master/regulations/generator/layers/layers_applier.py#L32. This does not happen on Python 2 but happens in Python3 as a result of https://docs.python.org/3.6/library/html.entities.html#html.entities.html5.
Note that the trailing semicolon is included in the name (e.g. 'gt;'), however some of the names are accepted by the standard even without the semicolon: in this case the name is present with and without the ';'
A possible useful observation - in https://www.fec.gov/regulations/111-1/2018-annual-111#111-1, the 1st citation (52 U.S.C. 30101) is affected while the 2nd one (26 U.S.C. 9001) is not.
The 1st citation is affected but not the 2nd one because the unescaping is invoked multiple times. It is invoked for each replacement, not just on the replacement text, but on the entire node text at https://github.com/eregs/regulations-site/blob/master/regulations/generator/layers/layers_applier.py#L49 and https://github.com/eregs/regulations-site/blob/master/regulations/generator/layers/layers_applier.py#L58.
When I removed the calls to self.unescape_text()
, the FEC regulations looked fine at a cursory glance and the external citations worked properly.
It would be nice to understand what is meant by the comment https://github.com/eregs/regulations-site/blob/master/regulations/generator/layers/layers_applier.py#L33.
Thanks for fixing this quickly!
Thanks for finding it @vrajmohan ! We had to review a bunch of history going back to CFPB (https://github.com/cfpb/regulations-site/pull/69), but the problem really stemmed from #42, in which we sped up layer replacement, but forgot to remove the unescape logic from the previous implementation. Props to @tadhg-ohiggins for the fix!
Query parameters of the form
§ion=foo
seem to be prematurely and incorrectly interpreted as the section entity reference§
and end up as§ion=foo
! E.g.,26 U.S.C. Chapter 53
on the page https://regulations.atf.gov/479-22/2016-00192#479-2252 U.S.C. 30101
on the page https://www.fec.gov/regulations/111-1/2018-annual-111#111-1