Logic and Proofs includes Unicode no-break space characters via the numeric character reference `&#160;`, using sequences of these with strikethrough styling to draw the horizontal line that separates premises from conclusion in multiline presentations of an argument structure. These nbsp characters were not getting through the migration tool.
These character references should be understood by any XML processor (as should the hex form `&#xA0;`, or simply embedding the unencoded Unicode nbsp character in the UTF-8 character stream), but the cheerio parser appears to have a bug in whitespace normalization: it applies normalization to these nbsp characters, collapsing sequences of them into a single ordinary space, at least in the XML mode we are using. (HTML mode with entity decoding might rewrite them to the `&nbsp;` HTML entity reference in the output HTML, but `&nbsp;` is not automatically defined as a named entity in XML.)
This PR preserves them by rewriting any nbsp characters to `&nbsp;` before the whitespace-normalizing parse. These entities were defined in legacy OLI XML, and the migration tool already includes special code to decode them properly when converting text to JSON.
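
As a rough illustration of the pre-parse rewrite described above, here is a minimal sketch. It is not the code in this PR: the helper name `protectNbsp`, the regular expression, and the choice of `&nbsp;` as the replacement text are assumptions made for the example. It covers the three forms mentioned above (decimal reference, hex reference, and the raw U+00A0 character).

```typescript
// Hypothetical helper, not the PR's actual implementation.
// Matches the decimal reference, the hex reference, and the raw
// U+00A0 character; the "i" flag lets the hex digits be either case.
const NBSP_FORMS = /&#0*160;|&#x0*a0;|\u00a0/gi;

// Rewrite every no-break space form to the named entity so that a later
// whitespace-normalizing parse has no nbsp characters left to collapse.
function protectNbsp(xml: string): string {
  return xml.replace(NBSP_FORMS, "&nbsp;");
}

// Usage: a run of three nbsp's written in three different ways.
const sample = "<em>&#160;&#xA0;\u00a0</em>";
console.log(protectNbsp(sample)); // <em>&nbsp;&nbsp;&nbsp;</em>
```

Ordinary spaces are left alone, so normal whitespace normalization still applies to everything else in the document.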