BCcampus / pressbooks-openstax-import

[UNMAINTAINED] Pressbooks Plugin for OpenStax Textbook Import
GNU General Public License v3.0

Line breaks are added in perplexing ways #66

greatislander closed this issue 6 years ago

greatislander commented 6 years ago

Description

Importing this book: https://cnx.org/contents/HLT_qvJK@6.2:wsOQ6HtH@8/Preface-to-Pfeiffer-Applied-Pr

During import, line breaks are being inserted within paragraphs where the source has none.

Expected behaviour

The imported markup should match the source markup; no <br> tags should be added inside paragraphs.

Actual behaviour

Each literal newline inside a paragraph becomes a <br> tag after import.

Steps to reproduce the problem

Before:

<p id="id84958">This is a "first course" in the sense that it presumes no previous course in probability. The units are
modules taken from the unpublished text: Paul E. Pfeiffer, ELEMENTS OF APPLIED PROBABILITY,
USING MATLAB. The units are numbered as they appear in the text, although of course they may
be used in any desired order. For those who wish to use the order of the text, an outline is
provided, with indication of which modules contain the material.</p>

(Note that the source paragraph contains literal newlines, but they aren't rendered as line breaks because they aren't <br /> tags.)

After:

<p id="id84958">This is a “first course” in the sense that it presumes no previous course in probability. The units are<br>
modules taken from the unpublished text: Paul E. Pfeiffer, ELEMENTS OF APPLIED PROBABILITY,<br>
USING MATLAB. The units are numbered as they appear in the text, although of course they may<br>
be used in any desired order. For those who wish to use the order of the text, an outline is<br>
provided, with indication of which modules contain the material.</p>

I expect this is some ghastly wpautop() behaviour.
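
To illustrate the suspected mechanism, here is a minimal Python sketch. It is a simplified stand-in, not WordPress code: newline_to_br is a hypothetical helper that imitates only the step where newlines left inside a block are turned into <br> tags, and the sample paragraph is shortened from the one above.

```python
import re


def newline_to_br(html: str) -> str:
    """Hypothetical stand-in for the suspected newline-to-<br> step.

    Every newline (plus any whitespace directly before it) is replaced
    by a <br> tag followed by a newline. This is an assumption about the
    behaviour seen in the issue, not a port of wpautop().
    """
    return re.sub(r"\s*\n", "<br>\n", html)


# Shortened version of the paragraph from the issue, with a literal newline.
before = '<p id="id84958">The units are\nmodules taken from the unpublished text.</p>'
after = newline_to_br(before)
print(after)
```

If this is what happens, literal newlines that browsers ignore in the source become visible breaks after import, which matches the Before/After markup above.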


dac514 commented 6 years ago

Idea:

http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/tidy.htm

If the incoming content is valid HTML, it could be pre-processed by htmLawed with tidy = -1.

That should strip as much whitespace and as many line breaks as possible, which might leave less for WordPress to alter?
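
A rough Python sketch of the pre-processing idea, under stated assumptions: collapse_whitespace is a hypothetical helper, not htmLawed and not the fix that was eventually committed. It simply collapses every run of whitespace (including newlines) into a single space before the content reaches WordPress.

```python
import re


def collapse_whitespace(html: str) -> str:
    """Hypothetical pre-processing step: collapse whitespace runs.

    Replaces each run of spaces, tabs, and newlines with one space.
    Caveat: this would also flatten whitespace inside <pre> or similar
    elements, so a real implementation would need to skip those.
    """
    return re.sub(r"\s+", " ", html).strip()


# Shortened version of the paragraph from the issue, with a literal newline.
source = '<p id="id84958">The units are\nmodules taken from the unpublished text.</p>'
cleaned = collapse_whitespace(source)
print(cleaned)
```

With no newlines left inside the paragraph, a later newline-to-<br> conversion would have nothing to act on.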

bdolor commented 6 years ago

Fixed via https://github.com/BCcampus/pressbooks-openstax-import/commit/cb1231f8e66cf6d35971edd7b295715dc28a7817