Parsing a .docx file resulted in an error when trying to match a table heading in the content. A line in the .docx file contained a single non-breaking space character, which resulted in a <p> </p> blank paragraph, and if it appears immediate after a table heading, then the current table matching pattern fails. Example content,
This PR adds an additional utils function, remove_empty_p_tags(), to remove any paragraphs which only contain whitespace, assuming these are unintentional and do not add any value to the output at this time.
Re issue https://github.com/elifesciences/issues/issues/6751
Parsing a
.docx
file resulted in an error when trying to match a table heading in the content. A line in the.docx
file contained a single non-breaking space character, which resulted in a<p> </p>
blank paragraph, and if it appears immediate after a table heading, then the current table matching pattern fails. Example content,This PR adds an additional utils function,
remove_empty_p_tags()
, to remove any paragraphs which only contain whitespace, assuming these are unintentional and do not add any value to the output at this time.