elifesciences / decision-letter-parser

Parse docx file containing decision letter and author response content and produce output in other formats
MIT License

Remove p tags which only contain whitespace. #112

Closed gnott closed 3 years ago

gnott commented 3 years ago

Re issue https://github.com/elifesciences/issues/issues/6751

Parsing a .docx file resulted in an error when trying to match a table heading in the content. A line in the .docx file contained a single non-breaking space character, which produced a blank paragraph, <p> </p>. When such a paragraph appears immediately after a table heading, the current table matching pattern fails. Example content:

<p><bold>Author response table 1.</bold> </p>
<p> </p>

This PR adds a utils function, remove_empty_p_tags(), that removes any paragraphs containing only whitespace, on the assumption that they are unintentional and add no value to the output at this time.
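
For illustration only, here is a minimal sketch of how a function like remove_empty_p_tags() could work. It is not the code from this PR: the regex, the EMPTY_P_PATTERN name, and the choice to operate on a plain string are all assumptions. It relies on the fact that in Python 3 the \s class in a str pattern also matches the non-breaking space (U+00A0).

```python
import re

# Hypothetical pattern (an assumption, not taken from the PR): a <p> element,
# with optional attributes, whose body is only whitespace. The optional
# trailing newline is consumed so no blank line is left behind.
EMPTY_P_PATTERN = re.compile(r"<p(?:\s[^>]*)?>\s*</p>\n?")


def remove_empty_p_tags(content):
    """Return content with whitespace-only <p> elements removed."""
    if not content:
        return content
    return EMPTY_P_PATTERN.sub("", content)


if __name__ == "__main__":
    sample = (
        "<p><bold>Author response table 1.</bold> </p>\n"
        "<p>\u00a0</p>\n"
        "<p>Table content follows.</p>"
    )
    print(remove_empty_p_tags(sample))
```

Run on the example content above, the sketch keeps the table heading paragraph, since its body holds a <bold> element as well as whitespace, and drops only the paragraph holding the non-breaking space.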