Closed: gjblajian closed this issue 1 year ago
Describe the bug
HTML-encoded values are coming across in the text elements instead of Unicode characters. The ones we have seen include &apos;, &quot; and &amp;.
Steps to reproduce the behavior:
import { readDocx } from 'docx-wasm'
const parsedDoc = readDocx(buf)
console.log('parsedDoc:', parsedDoc)
Expected behavior
Would prefer to see the values in the output as Unicode characters, e.g. ←, since many special characters do not actually have HTML entity translations (MS Word's opening and closing double quotes are different Unicode code points [U+201C, U+201D]).
Actual behavior
HTML-encoded values &apos;, &quot; and &amp;
Attachment: Corporate Arbitration.docx
Note that this bug does not ALWAYS happen for &quot; or &apos;, but it seems to happen consistently for &amp;.
I'll check it later.
@gjblajian Please try 0.0.276-rc33
thank you, @bokuweb
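For output that still contains the entities described in this issue, a caller-side stopgap could look something like the sketch below. It is not part of docx-wasm and not necessarily how the library handles this. `decodeXmlEntities` and `decodeStringsDeep` are hypothetical helper names, the entity map is assumed to be the five predefined XML entities, and the sample object is made up for illustration because the real shape of the `readDocx` result is not shown in this thread.

```javascript
// The five entities predefined by XML. Assumption: these cover what can
// appear; the issue itself only reports &apos;, &quot; and &amp;.
const XML_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&apos;': "'",
};

// Decode in a single pass so that "&amp;quot;" becomes "&quot;"
// instead of being decoded a second time into a double quote.
function decodeXmlEntities(text) {
  return text.replace(/&(?:amp|lt|gt|quot|apos);/g, (m) => XML_ENTITIES[m]);
}

// Recursively walk a JSON-like value and decode every string in it.
// Hypothetical helper: assumes the parsed document is plain data
// (objects, arrays, strings, numbers, booleans, null).
function decodeStringsDeep(value) {
  if (typeof value === 'string') return decodeXmlEntities(value);
  if (Array.isArray(value)) return value.map(decodeStringsDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, decodeStringsDeep(v)])
    );
  }
  return value;
}

// Illustrative data only; the real output shape of readDocx may differ.
const sample = {
  children: [{ text: 'Smith &amp; Jones said &quot;it&apos;s fine&quot;' }],
};
console.log(decodeStringsDeep(sample).children[0].text);
```

Because the walk decodes every string it finds, it would also touch non-text fields; restricting it to the text elements would be safer once the output structure is known.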