diegomura / react-pdf

📄 Create PDF files using React
https://react-pdf.org
MIT License

Inconsistent hyphenation / word-wrapping. #1238

Open nwsm opened 3 years ago

nwsm commented 3 years ago

Describe the bug

In some scenarios, long words are not broken up and hyphenated correctly, so they overflow their flex containers.

To Reproduce https://react-pdf.org/repl?code=3187b0760ce02e004d09e01b02984605e181bc05031804e200ee017367be3002602584003a2021bce40e40198a007bb00d257cdd90f0022b40b260516b80e45880ca01200118802d59010e011818f382112d6a8593565546089e01d40b3061d89386d6a805b6604039ad1807002b21bb25002fa0be2822392e0d8333351d181fbe800318745506968ebea1b1a9b99aa2002bb2387e044e2d68242c188830060c0005002506001f0765000f33703957b218140f508c00c002b31fb21c1412323a16020a04001d06b53c0454ca8a80c01aad323132eafae6ea36e2a1f4cd89f9e5f5ca2dcad6f6ec53cd8020103000aa8926cc083513803003d2827893678024e30b7b10a680c059c2e573b97d563b3f8633158f878321d0b85838924945a26924999a23e6b0db7dee44a449241d4e20e856000b001588b45a28144b2552e94cb6572f942b154ae574b616486602e9388d4c0b5971d6bc712cfc4fd1e3afc21bde78b64137e267fa326c3c844f42150b5752b9543d7a2bd4ee675aee84fb79b31cef27baa92ebf6eb61f498fe1b156f649a392184d51c33d0a670fc79900170b45e2c974b65f2c572b55eacd76b75fac372b0c66cb75b6df6c773b5deecf77b7dfec0f075d8f7471d3e835c7b5d3587cd16c4d870d46e34467400dc380dce0004ac8660c96662001883dc6da023b4862d180c27a6b9c100000

Expected behavior

By default, words should always be hyphenated so that they fit within their container.


I assume the issue is in the hyphen package, but I could be wrong, and I'm open to any suggestions for avoiding this. It seems to be related to how the hyphenation algorithm handles long strings of repeated characters.

diegomura commented 3 years ago

Yes. This is definitely an edge case: hyphen expects real words, and this input clearly isn't one, so it probably returns an invalid break. I wonder whether this happens in real-world applications. I'll keep the issue open.

GBrachetta commented 3 years ago

I actually have a real-world case. When rendering long paragraphs in a monospaced font (FiraCode), justification is not completely reliable and some lines overflow (by at most one character).

amaljosea commented 3 years ago

In our case, we need to render UUID strings, and they do not wrap as expected.

[Image: Screenshot 2021-05-18 at 3 13 14 PM]

REPL: Link (if it shows the error 'You are not rendering a valid document', try editing the code)

diegomura commented 3 years ago

Have you tried setting a custom hyphenation callback? Alternatively, you can insert soft-hyphen characters at the possible break points in your text. That bypasses the default hyphenation engine and treats those positions as valid break points. Check here
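A minimal sketch of the soft-hyphen approach, assuming you can pre-process the string before passing it to a Text element. The names insertSoftHyphens and chunkSize are hypothetical; the helper inserts U+00AD (soft hyphen) every chunkSize characters inside any whitespace-delimited token longer than that, giving the line breaker explicit places where it may break.

```javascript
// Hypothetical helper (not part of react-pdf): insert soft hyphens (U+00AD)
// into long tokens so the layout engine has explicit break opportunities.
const SOFT_HYPHEN = "\u00AD";

function insertSoftHyphens(text, chunkSize = 8) {
  return text
    // The capturing group keeps the whitespace runs in the result array.
    .split(/(\s+)/)
    .map((token) => {
      // Leave whitespace (and empty leading/trailing pieces) untouched.
      if (/^\s*$/.test(token)) return token;
      // Array.from splits by code point, so surrogate pairs stay intact.
      const chars = Array.from(token);
      if (chars.length <= chunkSize) return token;
      const pieces = [];
      for (let i = 0; i < chars.length; i += chunkSize) {
        pieces.push(chars.slice(i, i + chunkSize).join(""));
      }
      return pieces.join(SOFT_HYPHEN);
    })
    .join("");
}
```

For example, insertSoftHyphens(uuid, 4) adds a break opportunity every four characters. Short words are returned unchanged, so ordinary prose is not affected.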

amaljosea commented 3 years ago

@diegomura

Thanks 😊

SatorCube commented 3 years ago

Another real-world case where I'm seeing this issue is URLs, which frequently contain UUID-like portions.
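For URLs and UUID-like strings, one option is a hyphenation callback that offers break points right after common separators instead of letting the default engine guess. This is a sketch under assumptions: splitAtSeparators is a hypothetical name, the separator set is arbitrary, and it assumes react-pdf passes each word to the callback registered with Font.registerHyphenationCallback and accepts an array of string pieces back.

```javascript
// Hypothetical callback: offer breaks after URL/UUID separators.
// The zero-width lookbehind means split() never drops characters, and each
// piece keeps its trailing separator, so joining the pieces gives the word back.
const SEPARATORS = /(?<=[\/.?&=#_-])/;

function splitAtSeparators(word) {
  // Words without any separator come back as a single piece: [word].
  return word.split(SEPARATORS);
}
```

Registered with Font.registerHyphenationCallback(splitAtSeparators), this also replaces the default hyphenation for ordinary words, which then only wrap at spaces. The last segment of a UUID (12 hex digits) still cannot be broken, so very narrow columns may need an additional length-based split.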

nbouvrette commented 1 year ago

We are having similar issues with user-generated content: random text overflows out of our PDFs instead of wrapping normally, which makes the PDFs look buggy. It would be nice to support the break-word CSS styles.

In the meantime, I came up with this workaround using a custom hyphenation callback, as @diegomura suggested:

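import { Font } from "@react-pdf/renderer";

const hyphenationCallback = (word) => {
  // Return each character of the word as a separate piece, so a line
  // can break between any two characters. Array.from splits by code
  // point, which keeps emoji and other surrogate pairs intact.
  return Array.from(word);
};

Font.registerHyphenationCallback(hyphenationCallback);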

Live demo here
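Splitting every word into letters also makes ordinary words breakable anywhere. A rough approximation of CSS overflow-wrap: break-word is to leave short words whole and only split words above some length. The maxLength threshold below is an arbitrary assumption (pick it to suit your column width), and makeBreakLongWords / breakLongWords are hypothetical names.

```javascript
// Hypothetical callback factory approximating CSS `overflow-wrap: break-word`:
// words of up to `maxLength` characters stay unbreakable (they wrap whole at
// spaces); longer ones may break between any two characters.
function makeBreakLongWords(maxLength = 15) {
  return (word) => {
    // Array.from splits by code point, so emoji / surrogate pairs stay intact.
    const chars = Array.from(word);
    return chars.length > maxLength ? chars : [word];
  };
}

const breakLongWords = makeBreakLongWords(15);
```

Register it with Font.registerHyphenationCallback(breakLongWords). Unlike CSS, the threshold is a character count rather than a measured width, so a word shorter than maxLength can still overflow a very narrow container.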

lexxwork commented 8 months ago

Yet another example of unexpected hyphenation: expected vs. actual.

It looks like hyphenation happens somewhere else in the code.

diegomura commented 8 months ago

@lexxwork it looks like you are kind of forcing it in that example:

<Text style={styles.text}>
    <Text>A line of text in </Text>
    <Text style={styles.redText}>a par</Text>
    <Text>agraph.</Text>
</Text>

lexxwork commented 8 months ago

Investigating the code, I see it could work on the full text 'A line of text in a paragraph.' while preserving the boundaries of each text chunk and its style. At some point the text is split into syllables, and maybe that is where the logic breaks. In my screenshots, part of the text also loses its style (it takes on another chunk's style). Anyway, this is another example where hyphenationCallback is not involved.
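The idea of hyphenating the full string while keeping each chunk's style can be sketched as a lookup from an offset in the concatenated text back to the styled run it falls in. This only illustrates the approach described above and is not react-pdf's internal code; the runs shape and locateOffset are hypothetical.

```javascript
// Hypothetical illustration: map a break offset in the concatenated text
// back to the styled run (chunk) that contains it. Offsets are in UTF-16
// code units, matching JavaScript string indices.
function locateOffset(runs, offset) {
  let start = 0;
  for (let i = 0; i < runs.length; i++) {
    const end = start + runs[i].text.length;
    // A break exactly at a run boundary is attributed to the earlier run.
    if (offset <= end) {
      return { runIndex: i, localOffset: offset - start };
    }
    start = end;
  }
  throw new RangeError(`offset ${offset} is past the end of the text`);
}

// The three chunks from the example above.
const runs = [
  { text: "A line of text in ", style: {} },
  { text: "a par", style: { color: "red" } },
  { text: "agraph.", style: {} },
];
```

In the joined string "A line of text in a paragraph.", the word "paragraph" starts at offset 20. A syllable break "pa|ra" (offset 22) maps to run 1, local offset 4, and "ra|graph" (offset 24) maps to run 2, local offset 1. This way each fragment keeps the style of the chunk it came from.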