Scirra / Construct-bugs

Public bug report submissions for Construct 3 and Construct Animate. Please read the guidelines then click the 'Issues' tab to get started.
https://www.construct.net

Text wrapping: per character also wraps punctuation for CJK #8088

Closed alastajj closed 23 hours ago

alastajj commented 1 week ago

Problem description

When doing Chinese/Japanese/Korean translations for our games, they require text to be wrapped per character, instead of wrapping per word like English does. The Text object has a setting for this.

But in these languages, per-character wrapping is supposed to exclude punctuation. Characters such as "。 、 ; : ? ! ( ) 《 》【 】" should not be treated as standalone characters that can be pushed to the start of a new line.

(See images below)

Line breaks should only occur at characters in the Unicode range U+4E00 to U+9FA5 (19968 to 40869 in decimal).
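As a rough illustration of the range check suggested above, here is a minimal TypeScript sketch. The helper name `isInReportedCjkRange` is hypothetical and is not part of Construct; it only tests the range quoted in this report, which covers Han ideographs but not kana or Hangul.

```typescript
// Range quoted in the report: U+4E00 to U+9FA5 (19968 to 40869 in decimal).
const CJK_START = 0x4e00;
const CJK_END = 0x9fa5;

// Hypothetical helper: true if the first UTF-16 code unit of `ch` falls in the
// reported range. The whole range lies in the Basic Multilingual Plane, so
// charCodeAt is sufficient here. An empty string yields NaN and returns false.
function isInReportedCjkRange(ch: string): boolean {
  const code = ch.charCodeAt(0);
  return code >= CJK_START && code <= CJK_END;
}

console.log(isInReportedCjkRange("中")); // true  (U+4E2D, Han ideograph)
console.log(isInReportedCjkRange("。")); // false (U+3002, ideographic full stop)
console.log(isInReportedCjkRange("あ")); // false (U+3042, hiragana)
console.log(isInReportedCjkRange("한")); // false (U+D55C, Hangul syllable)
```

Because hiragana and Hangul fall outside this range, a rule based on it alone would likely need extra ranges for Japanese and Korean text.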

If it can't be fixed, perhaps an additional setting could be added to the Text object's wrapping option.

Attach a .c3p

CJK wrapping.zip

Steps to reproduce

  1. Look at the layout view, and the text object's settings

Observed result

image

Notice the dot is at the start of a new line; this is incorrect wrapping.

Expected result

image

It should stay with the preceding character, as shown here.

More details

Affected browsers/platforms:

First affected release:

System details

XHXIAIEIN commented 1 week ago

I think this is probably difficult, because C3 uses canvas to draw text and cannot use CSS properties. In HTML, the CSS line-break property could do this, but there is no support for it in CanvasRenderingContext2D. https://developer.mozilla.org/en-US/docs/Web/CSS/line-break

As a temporary workaround, use the crude method: have the translator add the line breaks manually. I did the same when I translated C3.

AshleyScirra commented 23 hours ago

This sounds simple but quickly runs into all sorts of serious complications. Accurately identifying the full Unicode set of CJK punctuation characters is fairly difficult. Then there are all sorts of edge cases. It seems simple enough that A。 should wrap as a unit, but what about A???, or combinations of different punctuation like A。《》? It's not really clear to me what the intention is for these cases, or what the rules to follow are.

It's easy enough to say "stick punctuation to the previous character", but then under that rule, A《B》 is allowed to wrap with A《 at the end of the line, which seems incorrect, much the way that in English I'd expect A"B" to wrap as A and "B" rather than A" and B". When you take into account all these rules it ends up with essentially a new word wrap mode as complex as the whitespace-delimited variant.

It's another good example of something that ends up being much more complicated than expected, which maintaining a custom text layout engine in JavaScript definitely is. So I've come up with an implementation based on my own best guesses to how this should be handled for the next beta.
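To make the edge cases above concrete, here is a minimal TypeScript sketch of one possible grouping rule. It is a guess for illustration only, not the implementation that shipped in Construct: closing punctuation sticks to the previous unit, and opening punctuation sticks to the next character. The function name `splitIntoWrapUnits` and the two punctuation lists are hypothetical, and the lists are small samples rather than a complete set.

```typescript
// Small illustrative samples, NOT a complete list of CJK punctuation.
const OPENING = "（《【「";
const CLOSING = "。、；：？！）》】」";

// Hypothetical helper: splits text into units that a wrapper may not break inside.
// Assumes all characters are in the Basic Multilingual Plane (one UTF-16 unit each).
function splitIntoWrapUnits(text: string): string[] {
  const units: string[] = [];
  let pendingOpen = ""; // opening punctuation waiting for the next character
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (OPENING.indexOf(ch) !== -1) {
      // Hold opening punctuation so it starts the next unit.
      pendingOpen += ch;
    } else if (CLOSING.indexOf(ch) !== -1 && pendingOpen === "" && units.length > 0) {
      // Closing punctuation attaches to the previous unit.
      units[units.length - 1] += ch;
    } else {
      // Any other character starts a new unit, prefixed by held opening punctuation.
      units.push(pendingOpen + ch);
      pendingOpen = "";
    }
  }
  if (pendingOpen !== "") units.push(pendingOpen); // trailing opening punctuation
  return units;
}

console.log(splitIntoWrapUnits("A。"));     // [ 'A。' ]
console.log(splitIntoWrapUnits("A《B》"));   // [ 'A', '《B》' ]
console.log(splitIntoWrapUnits("A。《》"));  // [ 'A。', '《》' ]
```

Under these assumed rules, A《B》 breaks as A and 《B》 rather than A《 and B》, which matches the expectation described in the comment. Other reasonable rule sets would give different results for inputs like A。《》.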

Technically doing this to 'character' word wrap mode is a breaking change in case anyone depends on CJK punctuation wrapping. This sounds unlikely, but there may be projects out there with something like 。。。。。。。。。 for emphasis or decoration and expecting it to character wrap, whereas under the rules I've ended up implementing, that would wrap like a single large word which could affect how the project looks. I don't know if that kind of thing is generally used in practice though. I think this warrants shipping it as a separate word wrap mode, so it will appear as a new separate "CJK" mode in the next beta.
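The compatibility concern can be illustrated with a small TypeScript sketch. It assumes a simplified greedy wrapper in which every character is one unit wide; a real engine would measure pixel widths instead. The function name `wrapUnits` and the width of 4 are hypothetical choices for the example.

```typescript
// Hypothetical greedy wrapper: packs unbreakable units into lines of at most
// `maxChars` characters. A unit longer than the limit gets a line to itself
// and overflows, the way a single long word would.
function wrapUnits(units: string[], maxChars: number): string[] {
  const lines: string[] = [];
  let line = "";
  units.forEach((unit) => {
    if (line !== "" && line.length + unit.length > maxChars) {
      lines.push(line);
      line = "";
    }
    line += unit;
  });
  if (line !== "") lines.push(line);
  return lines;
}

// Nine U+3002 characters used as decoration (joining 10 empty slots gives 9 separators).
const decoration = new Array(9 + 1).join("。");

// Plain per-character wrapping: every character is its own unit.
const perCharacter = wrapUnits(decoration.split(""), 4);
// Punctuation-aware wrapping, assuming the whole run is grouped into one unit.
const asOneUnit = wrapUnits([decoration], 4);

console.log(perCharacter.length); // 3 lines: 4 + 4 + 1 characters
console.log(asOneUnit.length);    // 1 line that overflows the width
```

The same text occupies three lines in one mode and a single overflowing line in the other, which is the kind of layout change that could affect an existing project.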