cog1to / st-ligatures

Patches for ST (suckless terminal) that add support for ligatures drawing

Hidden emoji modifiers #36

Closed by veltza 10 months ago

veltza commented 10 months ago

Sorry to bother you all the time, but there is one more issue that needs to be fixed.

The problem is that emoji modifiers are not processed correctly, which breaks the rendering of the glyphs.

For example, when you look at the emoji-test.txt file, you'll notice that skin color modifiers are missing:

[Screenshot 1]

But when you select the text, they appear:

[Screenshot 2]

Here is another example where the second regional indicator symbol is missing, but when you select the first symbol, the second one appears:

[Screenshot 4]

I read the HarfBuzz manual and found that this behavior is caused by the default cluster level, level 0. Level 1 looks like what we need. The manual says:

Level 0 is the default.

The distinguishing feature of level 0 behavior is that, at the beginning of processing the buffer, all code points that are categorized as marks, modifier symbols, or Emoji extended pictographic modifiers, as well as the Zero Width Joiner and Zero Width Non-Joiner code points, are assigned the cluster value of the closest preceding code point from different category.

In essence, whenever a base character is followed by a mark character or a sequence of mark characters, those marks are reassigned to the same initial cluster value as the base character. This reassignment is referred to as "merging" the affected clusters. This behavior is based on the Grapheme Cluster Boundary specification in Unicode Technical Report 29.

This cluster level is suitable for code that likes to use HarfBuzz cluster values as an approximation of the Unicode Grapheme Cluster Boundaries as well.

Client programs can specify level 0 behavior for a buffer by setting its cluster_level to HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES.

Level 1 tweaks the old behavior slightly to produce better results. Therefore, level 1 clustering is recommended for code that is not required to implement backward compatibility with the old HarfBuzz.

Level 1 differs from level 0 by not merging the clusters of marks and other modifier code points with the preceding "base" code point's cluster. By preserving the separate cluster values of these marks and modifier code points, script shapers can perform additional operations that might lead to improved results (for example, coloring mark glyphs differently than their base).

Client programs can specify level 1 behavior for a buffer by setting its cluster_level to HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS.
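To make the difference between the two levels concrete, here is a minimal sketch of the initial cluster assignment the manual describes. It is a toy model only and makes no HarfBuzz calls: the helper names `is_mergeable` and `initial_clusters` are hypothetical, and the code point ranges are a small hand-picked subset (combining diacritical marks, emoji skin tone modifiers, ZWJ, ZWNJ) rather than the full Unicode data that HarfBuzz consults. Regional indicator pairs, for instance, are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a tiny stand-in for the categories the manual
 * lists (marks, modifier symbols, emoji modifiers, ZWJ, ZWNJ).
 * Assumption: only these few ranges are checked. */
static int is_mergeable(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)    /* combining diacritical marks */
        || (cp >= 0x1F3FB && cp <= 0x1F3FF)  /* emoji skin tone modifiers */
        || cp == 0x200C                      /* zero width non-joiner */
        || cp == 0x200D;                     /* zero width joiner */
}

/* Hypothetical helper: write the initial cluster value of each code
 * point into clusters[]. A nonzero `merge` models level 0 (marks and
 * modifiers take the cluster of the preceding code point); zero models
 * level 1 (every code point keeps its own cluster). */
static void initial_clusters(const uint32_t *cps, size_t n, int merge,
                             unsigned *clusters)
{
    assert(cps != NULL && clusters != NULL);

    for (size_t i = 0; i < n; i++) {
        if (merge && i > 0 && is_mergeable(cps[i]))
            clusters[i] = clusters[i - 1];  /* join the preceding cluster */
        else
            clusters[i] = (unsigned)i;      /* start a new cluster */
    }
}
```

For the sequence U+1F44B (waving hand), U+1F3FD (medium skin tone), U+0061 (`a`), this model assigns clusters 0, 0, 2 with merging and 0, 1, 2 without. Under the merged assignment the modifier no longer has a cluster value of its own, which is consistent with the missing modifiers described above, although how the patch maps clusters back to terminal cells is not shown in this thread.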

So I tested cluster level 1. It did fix the emoji issue and did not seem to affect ligatures at all:

    hb_buffer_t *buffer = hb_buffer_create();
    hb_buffer_set_direction(buffer, HB_DIRECTION_LTR);
+   hb_buffer_set_cluster_level(buffer, HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);

I'm not familiar with HarfBuzz, so do you think this setting has any side effects that should be taken care of?

cog1to commented 10 months ago

That makes sense, since st can't handle ZWJ and the like anyway. Ligatures and everything else seem unaffected by this.

cog1to commented 10 months ago

I made a br-buffer-overflow-and-emoji-fixes branch with the changes for both this issue and #35. Can you check the patches that are relevant to your build?

veltza commented 10 months ago

I tested st-ligatures-20240105-0.9.diff and st-ligatures-boxdraw-20240105-0.9.diff, and both of them fixed both issues. I checked the remaining patches visually, and they looked good to me. So I think they are ready for production.