arklumpus / VectSharp

A light library for C# vector graphics
GNU Lesser General Public License v3.0

Inconsistent emoji resolution in markdown #63

Closed manutiedra closed 2 months ago

manutiedra commented 6 months ago

First, let me thank you for this awesome piece of work. I started using it this week and it has been a lovely journey.

I have the following markdown:

Horse :racehorse:
NO :x:
Sad :sob:

When I convert that markdown using VectSharp.Markdown, I get different results (using the default font) depending on whether I use SaveAsImage or SaveAsSVG.

This is my code:

MarkdownPipelineBuilder pipelineBuilder = new MarkdownPipelineBuilder()
        .UseAdvancedExtensions()
        .UseEmojiAndSmiley();
MarkdownDocument markdownDocument = Markdig.Markdown.Parse(markdownSource, pipelineBuilder.Build());
MarkdownRenderer renderer = new MarkdownRenderer() {
    Margins = new Margins(0, 0, 0, 0),
};
Page pag = renderer.RenderSinglePage(markdownDocument, 395, out Dictionary<string, string> linkDestinations);
pag.SaveAsImage("markdown.png", OutputFormats.PNG, 2);
pag.SaveAsSVG("markdown.svg");


Why is that happening? I expected the emojis that work in one output format to work in both.

arklumpus commented 6 months ago

Hi, I'm glad you like the library!

This is interesting; there seem to be a few different things happening (and I am actually surprised that you get any emoji at all, honestly).

First, the cross is a single char ('\u274C'), while the horse and the sad face are each represented by two chars ("\uD83D\uDC0E" and "\uD83D\uDE2D", respectively). Try entering "🐎".Length in the C# Interactive prompt and you will see that it prints 2.
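To check this outside the Interactive prompt, here is a minimal sketch that uses only standard .NET APIs. The `EmojiChars` class and its `Describe` method are illustrative names, not part of VectSharp or Markdig. It reports how many UTF-16 chars a string occupies, how many Unicode code points that corresponds to, and whether any of the chars is a surrogate.

```csharp
using System;

static class EmojiChars
{
    // Hypothetical helper: summarises how a string is stored in UTF-16.
    public static (int Chars, int CodePoints, bool HasSurrogates) Describe(string s)
    {
        int codePoints = 0;
        bool hasSurrogates = false;

        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogate(s[i]))
            {
                hasSurrogates = true;
            }

            // A high surrogate followed by a low surrogate encodes a single code point.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++;
            }

            codePoints++;
        }

        return (s.Length, codePoints, hasSurrogates);
    }
}
```

For example, `EmojiChars.Describe("\uD83D\uDC0E")` returns `(2, 1, true)` for the horse, while `EmojiChars.Describe("\u274C")` returns `(1, 1, false)` for the cross.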

When you use the SaveAsImage method, the library scans the strings you are drawing char by char, and tries to find the glyph corresponding to each char in the font file. Since the font file does not contain a glyph for any of those characters, it shows a box instead (for each char: so two boxes for horse and sad face, and only one for the cross).
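The effect of this char-by-char lookup can be illustrated with a small simulation. This is not VectSharp's actual code: the `availableGlyphs` set is only a stand-in for the font's character map, and `GlyphLookupSketch.CountMissingGlyphBoxes` is a hypothetical helper. It shows why a font without emoji glyphs would give two boxes for each surrogate pair and one for the cross.

```csharp
using System.Collections.Generic;

static class GlyphLookupSketch
{
    // Counts how many "missing glyph" boxes would be drawn if every UTF-16 char
    // is looked up on its own. availableGlyphs stands in for a font's character map.
    public static int CountMissingGlyphBoxes(string text, ISet<char> availableGlyphs)
    {
        int boxes = 0;

        foreach (char c in text)
        {
            // Each half of a surrogate pair is checked (and fails) separately.
            if (!availableGlyphs.Contains(c))
            {
                boxes++;
            }
        }

        return boxes;
    }
}
```

With a glyph set that covers only the letters and the space used in the example (`new HashSet<char>("Horse NOSad")`), the three lines of the markdown produce 2, 1, and 2 boxes respectively.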

When you use SaveAsSVG, the default behaviour is to embed a subsetted font in the SVG file; for some reason (I believe it has to do with the 0xDXXX chars being in a different range, i.e. the UTF-16 surrogate range from 0xD800 to 0xDFFF), when the subsetted font is created, a glyph is included for the cross, but not for any of the characters that make up the horse and the sad face. Since the SVG has a fallback to the sans-serif font, the browser realises that the embedded font does not provide a glyph for those characters and uses its default font. This does not happen for the cross, because an embedded glyph is present for that.

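If you really want to support emojis, I have a fun workaround (which could eventually be developed into a separate VectSharp.Markdown.Emoji package). The general idea is to have Markdig turn each emoji shortcode into a custom emoji:// URI, replace each resulting EmojiInline with an image link that also records the heading level, and give the renderer a custom ImageUriResolver that serves the matching SVG file as a base64 data URI.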

Full code below (create a project subfolder called Emoji and add racehorse.svg, sob.svg, and x.svg as embedded resources):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using Markdig;
using Markdig.Extensions.Emoji;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;
using VectSharp;
using VectSharp.Markdown;
using VectSharp.Raster.ImageSharp;
using VectSharp.SVG;

namespace TestEmoji
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string markdownSource = "# Horse :racehorse:\nNO :x:\nSad :sob:";

            MarkdownPipelineBuilder pipelineBuilder = new MarkdownPipelineBuilder()
                .UseAdvancedExtensions()
                .UseEmojiAndSmiley(
                    new EmojiMapping(
                        // Transform all emojis of the form `:something:` into `emoji://something`.
                        new Dictionary<string, string>(EmojiMapping.GetDefaultEmojiShortcodeToUnicode().Select(x => new KeyValuePair<string, string>(x.Key, "emoji://" + x.Key.Trim(':')))),
                        EmojiMapping.GetDefaultSmileyToEmojiShortcode()));

            MarkdownDocument markdownDocument = Markdig.Markdown.Parse(markdownSource, pipelineBuilder.Build());

            foreach (Block b in markdownDocument)
            {
                ProcessBlock(b);
            }

            MarkdownRenderer renderer = new MarkdownRenderer()
            {
                Margins = new Margins(0, 0, 0, 0),
            };

            // Backup the default image URI resolver.
            Func<string, string, (string, bool)> defaultImageUriResolver = renderer.ImageUriResolver;

            renderer.ImageUriResolver = (imageUri, baseUri) =>
            {
                if (imageUri.StartsWith("emoji://", StringComparison.Ordinal))
                {
                    // Return a base64-encoded version of the emoji SVG.
                    return defaultImageUriResolver("data:image/svg+xml;base64," + RenderEmojiUri(imageUri, renderer), baseUri);
                }
                else
                {
                    // Process the image URI normally.
                    return defaultImageUriResolver(imageUri, baseUri);
                }
            };

            Page pag = renderer.RenderSinglePage(markdownDocument, 395, out Dictionary<string, string> linkDestinations);
            pag.SaveAsImage("markdown.png", OutputFormats.PNG, 2);
            pag.SaveAsSVG("markdown.svg");
        }

        static string RenderEmojiUri(string emojiUri, MarkdownRenderer renderer)
        {
            if (!emojiUri.StartsWith("emoji://", StringComparison.Ordinal))
            {
                throw new ArgumentException("The URI is not an emoji URI!", nameof(emojiUri));
            }
            else
            {
                // Heading level.
                int headingLevel = int.Parse(emojiUri.Substring(emojiUri.LastIndexOf("_heading:") + 9));

                // Name of the emoji.
                string emojiName = emojiUri.Substring(8, emojiUri.LastIndexOf("_heading:") - 8);

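                // Get the emoji SVG (the stream is null if no matching resource has been embedded).
                using Stream emojiStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("TestEmoji.Emoji." + emojiName + ".svg");
                if (emojiStream == null)
                {
                    throw new FileNotFoundException("No embedded SVG found for emoji '" + emojiName + "'.");
                }

                // Parse it into a VectSharp.Page.
                Page emojiPage = Parser.FromStream(emojiStream);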

                // Target height of the emoji. This value should be approximately correct based on the font size of the document.
                double targetEmojiHeight = renderer.BaseFontSize * (headingLevel == 0 ? 1 : renderer.HeaderFontSizeMultipliers[headingLevel - 1]);

                // Create a scaled version. 
                Page scaledEmojiPage = new Page(emojiPage.Width * targetEmojiHeight / emojiPage.Height, targetEmojiHeight);
                scaledEmojiPage.Graphics.Scale(targetEmojiHeight / emojiPage.Height, targetEmojiHeight / emojiPage.Height);

                // Reasonably accurate position of the baseline (you may need to change this based on how much margin your SVG renderings have).
                double y = -renderer.RegularFontFamily.TrueTypeFile.Get1000EmDescent() / 1000 * emojiPage.Height * 0.5;
                scaledEmojiPage.Graphics.DrawGraphics(0, y, emojiPage.Graphics);

                // Return a base64-encoded version of the scaled emoji.
                using MemoryStream tempStream = new MemoryStream();
                scaledEmojiPage.SaveAsSVG(tempStream);
                return Convert.ToBase64String(tempStream.ToArray());
            }
        }

        // Recursively traverse the MarkdownDocument tree.
        static void ProcessBlock(Block b)
        {
            if (b is LeafBlock leaf)
            {
                if (leaf.Inline != null)
                {
                    foreach (Inline inline in leaf.Inline)
                    {
                        if (inline is EmojiInline emoji)
                        {
                            // Is the inline within a heading block? If so, we need to know this, because it affects the font size.
                            int headingLevel = 0;
                            Block currBlock = b;
                            while (currBlock != null)
                            {
                                if (currBlock is HeadingBlock heading)
                                {
                                    headingLevel = heading.Level;
                                    break;
                                }
                                currBlock = currBlock.Parent;
                            }

                            // Replace each EmojiInline with a LinkInline representing an image.
                            emoji.ReplaceBy(new LinkInline(emoji.Content.ToString() + "_heading:" + headingLevel.ToString(), "") { IsImage = true });
                        }
                    }
                }
            }
            else if (b is ContainerBlock container)
            {
                foreach (Block b2 in container)
                {
                    ProcessBlock(b2);
                }
            }
        }
    }
}
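
The `emoji://name_heading:N` convention that connects `ProcessBlock` to `RenderEmojiUri` can also be isolated into two small helpers. This is only a sketch: `EmojiUri`, `Build`, and `Parse` are illustrative names that are not part of VectSharp or Markdig, and it assumes that emoji names never contain the `_heading:` marker themselves.

```csharp
using System;
using System.Globalization;

static class EmojiUri
{
    private const string Scheme = "emoji://";
    private const string HeadingMarker = "_heading:";

    // Builds a URI such as "emoji://racehorse_heading:1" (heading level 0 means "not in a heading").
    public static string Build(string emojiName, int headingLevel)
    {
        return Scheme + emojiName + HeadingMarker + headingLevel.ToString(CultureInfo.InvariantCulture);
    }

    // Splits a URI produced by Build back into the emoji name and the heading level.
    public static (string Name, int HeadingLevel) Parse(string uri)
    {
        if (!uri.StartsWith(Scheme, StringComparison.Ordinal))
        {
            throw new ArgumentException("The URI is not an emoji URI!", nameof(uri));
        }

        int markerIndex = uri.LastIndexOf(HeadingMarker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            throw new FormatException("The emoji URI does not contain a heading level.");
        }

        string name = uri.Substring(Scheme.Length, markerIndex - Scheme.Length);
        int level = int.Parse(uri.Substring(markerIndex + HeadingMarker.Length), CultureInfo.InvariantCulture);
        return (name, level);
    }
}
```

For example, `EmojiUri.Parse("emoji://racehorse_heading:1")` returns `("racehorse", 1)`, which is the same information that `RenderEmojiUri` extracts with its two `Substring` calls.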

manutiedra commented 6 months ago

Wow. Thank you very much for the detailed description of the problem, a fully working solution and the quick reply!

Yes, I really needed this. I already have some graph animations working with VectSharp.Plots, and I wanted to insert several frames with some information using VectSharp.Markdown, with a lot of emojis. I'll have to investigate whether I can tag the resulting Markdown to make transitions, but if that works I will owe you a big one.

arklumpus commented 2 months ago

Hi, I have managed to implement this in VectSharp.Markdown v1.6.0. Emojis are now supported both as shortcodes (e.g., :rainbow:) and by directly embedding the surrogate pair within the text (🌈). They are resolved by pulling them from https://openmoji.org, but this can be customised if you wish (see the emoji section here).

There is a certain amount of Black Magic™ involved with this, so please let me know if you spot any issues!

I'm closing this issue for now, but feel free to reopen it (or create a new one) if you have any problems.