QuestPDF / QuestPDF

QuestPDF is a modern open-source .NET library for PDF document generation. Offering comprehensive layout engine powered by concise and discoverable C# Fluent API. Easily generate PDF reports, invoices, exports, etc.
https://www.questpdf.com

Character(s) repeated after paragraph #382

Closed (girlpunk closed this 1 year ago)

girlpunk commented 2 years ago

Describe the bug: Still tracking down the exact cause and trying to get a simple repro (because something tells me you don't want this entire 600 file project), but it looks like 2022.6.0 introduced an edge case where 1-2 characters from the start of a span get repeated on a new line at the end of the paragraph.

To Reproduce: TBC

Expected behavior: Characters should not be repeated

Screenshots: image

In this example, if the code formatting is removed from the preceding .XLSX span (and thus the entire paragraph is rendered with a single Span() call), the extra fi at the end is replaced with the starting character of the entire paragraph.

Additional context: I believe this may be related to the end of the span finishing very close to (or at) the container width, as changing any characters in the span seems to affect the output.

MarcinZiabek commented 2 years ago

Did I say how much I hate this topic? I am still amazed by the challenges regarding different languages, text shaping and RTL 😅

1) Did you test with the latest 2022.9 version?
2) How often does the bug appear? (once per page, once per document, everywhere)
3) How many characters are repeated? (always 2?)

MarcinZiabek commented 2 years ago

Your screenshot has a poor resolution and I can't see it clearly. It is also possible that we see only 1 character (glyph, to be precise) repeated. The fi could be a ligature of f and i.

MarcinZiabek commented 2 years ago

I am trying to use the following code to find the issue.

var sentence = "Lorem ipsum dolor sit amet consectetuer.";

container
    .Padding(20)
    .Column(column =>
    {
        column.Spacing(10);

        foreach (var width in Enumerable.Range(25, 200))
        {
            column
                .Item()
                .MaxWidth(width)
                .Background(Colors.Grey.Lighten3)
                .Text(text =>
                {
                    text.Span("Before").FontColor(Colors.Green.Medium);
                    text.Span(sentence);
                    text.Span("After").FontColor(Colors.Red.Medium);
                });
        }
    });

I do see some issues with repeating characters between pages on version 2022.6.0. However, they appear to be fixed in 2022.6.3 and later.

girlpunk commented 2 years ago

> Did I say how much I hate this topic?

Sorry :p

> Did you test with the latest 2022.9 version?

Yes, no difference up to 2022.9.1.

> How often does the bug appear? (once per page, once per document, everywhere)

I've only seen it once, in one document, with at least 100 documents being run through the software since the June release. However, the same input does trigger the issue consistently. Changing the input slightly usually resolves the issue.

> How many characters are repeated? (always 2?)

I've only seen it happen with one or two characters, but I think you're right that it's a single ligature/glyph that's actually being repeated. Unfortunately, I can't share the output as it's confidential, but I'll try to get something working to replicate it for you.

> I do see some issues with repeating characters between pages on version 2022.6.0

I believe this is a different issue from #271 and #280. As you say, that bug was specifically with page breaks, whereas this occurs in the middle of a page. I haven't seen any evidence of that bug recurring since the 2022.6.3 release.

MarcinZiabek commented 2 years ago

Would it be possible to provide a screenshot with higher resolution? And with the text blurred instead of erased? Understanding the text structure may be very useful during the investigation.

girlpunk commented 2 years ago

Is this any better?

image

MarcinZiabek commented 2 years ago

This is strange... to say the least 🤣 Are you sure that fi is present only in the visible section?

Is this a single Span? image

girlpunk commented 2 years ago

It starts just after, at files. The .XLSM is a separate span to achieve the different formatting.

fi isn't only present in that section. However, when making the entire paragraph a single span, the glyph at the start of that span (F) was duplicated in the same position.

MarcinZiabek commented 2 years ago

I also understand that the entire Span is rendered first (1). Then we have that text line with fi (2). And the next line contains the next Span / content (3), right?

image

girlpunk commented 2 years ago

The function calls look something like this: image

Column
    [1 - Other text calls with a single span each]
    Text
        2 - Span - starting 'F'
        3 - Span - '.XLSM'
        4 - Span - ' '
        5 - Span - starting 'files'
    [6 - Other text calls with a single span each]

Removing the special formatting from .XLSM (which results in the F from the start of span 2 being duplicated at the bottom) removes span calls 3, 4, and 5, and puts all of the text into the single span call.

MarcinZiabek commented 2 years ago

I have literally no idea how this may happen with QuestPDF code: that it renders the first glyph again, and only the first glyph.

However, with the 2022.6 release, I introduced text shaping with SkiaSharp.HarfBuzz. What if this bug is somehow related? Some corner case? Would you like to upgrade both SkiaSharp and HarfBuzz to the latest versions and test? Chances are low but...

girlpunk commented 2 years ago

Unfortunately I've had the latest version of SkiaSharp (2.88.3) installed for all the tests. Tried explicitly specifying SkiaSharp.HarfBuzz 2.88.3 to match, but no difference there.

girlpunk commented 1 year ago

For reference, this still occurs in 2022.11.0

allac00 commented 1 year ago

I am facing the same thing in the letters that I generate. The letter has 3 columns; the column in the middle is the one where the text is added. When the text is longer and 'hitting' the end of the column, the first character of the last word is repeated on top of the first character on the next line.

image

In the example, 1 is a combination of a 'v' (the letter that it should be) overwritten with a 'd' (the first letter of the last word in the line above), and 2 is a combination of a 'w' (the letter that it should be) overwritten with a 'b' (the first letter of the last word in the line above).

I'm using the latest version of QuestPDF, 2022.12.0. When I downgrade to version 2022.05, the problem is gone without changing anything in the code.

MarcinZiabek commented 1 year ago

@allac00 This is a terrible corner case, quite difficult to track down. Based on your screenshot, I assume that sharing the code of your document is not an option?

allac00 commented 1 year ago

@MarcinZiabek I might be able to share (most of) the code, but there will be no content in it because all the content is kept in the database (so that the users can change it themselves). I don't think a solution can be found in it, since the problem is a corner case that depends on the content.

girlpunk commented 1 year ago

Is there a way to generate a location trace for an arbitrary location in a document? If so, would that be useful for troubleshooting this?

MarcinZiabek commented 1 year ago

@allac00 @girlpunk I would like to ask you for a favour. Would it be possible to track down an exact (both major and minor) QuestPDF version where the problem has been introduced? Then, I will do my best to compare code.

I am afraid that this problem was introduced during the transition to the HarfBuzzSharp library (for text shaping). And there are tons of changes.

AntonyCorbett commented 1 year ago

@MarcinZiabek I can reproduce this behaviour with 2022.12.1 and will investigate further. The second column displays the overlaid characters at the start of each span.

    // Requires: using System.Diagnostics; using QuestPDF.Fluent; using QuestPDF.Helpers;
    static void Main(string[] args)
    {
        var sentence = "Lorem ipsum dolor sit amet consectetuer. ";

        Document.Create(container =>
        {
            container.Page(page =>
            {
                page.DefaultTextStyle(style => style.FontSize(9));
                page.Content()
                    .Column(col =>
                    {
                        col.Item().Row(row =>
                        {
                            row.Spacing(10);

                            for (var relativeWidth = 1; relativeWidth <= 8; ++relativeWidth)
                            {
                                row.RelativeItem(relativeWidth)
                                    .Padding(1)
                                    .Background(Colors.Grey.Lighten3)
                                    .Text(text =>
                                    {
                                        text.Span("Before ").FontColor(Colors.Green.Medium);
                                        text.Span(sentence);
                                        text.Span(sentence).FontColor(Colors.Blue.Darken1);
                                        text.Span(sentence);
                                        text.Span(" After").FontColor(Colors.Red.Medium);
                                    });
                            }
                        });

                    });
            });
        }).GeneratePdf("hello.pdf");

        Process.Start("explorer.exe", "hello.pdf");
    }

image (2023-01-26_18-07-07)

AntonyCorbett commented 1 year ago

In my experimental case, the issue appears to be caused when a line ends with a space character and there is no room for that space on the line. So, in my code, if you modify the "Before" span so that there is no space after the word "Before", all is well.
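To make that comparison easier to repeat, here is a minimal sketch that renders the same layout twice, once with the trailing space after "Before" and once without. It reuses only the QuestPDF calls from the repro above and assumes the 2022.12.x API; the class name `TrailingSpaceRepro`, the helper `Generate`, and the output file names are hypothetical and only there for illustration.

```csharp
using QuestPDF.Fluent;
using QuestPDF.Helpers;
using QuestPDF.Infrastructure;

public static class TrailingSpaceRepro
{
    // Hypothetical helper (not part of QuestPDF): renders the repro layout,
    // with the text of the first span passed in so it can be varied.
    public static void Generate(string fileName, string firstSpan)
    {
        var sentence = "Lorem ipsum dolor sit amet consectetuer. ";

        Document.Create(container =>
        {
            container.Page(page =>
            {
                page.DefaultTextStyle(style => style.FontSize(9));
                page.Content().Row(row =>
                {
                    row.Spacing(10);

                    // Same eight relative widths as in the repro above.
                    for (var relativeWidth = 1; relativeWidth <= 8; ++relativeWidth)
                    {
                        row.RelativeItem(relativeWidth)
                            .Padding(1)
                            .Background(Colors.Grey.Lighten3)
                            .Text(text =>
                            {
                                // Only this span differs between the two runs.
                                text.Span(firstSpan).FontColor(Colors.Green.Medium);
                                text.Span(sentence);
                                text.Span(sentence).FontColor(Colors.Blue.Darken1);
                                text.Span(sentence);
                                text.Span(" After").FontColor(Colors.Red.Medium);
                            });
                    }
                });
            });
        }).GeneratePdf(fileName);
    }

    public static void Main()
    {
        Generate("with-space.pdf", "Before ");    // trailing space after "Before"
        Generate("without-space.pdf", "Before");  // no trailing space
    }
}
```

Comparing the two files side by side should show whether the overlaid characters depend on that one space. This only narrows down the trigger; it says nothing yet about where in the layout or shaping code the problem sits.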

MarcinZiabek commented 1 year ago

@AntonyCorbett This is great! Having a working minimal example that shows the issue may help with fixing it. If you learn anything else about the problem, please let me know. Thank you 😁

girlpunk commented 1 year ago

Thanks for the help debugging this one, Antony. Unfortunately, my main use of QuestPDF is for work, and other projects have been taking priority at the moment, so I haven't had time to look into this.