arklumpus / VectSharp

A light library for C# vector graphics
GNU Lesser General Public License v3.0

How to draw a HTML table and export as a PNG file? #9

Closed: zydjohnHotmail closed this issue 3 years ago

zydjohnHotmail commented 3 years ago

Hello:

I have a new request: I want to draw a simple HTML table, fill it with data from C# (the data is retrieved from a SQL Server database), and then save the table as a PNG file. All the data should be left-aligned, not centered in the table cells. The following HTML table shows the format. The column names and the column sizes are fixed; only the data inside the table will vary.

<!DOCTYPE html>

A basic HTML table

Company                    | Contact         | Country
Alfreds Futterkiste        | Maria Anders    | Germany
Centro comercial Moctezuma | Francisco Chang | Mexico

To understand the example better, we have added borders to the table.

Any idea how I should begin? Let’s say I have two rows of data just like in the HTML table. Thanks, PS: Hope you have settled in your new house!

arklumpus commented 3 years ago

Hi! This is a bit more complicated, because you need to:

1) Parse the HTML table
2) Measure the size of each cell
3) Arrange everything correctly

Here is what I came up with; I'm using things from the System.Xml.Linq namespace to parse the HTML, but you should see whether this works in your case or you need an actual HTML parser.

using VectSharp;
using VectSharp.Raster;
using System;
using System.IO;
using System.Xml.Linq;
using System.Collections.Generic;
using System.Linq;

namespace HTMLTable
{
    class Program
    {
        static void Main()
        {
            // Read the HTML table from somewhere
            string html = ...

            // Parse the HTML table
            List<List<(string Contents, bool IsHeader)>> table = ParseHTMLTable(html);

            // Define fonts
            Font titleFont = new Font(new FontFamily(FontFamily.StandardFontFamilies.HelveticaBold), 16);
            Font regularFont = new Font(new FontFamily(FontFamily.StandardFontFamilies.Helvetica), 14);

            // The total width of the page (the table fills this width minus the left and right margins).
            double pageWidth = 700;

            // Margins for the page.
            double pageMarginTop = 10;
            double pageMarginBottom = 10;
            double pageMarginLeft = 10;
            double pageMarginRight = 10;

            // We will fix the height later.
            Page pag = new Page(pageWidth, 0);

            // The available width that will be filled by the table.
            double availableWidth = pageWidth - pageMarginLeft - pageMarginRight;

            // Compute column widths. If you have fixed values, replace the call to GetColumnWidths with those values.
            double[] columnWidths = GetColumnWidths(table, availableWidth, titleFont, regularFont);

            // Total height of the table, we will update this as we draw it.
            double tableHeight = 0;

            // Apply the margins.
            pag.Graphics.Translate(pageMarginLeft, pageMarginTop);

            // Save the current translation.
            pag.Graphics.Save();

            // We can now render the table, row by row.
            for (int i = 0; i < table.Count; i++)
            {
                // This method renders the row on the Graphics object and returns the height of the rendered row.
                // We draw the bottom border for all rows except the last one.
                double rowHeight = RenderRow(table[i], columnWidths, titleFont, regularFont, pag.Graphics, i < table.Count - 1);

                // Increment the total height of the table.
                tableHeight += rowHeight;

                // Translate the graphics so that the origin is now at the bottom of the row that we have just rendered.
                pag.Graphics.Translate(0, rowHeight);
            }

            // Restore the translation, so that (0, 0) is now the top-left corner of the table.
            pag.Graphics.Restore();

            // Draw the column border on the right for all columns except the last one.
            double currX = 0;

            for (int i = 0; i < columnWidths.Length - 1; i++)
            {
                pag.Graphics.StrokePath(new GraphicsPath().MoveTo(currX + columnWidths[i], 0).LineTo(currX + columnWidths[i], tableHeight), Colours.Black);

                currX += columnWidths[i];
            }

            // Draw the table border
            pag.Graphics.StrokeRectangle(0, 0, availableWidth, tableHeight, Colours.Black);

            // Update the height of the page.
            pag.Height = tableHeight + pageMarginTop + pageMarginBottom;

            // Save the page containing the table as a PNG image.
            pag.SaveAsPNG("Table.png");
        }

        // This method parses an HTML table, assuming that the HTML is well-formed and can be parsed as an XML document.
        static List<List<(string Contents, bool IsHeader)>> ParseHTMLTable(string html)
        {
            // Parse the HTML table as an XML document
            XDocument doc = XDocument.Parse(html);

            // Get the table element (this assumes that <table> is the root element of the parsed document)
            XElement table = doc.Element(XName.Get("table"));

            // Select all the rows in the table (which are identified by a <tr> tag)
            return (from row in table.Elements(XName.Get("tr"))
                        // Then, from each row select all the cells...
                    select (from cell in row.Elements()
                                // ... which are identified by a <td> or <th> tag
                            where cell.Name == XName.Get("td") || cell.Name == XName.Get("th")
                            // For each cell, return the text contents of the cell, as well as a boolean value indicating whether the
                            // cell is a header cell (i.e. it was identified by a <th> tag)
                            select (String.Concat(from el in cell.Nodes() select el.ToString(SaveOptions.DisableFormatting)), cell.Name == XName.Get("th"))).ToList()).ToList();
        }

        // This method computes the column widths based on the amount of text in each column and the available width.
        static double[] GetColumnWidths(List<List<(string Contents, bool IsHeader)>> table, double availableWidth, Font titleFont, Font regularFont)
        {
            // Assume that all the rows have the same number of columns.
            double[] columnWidths = new double[table[0].Count];

            foreach (List<(string Contents, bool IsHeader)> row in table)
            {
                for (int i = 0; i < row.Count; i++)
                {
                    double width;

                    // Measure the width of the text contained in the cell.
                    if (row[i].IsHeader)
                    {
                        width = titleFont.MeasureText(row[i].Contents).Width;
                    }
                    else
                    {
                        width = regularFont.MeasureText(row[i].Contents).Width;
                    }

                    // Update the maximum column width.
                    columnWidths[i] = Math.Max(columnWidths[i], width);
                }
            }

            // Total "desired" width of the table.
            double totalWidth = columnWidths.Sum();

            // Assign to each column a width that is proportional to the maximum width of the text contained in the column.
            for (int i = 0; i < columnWidths.Length; i++)
            {
                columnWidths[i] = columnWidths[i] / totalWidth * availableWidth;
            }

            return columnWidths;
        }

        static double RenderRow(List<(string Contents, bool IsHeader)> row, double[] columnWidths, Font titleFont, Font regularFont, Graphics graphics, bool drawBottomBorder)
        {
            // We proceed in multiple steps:
            //  * First we draw the text contained in each cell on a temporary buffer, adding line breaks where necessary, and measuring the height of the cell
            //  * Then, we draw the row border
            //  * Finally, we align each cell vertically and transfer the contents of the cell onto the main graphics object.

            // Margins for each cell.
            double cellMarginLeft = 5;
            double cellMarginRight = 5;
            double cellMarginTop = 5;
            double cellMarginBottom = 5;

            // List of temporary Graphics objects and cell heights.
            List<(Graphics Buffer, double Height)> cellGraphics = new List<(Graphics, double)>();

            double rowHeight = 0;

            for (int i = 0; i < row.Count; i++)
            {
                Graphics cellBuffer = new Graphics();

                // The width that is available for the cell corresponds to the column width minus the margins.
                double cellWidth = columnWidths[i] - cellMarginLeft - cellMarginRight;

                // Render the cell to the cell buffer. This method returns the height of the rendered cell.
                double height = RenderCell(row[i].Contents, cellWidth, row[i].IsHeader ? titleFont : regularFont, cellBuffer);

                // Update the row height.
                rowHeight = Math.Max(rowHeight, height);

                // Store the rendered cell for later.
                cellGraphics.Add((cellBuffer, height));
            }

            // The total height of the row corresponds to the cell content size plus the margins.
            double totalRowHeight = rowHeight + cellMarginTop + cellMarginBottom;

            // Draw the bottom border of the row.
            if (drawBottomBorder)
            {
                graphics.StrokePath(new GraphicsPath().MoveTo(0, totalRowHeight).LineTo(columnWidths.Sum(), totalRowHeight), Colours.Black);
            }

            // Save the current translation.
            graphics.Save();

            // Apply the horizontal margin.
            graphics.Translate(cellMarginLeft, 0);

            // We can now transfer each cell on the main graphics object.
            for (int i = 0; i < cellGraphics.Count; i++)
            {
                // Choose one of the three following options:

                // Align the cells at the top.
                //double cellY = cellMarginTop;

                // Center the cells vertically.
                double cellY = cellMarginTop + (rowHeight - cellGraphics[i].Height) * 0.5;

                // Align the cells at the bottom.
                //double cellY = cellMarginTop + rowHeight - cellGraphics[i].Height;

                // Save the current translation.
                graphics.Save();

                // Transfer the contents of the cell.
                graphics.DrawGraphics(0, cellY, cellGraphics[i].Buffer);

                // Restore the translation.
                graphics.Restore();

                // Translate to the next column.
                graphics.Translate(columnWidths[i], 0);
            }

            // Restore the translation.
            graphics.Restore();

            return totalRowHeight;
        }

        // Render the text in a single cell, with the specified maximum width. Returns the height of the cell.
        // This is essentially the same code as in https://github.com/arklumpus/VectSharp/issues/3#issuecomment-896885086
        static double RenderCell(string cellText, double cellWidth, Font fnt, Graphics graphics)
        {
            // Split the text in paragraphs (i.e. at every line break)
            string[] documentParagraphs = cellText.Split(Environment.NewLine);

            // Split each paragraph into separate words
            string[][] documentWords = new string[documentParagraphs.Length][];

            for (int i = 0; i < documentParagraphs.Length; i++)
            {
                // Split at spaces. You could use a Regex matching \b for something fancier, but this should be enough.
                documentWords[i] = documentParagraphs[i].Split(' ');
            }

            // Current y coordinate, we will increment this as we draw the text lines
            double currY = fnt.Ascent;

            // Render each paragraph separately 
            for (int i = 0; i < documentWords.Length; i++)
            {
                System.Text.StringBuilder currentLine = new System.Text.StringBuilder();
                int wordIndex = 0;

                while (wordIndex < documentWords[i].Length)
                {
                    // Current text of the line buffer
                    string currentLineText = currentLine.ToString();

                    // Test string: let's see whether this fits in the page or not
                    string testText;

                    if (currentLine.Length > 0)
                    {
                        testText = currentLineText + " " + documentWords[i][wordIndex];
                    }
                    else
                    {
                        testText = documentWords[i][wordIndex];
                    }

                    // Measure the test string
                    Size testTextSize = fnt.MeasureText(testText);

                    if (testTextSize.Width < cellWidth || currentLine.Length == 0)
                    {
                        // We can append the current word and not go over the cell width limit: append the current word and go on.
                        if (currentLine.Length > 0)
                        {
                            currentLine.Append(" ");
                        }

                        currentLine.Append(documentWords[i][wordIndex]);
                    }
                    else
                    {
                        // Appending the current word causes us to go over the cell width limit: render the current line (if it's not empty) and go on.
                        if (currentLine.Length > 0)
                        {
                            // Measure the current line width
                            Size currentLineSize = fnt.MeasureText(currentLineText);

                            // Choose one of the three following options (alternatively, you could add another parameter to this method to choose between the three, e.g. centering the headers and aligning the other cells to the left).

                            // Draw the text line centering it horizontally
                            // graphics.FillText((cellWidth - currentLineSize.Width) * 0.5, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

                            // Draw the text line aligning it on the left
                            graphics.FillText(0, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

                            // Draw the text line aligning it on the right
                            // graphics.FillText(cellWidth - currentLineSize.Width, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

                            // Increase the current y coordinate
                            currY += fnt.FontSize * 1.4;
                        }

                        currentLine.Clear();
                        currentLine.Append(documentWords[i][wordIndex]);
                    }

                    wordIndex++;
                }

                // We have reached the end of the paragraph, but the last line of the paragraph probably still needs to be rendered
                if (currentLine.Length > 0)
                {
                    // Current text of the line buffer
                    string currentLineText = currentLine.ToString();

                    // Measure the current line width
                    Size currentLineSize = fnt.MeasureText(currentLineText);

                    // Choose one of the three following options (alternatively, you could add another parameter to this method to choose between the three, e.g. centering the headers and aligning the other cells to the left).

                    // Draw the text line centering it horizontally
                    // graphics.FillText((cellWidth - currentLineSize.Width) * 0.5, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

                    // Draw the text line aligning it on the left
                    graphics.FillText(0, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

                    // Draw the text line aligning it on the right
                    // graphics.FillText(cellWidth - currentLineSize.Width, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);
                }

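                // Move past the last line of the paragraph
                if (i < documentParagraphs.Length - 1)
                {
                    // Advance by one line height, plus some optional paragraph spacing (the fancy bit)
                    currY += fnt.FontSize * 1.4 + fnt.FontSize * 0.7;
                }
                else
                {
                    // Last paragraph: extend the cell height down to the font's descent
                    currY -= fnt.Descent;
                }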
            }

            // Return the height of the cell
            return currY;
        }

    }
}

The output with your table is this:

The good thing is that this also works with longer text that needs to span multiple lines, e.g.:

<table style="width:100%">
  <tr>
    <th>Lorem ipsum dolor sit amet, consectetur adipiscing elit. In dictum efficitur lacinia. Ut elit nulla, ornare eu est sit amet, sodales gravida turpis. Nulla facilisi.</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Donec euismod, urna et venenatis molestie, velit magna luctus risus, vel mattis sapien dui ac elit. Quisque eget euismod dolor, sit amet varius neque.</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Curabitur et dictum massa. Aliquam blandit varius erat, id fermentum ex ultrices vehicula. Nunc eget dignissim dolor, eu laoreet magna. Mauris aliquam ex ac tellus interdum, in pretium diam placerat.</td>
    <td>Nam sit amet mauris vitae mi egestas sollicitudin sit amet eu odio. Pellentesque vel lectus vestibulum, aliquet leo a, tincidunt felis.</td>
  </tr>
</table>

There are also comments in the code that tell you how to change the way the text is aligned horizontally and vertically in each cell.

I hope this makes sense!
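As the code comments suggest, the alignment could also be selected with a parameter instead of commenting lines in and out. A minimal sketch, assuming hypothetical HorizontalAlignment and VerticalAlignment enums and a CellAlignment helper (none of these are part of VectSharp); it only reproduces the offsets that RenderCell and RenderRow compute inline:

```csharp
using System;

// Hypothetical names for illustration; not part of VectSharp.
public enum HorizontalAlignment { Left, Center, Right }
public enum VerticalAlignment { Top, Center, Bottom }

public static class CellAlignment
{
    // x coordinate at which a line of width lineWidth starts inside a cell of width cellWidth.
    public static double GetLineX(double cellWidth, double lineWidth, HorizontalAlignment alignment)
    {
        switch (alignment)
        {
            case HorizontalAlignment.Center:
                return (cellWidth - lineWidth) * 0.5;
            case HorizontalAlignment.Right:
                return cellWidth - lineWidth;
            default:
                return 0;
        }
    }

    // y coordinate at which a cell of height cellHeight is drawn inside a row whose
    // content height (without margins) is rowHeight.
    public static double GetCellY(double rowHeight, double cellHeight, double cellMarginTop, VerticalAlignment alignment)
    {
        switch (alignment)
        {
            case VerticalAlignment.Center:
                return cellMarginTop + (rowHeight - cellHeight) * 0.5;
            case VerticalAlignment.Bottom:
                return cellMarginTop + rowHeight - cellHeight;
            default:
                return cellMarginTop;
        }
    }
}
```

In RenderCell, the first argument of FillText would then become CellAlignment.GetLineX(cellWidth, currentLineSize.Width, alignment), with alignment passed in as an extra method parameter; HorizontalAlignment.Left gives the left-aligned cells requested above.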

zydjohnHotmail commented 3 years ago

Hello: thanks for your code; I will have to spend some time learning it. Getting the data from the SQL Server database and making it conform to the table format will also take some time. I will let you know once I have done the testing.
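One possible way to make database rows conform to the format expected by ParseHTMLTable is to build the markup with System.Xml.Linq rather than by string concatenation, since XElement escapes characters such as & and < and therefore always produces well-formed XML. A minimal sketch; TableMarkup is a hypothetical helper, and the in-memory string collections stand in for whatever the SQL Server query returns:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of VectSharp.
public static class TableMarkup
{
    // Builds a <table> element with one header row (<th> cells) followed by
    // one <tr> per data row (<td> cells), and returns it as a string.
    public static string Build(IEnumerable<string> headers, IEnumerable<IEnumerable<string>> rows)
    {
        XElement table = new XElement("table",
            new XElement("tr", headers.Select(h => new XElement("th", h))),
            rows.Select(row => new XElement("tr", row.Select(cell => new XElement("td", cell)))));

        // No indentation or line breaks, so no stray whitespace ends up in the markup.
        return table.ToString(SaveOptions.DisableFormatting);
    }
}
```

The returned string has <table> as its root element, which is what the parsing code above assumes, so it could be assigned to the html variable in Main.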

zydjohnHotmail commented 3 years ago

Hello: I have tested your code, and it works. Thank you very much. But I have one concern: to show the HTML table, I have a page of fixed size, 700px wide and 400px high. I can have anywhere from one row of data up to a maximum of 20 rows; your code shows 2 rows. I want to keep the HTML table in the center of the fixed-size page (700px by 400px). I think I can keep the following variables constant: double pageMarginLeft = 100; double pageMarginRight = 100; But the top and bottom margins have to change according to how many rows of data I have. If I have only one row of data, pageMarginTop and pageMarginBottom will be very big; if I have 20 rows, both margins will be rather small. How can I change the code to make the top and bottom margins adjustable? Thanks,

arklumpus commented 3 years ago

Hi, this should be relatively easy, using the same "container" approach as we did before. Basically, after you draw the table on the Page pag, instead of directly saving pag as a PNG image, you create another Page with the required height, and draw the contents of pag on it, centering it vertically.

Replace pag.SaveAsPNG("Table.png"); with the following:

    // Create the container page
    Page containerPage = new Page(700, 400);

    // Draw the page with the table on the container
    containerPage.Graphics.DrawGraphics(0, (400 - pag.Height) * 0.5, pag.Graphics);

    // Save the container
    containerPage.SaveAsPNG("Table.png");

For example (with red background to highlight the size of the page):

Naturally, you need to find the right font size if you wish to display 20 rows of data.
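For a first estimate of that font size, the row height computed by RenderRow can be inverted. A rough sketch, assuming that every row fits on a single line and uses the same font; TableFit and heightPerFontUnit are hypothetical names, and the factor (ascent minus descent, divided by the font size) has to be measured for the font actually used:

```csharp
using System;

// Hypothetical helper for illustration; not part of VectSharp.
public static class TableFit
{
    // Estimates the largest font size at which rowCount single-line rows fit in availableHeight.
    // availableHeight is the page height minus the top and bottom page margins.
    // heightPerFontUnit is (Ascent - Descent) / FontSize for the chosen font; this sketch
    // does not know it and takes it as an input.
    public static double MaxFontSize(int rowCount, double availableHeight, double cellMarginTop, double cellMarginBottom, double heightPerFontUnit)
    {
        // Height that each row may occupy, including its cell margins.
        double heightPerRow = availableHeight / rowCount;

        // Height left for the text itself once the cell margins are removed.
        double textHeight = heightPerRow - cellMarginTop - cellMarginBottom;

        // A negative value means that the margins alone already exceed the available height.
        return Math.Max(0, textHeight / heightPerFontUnit);
    }
}
```

For 20 rows on the 400-unit-high page with 10-unit top and bottom page margins and 5-unit cell margins, TableFit.MaxFontSize(20, 380, 5, 5, 1.0) returns 9; the factor 1.0 is only a placeholder. Rows whose text wraps onto several lines need more room, so the result is an upper bound rather than a guarantee.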

zydjohnHotmail commented 3 years ago

Hello: I found another minor issue: for non-English text, the diacritics are gone. As your code has become complicated, I can't find where to change it to keep the diacritics for non-English text, like the "ü" in the German "die Bücher". Thanks,

zydjohnHotmail commented 3 years ago

Hello: Thanks for your code. I want to know if I can use a picture as the background. For example, instead of using red as the background, can I use an image, like a soccer image? Thanks,

arklumpus commented 3 years ago

Hi,

Sorry for the delay. To solve the issue with the diacritics, there are two lines where you need to change from:

graphics.FillText(0, currY, currentLineText, fnt, Colours.Black, TextBaselines.Baseline);

To:

graphics.FillPath(new GraphicsPath().AddText(0, currY, currentLineText, fnt, TextBaselines.Baseline), Colours.Black);

You can find them easily in the code by searching for FillText.

To add a raster image to the plot, you can use VectSharp.MuPDFUtils to load it and then draw it using the Graphics.DrawRasterImage method. First of all, make sure that the VectSharp.MuPDFUtils NuGet package is installed. Then, you can use this code to draw the background image and the table over it:

    // Create the container page
    Page containerPage = new Page(700, 400) { Background = Colours.Red };

    // Open the image
    VectSharp.MuPDFUtils.RasterImageFile image = new VectSharp.MuPDFUtils.RasterImageFile(@"path/to/image_file");

    // Draw the image on the page
    containerPage.Graphics.DrawRasterImage(0, 0, 700, 400, image);

    // Draw the page with the table on the container
    containerPage.Graphics.DrawGraphics(0, (400 - pag.Height) * 0.5, pag.Graphics);

    // Save the container
    containerPage.SaveAsPNG("Table.png");

Make sure that the image has the right aspect ratio to be drawn on a 700x400 page. For example, using this image as the background, this is the result:
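If the image does not have the same aspect ratio as the page, passing 700 and 400 to DrawRasterImage would stretch it. One possible alternative is to compute a rectangle that scales the image uniformly. A minimal sketch, assuming that the pixel size of the image is known to the caller; ImageFit is a hypothetical helper, not part of VectSharp:

```csharp
using System;

// Hypothetical helper for illustration; not part of VectSharp.
public static class ImageFit
{
    // Returns the rectangle in which to draw an image so that its aspect ratio is preserved.
    // cover = true:  the image fills the whole page and overflows it on one axis.
    // cover = false: the image fits entirely inside the page, leaving empty bands on one axis.
    // In both cases the image is centered on the page.
    public static (double X, double Y, double Width, double Height) Fit(double imageWidth, double imageHeight, double pageWidth, double pageHeight, bool cover)
    {
        double scaleX = pageWidth / imageWidth;
        double scaleY = pageHeight / imageHeight;

        // The larger scale covers the page; the smaller one keeps the image inside it.
        double scale = cover ? Math.Max(scaleX, scaleY) : Math.Min(scaleX, scaleY);

        double width = imageWidth * scale;
        double height = imageHeight * scale;

        return ((pageWidth - width) * 0.5, (pageHeight - height) * 0.5, width, height);
    }
}
```

The four values would replace the 0, 0, 700, 400 arguments in the DrawRasterImage call above. With cover set to false the image never extends past the page, so the page background colour shows in the empty bands.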

zydjohnHotmail commented 3 years ago

Hello: Thank you very much, your code works! Many thanks!