Closed OddArneSaetervik closed 1 year ago
A couple of tools worth including in the evaluation: https://github.com/dompdf/dompdf # HTML to PDF converter https://wkhtmltopdf.org/ # command line tools to render HTML into PDF
Altinn 2 uses Essential Objects' PDF solution. https://www.essentialobjects.com/#Pdf
I can mention that I have had good experiences with wkhtmltopdf in the past. In principle it is a fairly simple tool, and the result is roughly the same as printing a web page from Chrome. At the very least it gives a lot of flexibility without having to learn a dedicated library just for creating PDFs.
SelectPDF is Windows-only and uses the System.Drawing APIs. These are not available on Windows Server Core or Windows Nano (https://docs.microsoft.com/en-us/dotnet/api/system.drawing?view=net-6.0), so I don't think this one is relevant.
There may be others among the .NET-based tools on the list that use System.Drawing, which in practice means Windows-only. Using libgdiplus as a workaround on Linux is removed in .NET 7. (SelectPDF also has bindings to kernel32.dll.)
Also worth noting what is mentioned at https://wkhtmltopdf.org/status.html, namely that it must not be used on "untrusted HTML". This probably applies to some degree to all libraries that take HTML/CSS/JS as input. If this is intended as a central service, it will require control over whether the apps can directly influence how the HTML is generated, if they should be able to at all. This must then be balanced against the apps' need for flexibility in how the PDFs are generated.
I have tested IronPDF a bit. By default it also uses Chromium behind the scenes. Out of the box it downloads Chrome and other dependencies at runtime(!), which was a bit of a hassle to get working with permissions etc. in Docker. But you can pull in a special version of the NuGet package, IronPdf.Linux, together with the renderer you choose, e.g. IronPdf.Native.Chrome.Linux, which then makes sure everything is fetched at build time. This adds a runtimes folder to the output, containing binaries including Chrome. /app/runtimes/linux-x64/native is 411 MB in the image.
Things I noted:
The -focal tag on the images that are pulled, which is Ubuntu 20.
IronPdf.Installation.LinuxAndDockerDependenciesAutoConfig=false;
RUN chmod 755 runtimes/linux-x64/native/IronCefSubprocess
Other useful info:
So far IronPdf and iText are perhaps the ones that stand out. Very good that you are testing IronPdf.
Quote HansO
Server-side HTML to PDF is a little monster. I'm skeptical of a setup where an app developer could suddenly take over the platform cluster through a vulnerability in the PDF generator. We don't have thaaat much trust in them.
My thoughts...
When new UI components, widgets, page layouts, process steps, dynamics, translations or other new app features are introduced, PDFs should be generated without requiring additional effort.
How to achieve this:
When creating an app with a nice responsive user interface, the generated PDF should automatically reflect this interface.
How to achieve this:
The generated PDFs must be accessible and follow WCAG and the requirements for PDF/UA
How to achieve this:
Generating PDFs can be heavy, typically consuming a lot of compute and memory. When load increases, PDF generation needs to scale automatically, and scaling should only impact the relevant app owner.
How to achieve this:
Rendering HTML and running javascript server side in a browser is a security challenge, since we can't trust the HTML. This challenge needs to be handled.
How to achieve this:
Altinn 3 follows a set of architecture principles, and PDF generation should do so as well.
Relevant Altinn 3 principles:
@altinnadmin Great writeup!
We considered the performance and stability issues when solving the same problem at UDI. Our architect used ServiceBus to ensure that we could run a series of async operations on different Azure Functions with proper retry, and also to make it possible to say OK to the user and process the PDF generation backlog at a later time.
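To illustrate the retry-and-defer idea described above, here is a minimal in-memory sketch. It is not how UDI implemented it and it does not use ServiceBus or Azure Functions; `PdfJob`, `drain` and `flaky_render` are hypothetical names, and a plain `deque` stands in for the message queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class PdfJob:
    """A queued request to render one PDF (hypothetical shape)."""
    instance_id: str
    attempts: int = 0


def drain(queue: deque, render: Callable[[PdfJob], bytes], max_attempts: int = 3):
    """Process the backlog; failed jobs are re-queued until max_attempts is reached."""
    done: dict[str, bytes] = {}
    dead: list[PdfJob] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            done[job.instance_id] = render(job)
        except Exception:
            if job.attempts < max_attempts:
                queue.append(job)   # retry later, behind the rest of the backlog
            else:
                dead.append(job)    # give up; a real queue would dead-letter this
    return done, dead


def flaky_render(job: PdfJob) -> bytes:
    """Stand-in renderer: 'b' fails once, 'c' always fails."""
    if job.instance_id == "c":
        raise RuntimeError("renderer crashed")
    if job.instance_id == "b" and job.attempts < 2:
        raise RuntimeError("transient failure")
    return b"%PDF-" + job.instance_id.encode()


# The user has already been told "OK"; the backlog is processed afterwards.
backlog = deque([PdfJob("a"), PdfJob("b"), PdfJob("c")])
done, dead = drain(backlog, flaky_render)
```

In a real setup the queue would be durable and the retry policy (delay, backoff, dead-lettering) would be configured on the messaging service rather than hand-written.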
I believe that some orgs would want to create their PDF view not based on an automatic merging of all pages. There are probably lots of services that want a different "custom view". Should this be covered?
There are probably lots of services that want a different "custom view". Should this be covered?
Yes, if that's a need we could add config, so that for example the Summary page could be used instead, or perhaps in addition. Remember that the user probably has a need to be able to see everything regardless of the need of the app owner.
Ref. bullet 4 above: "Other pages like Summary, Receipt, Payment, etc. can also be convertible to PDF in the same way, all we need is a URL"
Yes, if that's a need we could add config
Or a reasonable extension point where app devs can add their own logic for generating artefacts at appropriate times. Config can come later, when we want to make this possible in Altinn Studio.
@SandGrainOne , @altinnadmin
I know of some good alternatives that offer sandboxing through either containerization, file-based ingress/egress or use through APIs. The famous and extensive pandoc framework uses xelatex (i.e. XeTeX), but through LaTeX format and files (UNIX philosophy). Interestingly, pandoc can also use prince, which is free for non-commercial use, while governmental use has a fee. They are also governed under Norwegian license law. They offer supreme PDF/UA capabilities as well, and file-based ingress/egress.
Thanks @ddrune. I had a look at the Wikipedia page for Prince:
CSS Grid Layout (css-grid-1) is not yet present in Prince 14.
and
Prince supports most of ECMAScript 5th edition, but not strict mode. Later editions of ECMAScript are largely not supported.
I don't think prince is able to fulfill our first principle, that PDF generation should "just work", given that our apps are modern React applications. A lot has happened since the 5th edition of ECMAScript was released way back in 2009.
We eventually landed on using the latest version of browserless/chrome from browserless.io. This will be hosted together with the application owner's apps in the app-owner-specific AKS. I suspect that we will eventually want to replace it with something where we have more direct control, but for now this issue is being closed as completed.
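For readers unfamiliar with browserless, a request to it might look roughly like the sketch below. This assumes the browserless REST `/pdf` endpoint taking a JSON body with `url` and `options`, which should be checked against the deployed version. The base address `http://browserless:3000`, the page URL, and the helper names `build_pdf_request` and `fetch_pdf` are assumptions for illustration, not the actual Altinn integration.

```python
import json
import urllib.request


def build_pdf_request(browserless_base: str, page_url: str) -> urllib.request.Request:
    """Build a POST request for the /pdf endpoint (nothing is sent here)."""
    payload = {
        "url": page_url,
        # Assumed to be passed through to Chrome's PDF printing options.
        "options": {"format": "A4", "printBackground": True},
    }
    return urllib.request.Request(
        url=browserless_base.rstrip("/") + "/pdf",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_pdf(request: urllib.request.Request, timeout_s: float = 30.0) -> bytes:
    """Send the request and return the raw PDF bytes."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


# Hypothetical in-cluster address and app URL, for illustration only.
req = build_pdf_request("http://browserless:3000", "https://example.org/app/receipt")
```

Calling `fetch_pdf(req)` would perform the actual rendering; it is kept separate so the request can be inspected without a running browserless instance.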
Of course, this will depend on what the HTML documents end up looking like, that is, how rich they may become. We assume this will be clarified in #13.
Identified candidates are:
Finally, a decision must be made on which alternative to run a PoC on.
Analysis
Licensing
Open source always comes with a license. We must consider each license if we decide on an open source product. Open source products sometimes also have paid versions that we could consider.
Rendering
There are primarily two types of product. The first has its own HTML renderer. The second uses an integrated (or external) browser to perform the rendering, combined with a "save as PDF" type of mechanism. This second type will probably always render something closer to what the user would see in a browser.
Input types
The tools will either take an HTML document or a URL, or both as input.
PDF/A AND UA
For tools that don't support PDF/A, it might be possible to use a separate tool such as Ghostscript as a post-processing step. Another important standard is UA (universal accessibility).
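As a rough sketch of what such a Ghostscript post-processing step could look like, the helper below only builds the command line. The flags are the ones the Ghostscript documentation describes for PDF/A output as far as I know. The function name, file names and default level are assumptions, and `PDFA_def.ps` must first be adapted to point at an ICC profile. Note that this addresses PDF/A only, not PDF/UA, and the result should still be checked with a validator.

```python
def ghostscript_pdfa_command(
    src: str, dst: str, level: int = 2, pdfa_def: str = "PDFA_def.ps"
) -> list[str]:
    """Return an argument list for converting src to PDF/A with Ghostscript."""
    if level not in (1, 2, 3):
        raise ValueError("PDF/A level must be 1, 2 or 3")
    return [
        "gs",
        f"-dPDFA={level}",               # requested PDF/A conformance level
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",
        "-dPDFACompatibilityPolicy=1",   # drop features that would break conformance
        f"-sOutputFile={dst}",
        pdfa_def,                        # definition file, must precede the input
        src,
    ]


# Hypothetical file names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = ghostscript_pdfa_command("receipt.pdf", "receipt-pdfa.pdf")
```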
Weighting
Output must be PDF-A3/UA: 60%
Implementation effort: 20%
Cost: 20%
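To show how the weights above could be applied when comparing candidates, here is a small worked example. The weights come from the list above. The 0-10 scale, the function name and both sets of candidate scores are made up for illustration and are not real evaluation results.

```python
# Weights from the thread: PDF-A3/UA output 60%, implementation effort 20%, cost 20%.
WEIGHTS = {"pdfa3_ua_output": 0.60, "implementation_effort": 0.20, "cost": 0.20}


def weighted_score(scores: dict[str, float]) -> float:
    """Return the weighted sum of per-criterion scores (assumed 0-10 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical candidates: A is strong on standards support, B is cheap and easy.
candidate_a = {"pdfa3_ua_output": 8, "implementation_effort": 5, "cost": 3}
candidate_b = {"pdfa3_ua_output": 4, "implementation_effort": 9, "cost": 9}

score_a = round(weighted_score(candidate_a), 2)  # 0.6*8 + 0.2*5 + 0.2*3
score_b = round(weighted_score(candidate_b), 2)  # 0.6*4 + 0.2*9 + 0.2*9
print(score_a, score_b)
```

With these made-up numbers candidate A comes out at 6.4 and candidate B at 6.0, which shows how heavily the 60% weight on output format dominates the other two criteria.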
Candidates
Discarded candidates