Open Srimathi-Thirumoorthy opened 5 months ago
What kind of help is required here? I'm a native Farsi speaker and can help change code and verify the results, but I'm not sure which piece of code needs to be changed. I took a short look at the latest version and couldn't spot the place where an element with Unicode text is drawn.
FYI, I tracked it down to the method com.lowagie.text.pdf.BaseFont#convertToBytes(java.lang.String), and it looks like the encoding is always set to Cp1252, from which I would not expect any non-Latin characters to render. Maybe properly setting the charset there (I don't know how) would fix the issue, possibly together with using a font that has the proper characters.
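To illustrate why a Cp1252 encoding cannot carry Persian text, here is a minimal sketch that uses only the JDK. It does not call OpenPDF or Flying Saucer, so it only approximates what may happen inside convertToBytes; the class name Cp1252Check and its helper methods are made up for this example.

```java
import java.nio.charset.Charset;

public class Cp1252Check {
    // The sample phrase from this thread, written with Unicode escapes
    // so that the source file encoding does not matter.
    public static final String SAMPLE =
            "\u062A\u0633\u062A \u0641\u0627\u0631\u0633\u06CC";

    // "windows-1252" is the canonical JDK name for Cp1252.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if every character of the text has a Cp1252 mapping. */
    public static boolean fitsInCp1252(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    /** Encodes to Cp1252 and decodes again; unmappable characters come back as '?'. */
    public static String roundTrip(String text) {
        return new String(text.getBytes(CP1252), CP1252);
    }

    public static void main(String[] args) {
        System.out.println(fitsInCp1252("test"));   // Latin text fits
        System.out.println(fitsInCp1252(SAMPLE));   // Persian text does not
        System.out.println(roundTrip(SAMPLE));      // every letter is replaced by '?'
    }
}
```

If the renderer encodes text this way, every Persian letter is lost before the font is even consulted, which would be consistent with the suggestion to set an encoding such as Identity-H on a custom font.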
@mohamnag Hi. Wow, thank you for debugging this problem with fonts.
Yes, now I see: FS always uses the encoding winansi (which I guess means Cp1252). I don't know why, but it has been used from the very beginning (01.02.2006) :)
I think we can change this encoding. Can you provide a simple example of such HTML and a font, so we can add this example to the FS tests?
Well, I went on and used a custom font for which I can set the encoding. Unfortunately, the result was still problematic.
Let's take this sample HTML:
<html lang="fa">
  <head>
    <meta charset="UTF-8"/>
    <title>Title</title>
    <style>
      .rtl-font {
        font-family: Vazirmatn;
        direction: rtl;
      }
    </style>
  </head>
  <body>
    <div style="background-color: blue">
      تست فارسی
    </div>
    <div class="rtl-font" style="background-color: green">
      تست فارسی
    </div>
    <div dir="rtl" style="background-color: red; font-family: Vazirmatn">
      تست فارسی
    </div>
  </body>
</html>
I have the font (available for free from https://github.com/rastikerdar/vazirmatn/releases/tag/v33.003) unzipped into the resources directory, and this is my Java code:
try (OutputStream outputStream = new FileOutputStream("build/pdf/method4.pdf")) {
    // parse and improve HTML
    Document document = Jsoup.parse(new File(inputHtml.getFile()), "UTF-8");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    var htmlString = document.html();

    // initialize Flying Saucer
    ITextRenderer renderer = new ITextRenderer();
    SharedContext sharedContext = renderer.getSharedContext();
    sharedContext.setPrint(true);
    sharedContext.setInteractive(false);

    // register the custom font with Identity-H encoding, embedded in the PDF
    renderer
        .getFontResolver()
        .addFont(
            Main.class.getClassLoader().getResource("Vazirmatn/ttf/Vazirmatn-Regular.ttf").toString(),
            BaseFont.IDENTITY_H,
            true
        );

    renderer.setDocumentFromString(htmlString);
    renderer.layout();
    renderer.createPDF(outputStream);
    // relative resources: see https://www.baeldung.com/java-html-to-pdf#dependencies-4
}
now this is the output that FS is giving me:
and this is what a browser gives me (ignoring the font not being applied):
there are two problems here:
ت should be positioned rightmost but is leftmost.
In general, I would first go for solving this problem using a custom font (which for sure has all the characters) and then maybe look into fixing the charset for the default font.
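To make the ordering problem concrete, here is a rough sketch using the JDK's java.text.Bidi. It is not what Flying Saucer or OpenPDF do internally. It only shows how a purely right-to-left run could be detected and put into visual order, so that the first logical letter (ت) ends up rightmost when glyphs are drawn left to right. The class BidiSketch and its methods are hypothetical, and the naive reversal ignores letter joining (shaping), mixed-direction text, and mirrored characters.

```java
import java.text.Bidi;

public class BidiSketch {
    // The sample phrase from this thread, written with Unicode escapes.
    public static final String SAMPLE =
            "\u062A\u0633\u062A \u0641\u0627\u0631\u0633\u06CC";

    /** Returns true if the whole text resolves to a single right-to-left run. */
    public static boolean isPureRtl(String text) {
        // Base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    /**
     * Naive visual reordering: reverse pure RTL runs, leave everything else alone.
     * Assumption: no combining marks, surrogate pairs aside, and no mixed-direction text.
     */
    public static String toVisualOrder(String text) {
        return isPureRtl(text) ? new StringBuilder(text).reverse().toString() : text;
    }

    public static void main(String[] args) {
        String visual = toVisualOrder(SAMPLE);
        // When drawn left to right, the last character of the visual string is rightmost.
        System.out.println(visual.charAt(visual.length() - 1) == '\u062A');
    }
}
```

A real fix would more likely rely on the PDF library's own run-direction and shaping support than on string reversal; the sketch is only meant to pin down what "rightmost" means for this sample.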
By the way, you have probably seen this example of RTL rendering using OpenPDF, but I just want to mention it: https://github.com/LibrePDF/OpenPDF/blob/master/pdf-toolbox/src/test/java/com/lowagie/examples/fonts/styles/RightToLeft.java
I don't know if this is different from what FS does under the hood when working with OpenPDF, but I couldn't find any of those methods being called.
I also found this post: https://groups.google.com/g/flying-saucer-users/c/n0CfuYfpQ6I/m/3iJIaZ4IAAAJ and a whole thread there that is related to this ticket.
Arabic and hebrew texts not supporting