J-F-Liu / lopdf

A Rust library for PDF document manipulation.
MIT License

Unexpected behavior using vowels with acute accent and Arabic text #282

Open waiylkarim opened 3 months ago

waiylkarim commented 3 months ago

Hello,

I'm trying to display French and Arabic text in a PDF using lopdf, but the accented letters in the French text are displayed as random symbols and all letters in the Arabic text are displayed as rectangles.

Sample of my code:


    let text = "Café";
    operations.push(Operation::new("BT", vec![]));
    operations.push(Operation::new("Tf", vec!["F1".into(), 13.into()]));
    operations.push(Operation::new(
        "Td",
        vec![A4_PADDING.into(), (A4_HEIGHT - (LOGO_HEIGHT + 35.)).into()],
    ));

    operations.push(Operation::new("Tj", vec![Object::string_literal(text)]));
    operations.push(Operation::new("ET", vec![]));

Using:

lopdf = "0.32.0", macOS v12.7.3, rustc 1.79.0 (129f3b996 2024-06-10)

Any idea how to fix this? Thank you so much!

Heinenen commented 2 months ago

When rendering text, the operand of Tj is normally not Unicode text but a string of character codes that are only meaningful for the font selected in this PDF file. The codes are looked up in the CMap of the font (or, for a simple font, its encoding), which you would also have to create. For a composite font, the CMap maps the codes to character IDs (CIDs), which are in turn mapped to glyph IDs (GIDs) in the font.
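To illustrate with the "Café" case: a Rust `&str` is UTF-8, so `é` (U+00E9) is stored as the two bytes `C3 A9`, and a single-byte font encoding shows those as two unrelated glyphs. The sketch below is only a workaround for Latin text, not something lopdf provides. It assumes the font behind `F1` is a simple font whose dictionary declares `/Encoding /WinAnsiEncoding` (the question does not show how `F1` is defined), and `to_win_ansi` is a hypothetical helper name.

```rust
/// Convert `text` to single-byte codes for a simple font that is assumed to
/// declare `/Encoding /WinAnsiEncoding` in its font dictionary.
/// Only the ranges where WinAnsiEncoding uses the same value as the Unicode
/// code point are handled: printable ASCII and U+00A0..=U+00FF.
/// Returns `None` if any character falls outside those ranges (e.g. Arabic).
fn to_win_ansi(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| match c as u32 {
            // In these ranges the code point already fits in one byte.
            0x20..=0x7E | 0xA0..=0xFF => Some(c as u8),
            _ => None,
        })
        .collect() // Option<Vec<u8>>: None as soon as one char is unsupported
}
```

With lopdf, the resulting bytes could then be passed as the Tj operand in place of the `&str`, for example via `Object::string_literal(bytes)`. This does not help with Arabic, which has no single-byte encoding of this kind, so the helper returns `None` for it.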

Some explanation on the difference between character and glyph from the PDF spec:

A character is an abstract symbol, whereas a glyph is a specific graphical rendering of a character.

lopdf does not offer an interface at the moment that would make this process easier.
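For text such as Arabic, the manual route would be an embedded composite (Type0) font. The following is a minimal sketch of the string-encoding step only, assuming the font uses the predefined `Identity-H` CMap and an identity CID-to-GID mapping, so that each 2-byte code is the glyph ID itself. The glyph IDs are placeholders: in practice they would come from parsing the embedded font, and Arabic additionally needs shaping and right-to-left ordering, which PDF viewers do not perform and which this sketch does not attempt. Both function names are hypothetical.

```rust
/// Encode glyph IDs as the 2-byte big-endian codes used with a Type0 font
/// that is assumed to use `Identity-H` and an identity CID-to-GID mapping
/// (code == CID == GID).
fn encode_identity_h(glyph_ids: &[u16]) -> Vec<u8> {
    glyph_ids.iter().flat_map(|gid| gid.to_be_bytes()).collect()
}

/// Format raw codes as a PDF hexadecimal string operand, e.g. `<004301A5>`.
fn to_hex_operand(codes: &[u8]) -> String {
    let hex: String = codes.iter().map(|b| format!("{:02X}", b)).collect();
    format!("<{}>", hex)
}
```

The bytes from `encode_identity_h` would be the Tj operand, which with lopdf could be built as `Object::String(bytes, StringFormat::Hexadecimal)`. Creating the Type0 font dictionary, the descendant CIDFont, the embedded font stream, and a ToUnicode CMap (so the text stays searchable) remains manual work.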