Open Hert79 opened 1 year ago
@Hert79 are you able to share the source PDF at all so I can check what differences the file encodes and think about what an API for that information might look like please?
Unfortunately I encountered this in a copyrighted PDF, so I cannot share it here. If I encounter it again in a PDF I can share, I will.
I will just describe the problem I'm facing rather than trying to come up with solutions myself.
I need to merge or split blocks of text depending on whether or not they belong together, e.g. the authors and title on the first page of a book part in a miscellany. Usually I can do this using the font (it usually changes), but sometimes that doesn't work:
The title and author are clearly in different styles to the human eye, but with the info PdfPig gives me they seem to be in exactly the same style. They are both in the same font at the same size. The l's in both lines are mapped to the same character: "Latin Small Letter l", U+006C. But the title uses the regular glyphs while the author uses small-caps glyphs. In the font these have different CIDs, but I cannot access that from the Letter class. Would it be possible to add something to the Letter class so I can check whether two Letter objects in the same font with the same value actually use the same glyph or not?
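To make the request concrete, here is a minimal sketch of the comparison I have in mind. It is plain Python, not PdfPig code: the `Letter` dataclass is a hypothetical stand-in, `glyph_id` is the hypothetical extra field I am asking about, and the font name and glyph numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Letter:
    """Hypothetical stand-in for a parsed letter; NOT PdfPig's Letter class."""
    value: str
    font_name: str
    point_size: float
    # Hypothetical field: the code/CID the font used to draw this glyph.
    glyph_id: Optional[int] = None


def same_style(a: Letter, b: Letter) -> bool:
    """The check available today: font name and size only."""
    return a.font_name == b.font_name and a.point_size == b.point_size


def same_glyph(a: Letter, b: Letter) -> bool:
    """The check the request would enable: same value, same font, same glyph."""
    return (
        a.value == b.value
        and a.font_name == b.font_name
        and a.glyph_id is not None
        and a.glyph_id == b.glyph_id
    )


# Both letters decode to U+006C, but the author line uses a small-caps glyph.
# The glyph numbers below are invented for this example.
title_l = Letter("l", "ExampleFont", 12.0, glyph_id=79)
author_l = Letter("l", "ExampleFont", 12.0, glyph_id=412)

print(same_style(title_l, author_l))  # True
print(same_glyph(title_l, author_l))  # False
```

With only font name and size, the two lines look identical; with some per-letter glyph identifier, the title and author lines could be told apart and split into separate blocks.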