PhilterPaper / PDF-Table

Official repository for PDF::Table in Perl
https://www.catskilltech.com/FreeSW/product/PDF%2DTable/title/PDF%3A%3ATable/freeSW_full

bug?: strange inserting of a space after a period (full stop) and alphanumeric character (kerning?) #50

Closed westmj closed 4 years ago

westmj commented 4 years ago

Sentences render as expected with PDF::API2's $content->text(), but in PDF::Table strange things happen when a "word" contains an embedded period, such as "name@gmail.com". A space appears after the period plus at least one more character, e.g. "name@gmail.c om". A demonstration program and sample output are attached.

bug01.pl.txt bug01.pdf

By the way, can PDF::Table be used with PDF::Builder, a fork of PDF::API2? As is? Or with minor editing? Thanks.

PhilterPaper commented 4 years ago

It does the same thing (extra space) with PDF::Builder. So far, all I did was change all occurrences of "API2" in Table.pm (the only file?) to "Builder". And of course, the same in your bug01.pl code.

I see an extra space after a "." followed by a letter or digit, and two spaces after ".." followed by a letter or digit. Sometimes there is also a space right after the ".". I'll take a look at it. I'm wondering if there are some "dot codes" (special meaning for a period) in Table, or perhaps it's trying to enforce some punctuation spacing rules.

PhilterPaper commented 4 years ago

It turned out to be quite simple. Any word longer than 20 characters gets split in two. Add

max_word_length => 50,

(more or less) to the table() parameter list. Words shorter than that limit will no longer be split. The email address you used just happened to reach 20 characters right at the period.
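
To make the effect concrete, here is a minimal sketch of that kind of splitting in plain Perl. It is not PDF::Table's actual code: the helper name split_long_words and the sample address are made up for illustration, and it assumes the split is done by inserting a space after every max_word_length non-space characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the splitting step; not PDF::Table's actual code.
# Inserts a space after every $max consecutive non-space characters,
# but only when more non-space characters follow.
sub split_long_words {
    my ($text, $max) = @_;
    $max ||= 20;    # default limit described in this thread
    $text =~ s/(\S{$max})(?=\S)/$1 /g;
    return $text;
}

my $addr = 'firstname.lastname@example.com';   # made-up 30-character "word"
print split_long_words($addr, 20), "\n";       # space inserted after the 20th character
print split_long_words($addr, 50), "\n";       # below the limit, left intact
```

With a limit of 50 the address is shorter than the limit, so nothing is inserted, which matches the effect of raising max_word_length.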

The Table.pm code could be made more general, to accept PDF::Builder input. There are about 5 places where data types are checked against "PDF::API2..." strings. These could be changed to check against "PDF::API2..." or "PDF::Builder...".
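
A rough sketch of what such a check might look like, assuming the existing tests compare the object's class name against a "PDF::API2..." string. The helper name is_pdf_class is hypothetical and does not exist in Table.pm; the objects below are bare blessed hashes used only to exercise the check.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper, not part of PDF::Table. Returns 1 when $obj is a
# blessed object whose class is PDF::API2 or PDF::Builder, optionally
# followed by "::$suffix" (e.g. 'Page'); returns 0 otherwise.
sub is_pdf_class {
    my ($obj, $suffix) = @_;
    my $class = blessed($obj) or return 0;    # plain strings and unblessed refs fail
    $suffix = (defined $suffix && length $suffix) ? "::$suffix" : '';
    return $class =~ /\APDF::(?:API2|Builder)\Q$suffix\E\z/ ? 1 : 0;
}

# Stand-in objects; real code would receive these from PDF::API2 or PDF::Builder.
my $api2_page    = bless {}, 'PDF::API2::Page';
my $builder_page = bless {}, 'PDF::Builder::Page';

print is_pdf_class($api2_page,    'Page') ? "ok\n" : "not ok\n";
print is_pdf_class($builder_page, 'Page') ? "ok\n" : "not ok\n";
```

This matches exact class names only. If subclasses need to pass as well, the check could call $obj->isa(...) for each of the two class names instead.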

westmj commented 4 years ago

Thanks! That extinguishes the bug. I will take a look at modifying PDF::Table. Have any pseudo-code to suggest? Here is a crude modification of the bug demo program that shows "max_word_length => 60," does the trick.

bug02.pl.txt

kamenov commented 4 years ago

If you'd like to propose a pull request I'd be happy to merge and release it on CPAN.

PhilterPaper commented 4 years ago

I think you should reconsider splitting words at max_word_length, and only split when necessary to fit in a table cell. In addition, a hyphen (dash) should be added at the split. The best place to split a word is language-dependent, and some languages have rules about repeating letters at the split. Splitting words to fit the available line space is a very complicated area.
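
As a very rough illustration of the "split only when necessary, and add a hyphen" idea, here is a sketch in plain Perl. The helper split_to_fit is hypothetical and ignores all language-specific hyphenation rules. Width is measured through a callback; the demo simply counts characters, whereas real code would presumably measure the rendered string width with the text object.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDF::Table. Breaks $word into pieces that
# each fit within $max_width as measured by the $width_of callback, appending
# '-' to every piece except the last. A word that already fits is returned as is.
sub split_to_fit {
    my ($word, $max_width, $width_of) = @_;
    my @pieces;
    while (length($word) > 1 && $width_of->($word) > $max_width) {
        my $n = length($word) - 1;
        # Shrink the leading piece until it fits together with its hyphen.
        $n-- while $n > 1 && $width_of->(substr($word, 0, $n) . '-') > $max_width;
        push @pieces, substr($word, 0, $n) . '-';
        $word = substr($word, $n);
    }
    push @pieces, $word;
    return @pieces;
}

# Demo: "width" is just the character count, and the cell holds 8 characters.
my $by_chars = sub { length $_[0] };
print join(' | ', split_to_fit('supercalifragilistic', 8, $by_chars)), "\n";
print join(' | ', split_to_fit('short', 8, $by_chars)), "\n";
```

The split points here are purely mechanical. Choosing linguistically sensible break points would need a hyphenation dictionary or pattern set per language.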