google-code-export / purepdf

Automatically exported from code.google.com/p/purepdf

Can't justify unicode text in paragraph #37

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Register font:
[Embed(source="assets/fonts/verdana.ttf", mimeType="application/octet-stream")]
private var verdanaRegCls:Class;

FontsResourceFactory.getInstance().registerFont(VERDANA_REGULAR, new this.verdanaRegCls());

2. Create font:
var baseFont:BaseFont = BaseFont.createFont(VERDANA_REGULAR, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

var font:Font = new Font(-1, size, -1, null, baseFont);

3. Add paragraph:
var parag:Paragraph = new Paragraph("Sample text that should be aligned justified!", font);
parag.alignment = Element.ALIGN_JUSTIFIED;

document.add(parag);

What is the expected output? What do you see instead?

The text should be justified, but instead I get Error: NonImplementeationError

What version of the product are you using? On what operating system?

The operating system is Windows XP and the PurePDF version is 0.77.20110116.

Please provide any additional information below.

Justification works correctly if I change the font encoding from
BaseFont.IDENTITY_H to BaseFont.CP1252.

But since I need Russian (Cyrillic) characters, I have to use the
BaseFont.IDENTITY_H encoding.

Original issue reported on code.google.com by aivar.p...@gmail.com on 29 Feb 2012 at 3:43

GoogleCodeExporter commented 9 years ago
Hello,

I have the same problem. Have you found a solution? This makes the library
almost useless in Eastern European countries.

Original comment by vant...@gmail.com on 3 Apr 2012 at 6:59

GoogleCodeExporter commented 9 years ago
Hi,

I have found a workaround for this problem. If I replace the regular spaces
with non-breaking spaces (\u00a0) and call chunk.setSplitCharacter(new
NonBreakingSplitCharacter()), justification works correctly.

I'm using the following script to replace the space characters in a PurePDF chunk:

var chunk:Chunk; //purePdf chunk
var text:String; //text to be added to pdf document
var tmpArray:Array;

tmpArray = text.split(" ");
text = tmpArray.join("\u00a0");

chunk = new Chunk(text, getFont());
chunk.setSplitCharacter(new NonBreakingSplitCharacter());
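The space replacement itself is plain string handling, so it can be tried outside Flash. Below is a minimal TypeScript sketch of the same split/join step (TypeScript is used only because it is close to ActionScript and runs standalone; toNonBreakingSpaces is a hypothetical name, not part of PurePDF):

```typescript
// Hypothetical helper mirroring the ActionScript workaround above:
// every ordinary space (U+0020) becomes a no-break space (U+00A0).
// Other whitespace, such as tabs or line breaks, is left untouched.
function toNonBreakingSpaces(text: string): string {
  return text.split(" ").join("\u00a0");
}

const original = "Пример текста для выравнивания";
const converted = toNonBreakingSpaces(original);

// Same length: each space is swapped one-for-one.
console.log(converted.length === original.length); // true
// No U+0020 characters remain.
console.log(converted.indexOf(" ") === -1); // true
```

Because no ordinary spaces remain, line breaking depends entirely on the split character set on the chunk, which is why the workaround changes both.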

Original comment by aivar.p...@gmail.com on 3 Apr 2012 at 7:30

GoogleCodeExporter commented 9 years ago
Hello,

many thanks for your solution. I've improved it a little with a new class
that extends DefaultSplitCharacter; it looks like this:

package pdfGenerator {

    import org.purepdf.ISplitCharacter;
    import org.purepdf.pdf.DefaultSplitCharacter;
    import org.purepdf.pdf.PdfChunk;

    /**
     * ...
     * @author Marcin Wantuch
     */
    public class MySplitCharacter extends DefaultSplitCharacter implements ISplitCharacter {

        public function MySplitCharacter() {
            super();
        }

        /**
         * Checks whether the current character is a split character.
         * @param   start - start position in the array
         * @param   current - current position in the array
         * @param   end - end position in the array
         * @param   cc - the character array that has to be checked
         * @param   ck - chunk array
         * @return  true if this is a split character
         */
        override public function isSplitCharacter( start: int, current: int, end: int, cc: Vector.<int>, ck: Vector.<PdfChunk> ): Boolean {

            var c: int = getCurrentCharacter( current, cc, ck );

            if ( c <= ' '.charCodeAt(0) || c == '-'.charCodeAt(0) || c == 8208 /*'\u2010'*/ || c == 160 /*'\u00a0'*/ ) {
                return true;
            }

            if( c < 0x2002 ) {
                return false;
            }

            return ( (c >= 0x2002 && c <= 0x200b)
                || (c >= 0x2e80 && c < 0xd7a0)
                || (c >= 0xf900 && c < 0xfb00)
                || (c >= 0xfe30 && c < 0xfe50)
                || (c >= 0xff61 && c < 0xffa0) );

        }

    }

}

Then I set this class as the default split character. Compared with the
DefaultSplitCharacter logic, the only change is the extra
"c == 160 /*'\u00a0'*/" test at the end of the first if statement (160 is the
decimal value of U+00A0). It works a little better, because new lines no
longer begin with commas, periods, quotes, etc. So many thanks for your idea.
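The character test in MySplitCharacter can also be checked in isolation. The following TypeScript port is a sketch, not PurePDF code: the names isSplitCharCode and breakOpportunities are hypothetical, and it skips PurePDF's getCurrentCharacter and chunk handling, looking at one UTF-16 code unit at a time. It lists the positions where a line is allowed to break:

```typescript
// Port of the test in MySplitCharacter.isSplitCharacter, applied to a
// single character code (assumed already resolved by getCurrentCharacter).
function isSplitCharCode(c: number): boolean {
  // Control characters and space, hyphen-minus, Unicode hyphen (U+2010)
  // and the no-break space (U+00A0) added by the workaround.
  if (c <= 0x20 || c === 0x2d || c === 0x2010 || c === 0xa0) {
    return true;
  }
  // No other character below U+2002 is a break opportunity.
  if (c < 0x2002) {
    return false;
  }
  return (
    (c >= 0x2002 && c <= 0x200b) || // en space .. zero-width space
    (c >= 0x2e80 && c < 0xd7a0) ||  // CJK radicals .. Hangul syllables
    (c >= 0xf900 && c < 0xfb00) ||  // CJK compatibility ideographs
    (c >= 0xfe30 && c < 0xfe50) ||  // CJK compatibility forms
    (c >= 0xff61 && c < 0xffa0)     // halfwidth CJK punctuation, katakana
  );
}

// Collect every index at which the text may be split.
function breakOpportunities(text: string): number[] {
  const result: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (isSplitCharCode(text.charCodeAt(i))) {
      result.push(i);
    }
  }
  return result;
}

console.log(breakOpportunities("Пример\u00a0текста, да")); // → [6, 14]
```

With U+00A0 treated as a split character, the Cyrillic words can still wrap at the no-break space (index 6), while the comma at index 13 stays attached to the preceding word.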

I have another question. Now that the PDF is generated with Unicode
characters, have you tried copying its text in a viewer such as Adobe Reader?
When I paste the copied text somewhere else, I get different characters from
the ones in the document, so I can't search the PDF for words. Is there any
way to fix this?

Original comment by vant...@gmail.com on 5 Apr 2012 at 6:25

GoogleCodeExporter commented 9 years ago
You can check it here: http://89.234.211.20/_marcinw/test.pdf
Just try copying and pasting some selected text from the PDF. How can this be
fixed? The base font encoding is IDENTITY_H and has to stay that way.

Original comment by vant...@o2.pl on 18 May 2012 at 1:02