AlessioLuciani / flutter-pdf-text

A plugin for Flutter that allows you to read the text content of PDF documents and convert it into strings.
MIT License
18 stars 45 forks

Can't extract proper local-language (e.g. Kannada, Tamil) text from PDF #7

Open rustiever opened 4 years ago

rustiever commented 4 years ago

I only tested on Android, so the error might come from PdfBox.

W/PdfBox-Android( 6641): No Unicode mapping for CID+222 (222) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+254 (254) in font TAUElangoArunthathi
I/chatty  ( 6641): uid=10281(com.example.text_audio) Thread-5 identical 2 lines
W/PdfBox-Android( 6641): No Unicode mapping for CID+254 (254) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+270 (270) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+270 (270) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+262 (262) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+262 (262) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+223 (223) in font TAUElangoArunthathi
W/PdfBox-Android( 6641): No Unicode mapping for CID+223 (223) in font TAUElangoArunthathi

I think specifying the font when calling the method might solve it, but I'm not sure.

AlessioLuciani commented 4 years ago

Apparently some characters in the font TAUElangoArunthathi have no Unicode mapping, so I guess PdfBox can't turn them into plain text. Unfortunately, I couldn't reproduce the error: I tried with a Tamil PDF and PdfBox didn't complain. A similar error might occur on iOS too.
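
To illustrate what the warning means, here is a toy sketch in Python. It is not PdfBox's code. The mapping table and the `extract_text` helper are hypothetical, and the real font's CID assignments are unknown. Dropping unmapped characters is also an assumption of the sketch. The idea is only that each character ID (CID) in the PDF has to be looked up in the font's Unicode table, and IDs with no entry cannot be turned into text.

```python
# Toy illustration only: this is NOT PdfBox's implementation.
# A PDF font can carry a table that maps character IDs (CIDs) to Unicode.
# The entries below are hypothetical; the real font's mappings are unknown.
TO_UNICODE = {
    65: "\u0b95",  # hypothetical entry: CID 65 -> TAMIL LETTER KA
    66: "\u0bae",  # hypothetical entry: CID 66 -> TAMIL LETTER MA
}


def extract_text(cids, to_unicode, font_name):
    """Return (text, warnings). CIDs without a mapping are skipped (assumption)."""
    chars = []
    warnings = []
    for cid in cids:
        mapped = to_unicode.get(cid)
        if mapped is None:
            # Same wording as the log lines in this issue.
            warnings.append(
                f"No Unicode mapping for CID+{cid} ({cid}) in font {font_name}"
            )
        else:
            chars.append(mapped)
    return "".join(chars), warnings


# CIDs 222 and 254 are missing from the table, so they yield warnings, not text.
text, warnings = extract_text([65, 222, 66, 254], TO_UNICODE, "TAUElangoArunthathi")
```

If that is what happens here, the extracted string would be missing exactly the characters listed in the warnings, which would explain the broken output.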

rustiever commented 4 years ago

When parsing a Tamil PDF, which font does PdfBox use?