Extract non ascii/unicode text from PDF

Hey! I'm trying to extract text from this file using tikaondotnet.extraction. The code is really basic:

    public static string Extract(string path)
    {
        var te = new TextExtractor();
        return te.Extract(path).Text;
    }

When I get to the Arabic text in the attached PDF, I get a lot of warnings like the following:

    WARN No Unicode mapping for behini (112) in font NSIEBX+OmegaSerifArabicOne
    WARN No Unicode mapping for seenmed (148) in font NSIEBX+OmegaSerifArabicOne
    WARN No Unicode mapping for meemfin (205) in font NSIEBX+OmegaSerifArabicOne
    WARN No Unicode mapping for alifiso (109) in font NSIEBX+OmegaSerifArabicOne
    WARN No Unicode mapping for lamini (191) in font NSIEBX+OmegaSerifArabicOne

This is the extracted text

I was wondering if there's an option to specify a decoding when extracting the text, or an option to convert all the text to a different font that is supported in Tika?

P.S. the English text is extracted fine :)

KevM / tikaondotnet

Extract non ascii/unicode text from PDF #148