I want to extract text from a PDF written in Gurmukhi script (the Punjabi language),
but the characters are read incorrectly when the text is extracted.
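`from pyxpdf import Document
from pyxpdf.xpdf import TextControl
pdf_path = '/content/Punjab2_new.pdf'
doc = Document(pdf_path)
# "physical" layout mode, with a BOM inserted into the output
text_control = TextControl("physical", insert_bom=True)
for page in range(len(doc)):
    # extract text from the given page area using the control above
    out_res = doc[page].text((0, 90, 155, 700), text_control)
    print('\n_New_page_output___\n')
    print(out_res)`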
Here are my expected and actual results. The expected image is a sample of my input:
With the text function, the characters are recognized incorrectly:
Attached PDF: download.pdf
It would be a great help if any pyxpdf parameters could solve this issue.
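
To narrow the problem down, it may help to check whether the extracted characters are in the Gurmukhi Unicode block (U+0A00 to U+0A7F) at all. The following is only a minimal sketch using the Python standard library. `gurmukhi_ratio` is a hypothetical helper name and is not part of pyxpdf.

```python
# Gurmukhi Unicode block boundaries (U+0A00..U+0A7F)
GURMUKHI_START, GURMUKHI_END = 0x0A00, 0x0A7F


def gurmukhi_ratio(text):
    """Return the fraction of visible characters that lie in the Gurmukhi block.

    Hypothetical diagnostic helper, not a pyxpdf function.
    Whitespace and the BOM (U+FEFF) are ignored.
    """
    chars = [ch for ch in text if not ch.isspace() and ch != "\ufeff"]
    if not chars:
        return 0.0
    hits = sum(GURMUKHI_START <= ord(ch) <= GURMUKHI_END for ch in chars)
    return hits / len(chars)
```

Calling `gurmukhi_ratio(out_res)` on each page's output would show how much of the text is actually Gurmukhi. If the ratio is close to zero, one possible explanation (an assumption, not something confirmed for this file) is that the PDF uses a legacy font that maps Gurmukhi glyphs to other code points, in which case extraction parameters alone might not fix the output.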