deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.86k stars 592 forks

Issues with textract.process when run within an executable created by PyInstaller #449

Open vq75 opened 1 year ago

vq75 commented 1 year ago

I am having trouble getting textract.process to work when running within an executable created by PyInstaller:

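import os
import textract

for dir_path, dir_names, file_names in os.walk(path):
    for f in file_names:
        path_t = os.path.join(dir_path, f)
        # t_ext was not defined in the snippet; assumed to be the file's extension
        t_ext = os.path.splitext(f)[1]
        if t_ext in ('.docx', '.xls', '.xlsx', '.pptx'):
            print(path_t)
            text = textract.process(path_t)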

I get the following error:

C:\Users\u191174\Documents\Temp\Executable\s_words.xlsx 
incomplete escape \U at position 2

When I run the Python script in VS Code, this error does not appear, and when I debug, path_t is correct and displays '\', but not when running within the .exe file.

I am sure it's a path-related issue... I have the feeling that somehow path_t is not being taken as such.

I have changed the path for os.walk to path=r"C:\Users\u191174\Documents\Temp\Executable", but the issue remains.
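A note on the message itself: "incomplete escape \U" is the wording Python's standard `re` module uses when a string containing `\U` is parsed as a regular expression pattern. That would explain why a raw string makes no difference, since the path is already a valid string and only fails if something treats it as a regex. The sketch below reproduces this with the standard library alone. Whether textract or the PyInstaller build actually compiles a path as a regex is an assumption, not something confirmed in this thread, and `describe_regex_error` is a hypothetical helper written for this illustration.

```python
import re

# Path copied from the report above; the raw string keeps the backslashes literal.
WINDOWS_PATH = r"C:\Users\u191174\Documents\Temp\Executable"


def describe_regex_error(text):
    """Hypothetical helper: compile `text` as a regex pattern.

    Returns the error message if compilation fails, otherwise None.
    """
    try:
        re.compile(text)
    except re.error as exc:
        return str(exc)
    return None


# The path is a valid string, but "\U" is not a valid regex escape here.
raw_error = describe_regex_error(WINDOWS_PATH)

# re.escape() turns every backslash into a literal one, so compilation succeeds.
escaped_error = describe_regex_error(re.escape(WINDOWS_PATH))

print("unescaped:", raw_error)
print("escaped:  ", escaped_error)
```

If the same message shows up in a full traceback from the executable, the frame that calls into `re` would show which code is building a pattern from the path.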

In my code I have an if that reads PDFs with pytesseract and then reads docx, pptx and xlsx files with textract... but when running the executable, textract does not cooperate and reports "incomplete escape \U at position 2". This doesn't happen when I run the .py file from VS Code.

I am starting to think there may be a bug within textract.

Looking forward to your feedback.

Kind Regards,

Vic Q

tian-yuyang commented 1 month ago

Same error. Do you have a solution?