cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
https://cdqa-suite.github.io/cdQA-website/
Apache License 2.0

While running the 'pdf_converter' function #351

Open suresh96458 opened 4 years ago

suresh96458 commented 4 years ago

I converted a few text files into PDF files and am now running the 'pdf_converter' function to extract paragraphs from them into a DataFrame. The call fails, and I am unclear whether the problem is caused by Tika or by my files. As a sample, I am attaching two of the PDF files I am using, along with the error I am getting.

```
$ df = pdf_converter(directory_path='/home/xxxx/Downloads/test/')

2020-03-12 11:43:42,470 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2020-03-12 11:43:47,476 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2020-03-12 11:43:52,480 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2020-03-12 11:43:57,486 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2020-03-12 11:43:57,489 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
Unexpected error: <class 'RuntimeError'>
Unable to process file NetworkEngineer1.pdf
```

NetworkEngineer1.pdf NetworkEngineer2.pdf
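One way to separate the two possible causes is to call tika-python directly on a single file, outside cdQA. The sketch below is only a diagnostic idea, not a confirmed fix: it assumes the `tika` package is installed and that its server needs a working `java` on the PATH to start. The names `java_available`, `extract_with_tika`, and `diagnose` are hypothetical helpers, not part of cdQA or Tika.

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable is on PATH and runs `-version`."""
    java = shutil.which("java")
    if java is None:
        return False
    try:
        result = subprocess.run([java, "-version"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def extract_with_tika(pdf_path):
    """Parse one PDF with tika-python directly, bypassing cdQA."""
    from tika import parser  # needs `pip install tika`; imported lazily

    parsed = parser.from_file(pdf_path)
    return parsed.get("content") or ""


def diagnose(pdf_path, java_check=java_available, extract=extract_with_tika):
    """Hypothetical helper: report whether Java, Tika, or the file looks at fault."""
    if not java_check():
        return "java: not runnable, so the Tika server cannot start"
    try:
        text = extract(pdf_path)
    except Exception as exc:  # the log above shows a RuntimeError at startup
        return f"tika: {exc!r}"
    if not text.strip():
        return "file: Tika ran but extracted no text"
    return "ok: Tika extracted text from this file"
```

Calling `diagnose('/home/xxxx/Downloads/test/NetworkEngineer1.pdf')` would then point at Java, at the Tika server, or at the file itself. A result starting with `java:` or `tika:` would suggest the PDFs are not the problem.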

@andrelmfarias