ehsanVIP closed this issue 1 year ago.
Hey @ehsanVIP,
Do you still face this issue? It seems that the path to your file is not correct. Can you try to access the file in your script without any Haystack-specific code (e.g. via Python's `open(...)`)?
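A minimal sketch of such a check, using only the standard library. `check_pdf_readable` is a hypothetical helper name, not part of Haystack:

```python
from pathlib import Path


def check_pdf_readable(file_path: str) -> int:
    """Open the file with plain Python (no Haystack code involved).

    Returns the number of bytes read; raises FileNotFoundError if the path is wrong.
    """
    # Resolving to an absolute path removes any ambiguity about the working directory.
    path = Path(file_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}")
    # PDFs are binary files, so open them in "rb" mode.
    with open(path, "rb") as f:
        data = f.read()
    return len(data)
```

If this raises `FileNotFoundError` for your path, the path itself is the problem. If it succeeds but Haystack still fails, the cause lies elsewhere.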
Hey @tholor, yes, I still have this issue. I can open my PDF easily with Python Tika, but not with Haystack.
I also have the same issue. I used `pathlib.Path` as expected and checked the path's validity with its `exists()` method. I think the problem is caused by the `subprocess.run` call: in `pdf.py`, the `read_pdf` function executes a command via the `subprocess` module, and when I change the parameter `shell=True` to `False`, it manages to find the file, but the behavior changes and the result is not what I expected.
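To illustrate one possible reading of this (a sketch, not Haystack's actual `read_pdf` code): assume the converter builds a command list like `["pdftotext", <pdf path>, "-"]` and runs it with `shell=False`. Then Python itself has to locate the `pdftotext` executable. If that executable is not on `PATH`, `subprocess.run` raises `FileNotFoundError` (shown as `[WinError 2]` on Windows) even though the PDF path is fine. With `shell=True`, a shell is started instead, so a missing tool typically surfaces as shell output and a non-zero exit code rather than a Python exception. That may be why toggling the flag changes the behavior. `run_pdf_tool` is a hypothetical helper:

```python
import subprocess
from pathlib import Path
from typing import Optional


def run_pdf_tool(pdf_path: Path, tool: str = "pdftotext") -> Optional[str]:
    """Hypothetical helper: run a PDF command-line tool the way a converter might.

    Returns the tool's stdout, or None if the tool executable itself is missing.
    """
    # Check the PDF first, so a bad PDF path is reported separately from a missing tool.
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    command = [tool, str(pdf_path), "-"]  # "-" sends pdftotext's output to stdout
    try:
        # With shell=False, Python must locate `tool` itself.
        result = subprocess.run(command, stdout=subprocess.PIPE, shell=False)
    except FileNotFoundError:
        # Raised ([WinError 2] on Windows) when the executable cannot be found,
        # even though the PDF exists.
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

If `run_pdf_tool(Path("your_file.pdf"))` returns `None` for an existing PDF, the missing "file" is the tool, not your document.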
Ok, thanks for the info @yusufsamsum. We will investigate this Windows-specific issue further. However, it might take us some time, as none of our devs is on Windows and it is always a hassle to reproduce and debug there.
@tholor Hi, I'm still having this problem. Did you find a solution by any chance?
Not yet, sorry. This issue has been stuck in the backlog since... Sorry for that. I will pick it up in the next few days and try to find out what's going on. By the way, does it happen with every file, or only under specific conditions?
I'm trying to read the file in Google Colab, and I have the same issue, too! Neither `PDFToTextConverter` nor `convert_files_to_docs` can read the `.pdf` file!
I reproduced this issue, and it seems to be caused by a missing dependency, `pdftotext`. Could you check whether it is installed properly on your system? If not, could you try installing it manually and then run your Haystack code again?
I will now investigate why `pdftotext` doesn't get installed on Windows. Please let me know if your issue is different.
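One way to check whether `pdftotext` is installed and reachable is to look it up on `PATH` with `shutil.which`. This is a sketch; `find_pdftotext` and `report_pdftotext` are hypothetical helpers, and the version output varies between the xpdf and Poppler builds of the tool:

```python
import shutil
import subprocess
from typing import Optional


def find_pdftotext(name: str = "pdftotext") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def report_pdftotext(name: str = "pdftotext") -> str:
    """Describe whether the tool is available and, if so, which version it reports."""
    path = find_pdftotext(name)
    if path is None:
        return f"{name} was not found on PATH; install it or add its folder to PATH."
    # `-v` prints version info; depending on the build it may go to stderr or stdout.
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    version = (result.stderr or result.stdout).strip()
    return f"{name} found at {path}: {version}"
```

Running `print(report_pdftotext())` should show either where the tool lives and its version, or a hint that it is missing.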
Hi guys, I think what you are doing is very interesting. I am currently struggling with data preprocessing (Tutorial 8). When I open my own PDF file with `PDFToTextConverter`, I get the following error:
`[WinError 2] The system can't find the specified file`
Unfortunately, I have not found a solution yet. Can you guide me?