Closed: nielsvth closed this issue 4 years ago.
Can you link the PDF so that I can reproduce the issue?
Version 0.7.1 works; version 0.7.2 doesn't.
Sorry for the delay in replies, please give me some time to look into this.
> Can you link the PDF so that I can reproduce the issue?
The PDF I tried to process is a commercial market report (Fitch Solutions), so I don't have the authority to share it with third parties. I'm sorry, but I will try to help with any other information you need.
@nielsvth Does the test PDF I linked work for you?
Works for me again with 0.7.3.
@nielsvth Can you check if the new release fixed your issue?
Hi,
For me, the problem is still not solved with the new version 0.7.3:
```
Traceback (most recent call last):
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\project.py", line 3, in <module>
    tables = camelot.read_pdf(r'C:\Users\Niels\Documents\Python PDF Scraper\final\Files\report1.pdf',pages='1-15')
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\camelot\io.py", line 117, in read_pdf
    **kwargs
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\camelot\handlers.py", line 165, in parse
    self._save_page(self.filepath, p, tempdir)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\camelot\handlers.py", line 115, in _save_page
    outfile.write(f)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 1626, in getObject
    retval = self._decryptObject(retval, key)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\pdf.py", line 1640, in _decryptObject
    obj._data = utils.RC4_encrypt(key, obj._data)
  File "C:\Users\Niels\Documents\Web Scraping with Python\scrapingEnv\lib\site-packages\PyPDF2\utils.py", line 177, in RC4_encrypt
    i = (i + 1) % 256
KeyboardInterrupt
```
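The traceback stops inside PyPDF2's `utils.RC4_encrypt`, reached from `_decryptObject` while Camelot's `_save_page` writes a page out. That suggests the PDF is encrypted, and the time goes into decrypting PDF objects with a byte-at-a-time Python loop rather than into table detection. The sketch below is a generic pure-Python RC4 (not PyPDF2's actual code; the name `rc4` is hypothetical). It illustrates why such a loop can look like a hang on large encrypted streams: every byte costs one Python-level iteration.

```python
import time


def rc4(key: bytes, data: bytes) -> bytes:
    """Generic RC4 stream cipher; encryption and decryption are the same operation."""
    # Key-scheduling algorithm (KSA): permute the state table using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation (PRGA): one Python-level iteration per input byte,
    # which is what makes pure-Python decryption of large streams slow.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical 1 MB payload standing in for a large encrypted PDF stream.
    payload = bytes(1_000_000)
    start = time.perf_counter()
    rc4(b"Key", payload)
    print(f"processed {len(payload)} bytes in {time.perf_counter() - start:.2f}s")
```

If objects get decrypted again for each page written, this per-byte cost is paid repeatedly, so a 15-page range can take far longer than a single page.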
Hope this helps!
Kind regards,
Niels
Closing this as I can't reproduce it without the PDF.
Dear all,
I used the following code to extract tables from a PDF file using the Camelot module in Python:
```python
import camelot

tables = camelot.read_pdf('report.pdf', pages='1-15')
print(tables)
```
However, the program never returns: it keeps running indefinitely, so I end up killing the process. It does the same for different PDFs, some of which worked in the past with the same code.
I don't get any errors when importing the module, and the pip install was successful. Has anyone experienced a similar problem, or does anyone have any clues on how to solve this issue?
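If `read_pdf` hangs on some files but not others, it may be worth checking whether those files are encrypted, since the traceback in this thread ends in PyPDF2's RC4 decryption. The sketch below is a rough stdlib-only heuristic, not a Camelot or PyPDF2 API. It relies on the fact that an encrypted PDF's trailer (or cross-reference stream) dictionary carries an `/Encrypt` entry. The function name `looks_encrypted` is hypothetical, and a plain byte search can give false positives if `/Encrypt` happens to appear in uncompressed content.

```python
import sys
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /Encrypt key."""
    # Encrypted PDFs reference their encryption dictionary via /Encrypt in the
    # trailer or xref-stream dictionary, which is stored uncompressed.
    return b"/Encrypt" in pdf_bytes


if __name__ == "__main__":
    # Usage: python check_pdf.py report.pdf other.pdf ...
    for name in sys.argv[1:]:
        status = "encrypted" if looks_encrypted(Path(name).read_bytes()) else "not encrypted"
        print(f"{name}: {status}")
```

For a definitive answer, a PDF library's own encryption flag is more reliable than a byte scan. This heuristic is only a quick way to sort the files that hang from the ones that don't.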
Kind regards,