euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.
https://github.com/pdfminer/pdfminer.six
MIT License
5.25k stars 1.13k forks

Extracting page number of outlines #281

Open TheLegendAli opened 4 years ago

TheLegendAli commented 4 years ago

I am trying to map each PDF outline section to its page number, but I do not know how to do that. I can get the PDFObject reference, but I am not sure how to convert it into a PDF page. Here is the code I have:

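    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument

    fp = open(path_pdf, 'rb')
    parser = PDFParser(fp)
    document = PDFDocument(parser)
    outlines = list(document.get_outlines())
    # Each outline entry is (level, title, dest, a, se); index 3 is the action.
    doc = outlines[5][3].objid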

How do I find the PDF page for this object ID?

igavronski commented 4 years ago

Hi, I found the code in dumppdf.py quite helpful. I had to debug it because it has some syntax problems, but other than that, the core logic for finding the page numbers of sections is there.
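
For reference, here is a rough sketch of that approach, not a copy of dumppdf.py. The idea is to build a table from each page's object ID to its page number, then resolve every outline destination to a page reference and look it up. The helper names build_page_index and outline_page_numbers are made up for this example. It assumes a pdfminer or pdfminer.six version that provides pdfminer.pdfdocument and pdfminer.pdfpage, and I have not run it against every kind of PDF.

```python
def build_page_index(page_ids):
    """Map each page's object ID to its 1-based page number."""
    return {pageid: pageno for pageno, pageid in enumerate(page_ids, 1)}


def outline_page_numbers(path_pdf):
    """Yield (level, title, page number or None) for each outline entry.

    Hypothetical helper; assumes the pdfminer module layout named below.
    """
    # Imported here so build_page_index stays usable without pdfminer.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import PDFObjRef, resolve1
    from pdfminer.psparser import PSLiteral

    with open(path_pdf, 'rb') as fp:
        document = PDFDocument(PDFParser(fp))
        pages = build_page_index(
            page.pageid for page in PDFPage.create_pages(document))

        def resolve_dest(dest):
            # Named destinations have to be looked up in the document.
            if isinstance(dest, (str, bytes)):
                dest = resolve1(document.get_dest(dest))
            elif isinstance(dest, PSLiteral):
                dest = resolve1(document.get_dest(dest.name))
            # A destination dictionary keeps the actual array under 'D'.
            if isinstance(dest, dict):
                dest = dest.get('D')
            return resolve1(dest)

        for level, title, dest, action, _se in document.get_outlines():
            action = resolve1(action)
            # Entries without a direct destination may carry a GoTo action.
            if not dest and isinstance(action, dict):
                dest = action.get('D')
            target = resolve_dest(dest) if dest else None
            # The first element of a destination array is the page reference.
            first = target[0] if isinstance(target, list) and target else None
            pageno = pages.get(first.objid) if isinstance(first, PDFObjRef) else None
            yield level, title, pageno
```

Looping over outline_page_numbers(path_pdf) then gives one (level, title, pageno) tuple per outline entry. pageno is None when the entry does not point at a page in the same file, for example a URI action. If the PDF has no outlines at all, get_outlines() raises PDFNoOutlines, so you may want to catch that.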