Extract data, objects, elements from the PDF

pknabe commented 11 months ago

What do I want to do? I would like to extract content from a PDF in PHP using the Smalot/PdfParser library (already installed and running). What do I want to extract? I would like to extract the content of one specific level/layer of the PDF. I would also like to hide individual layers. Finally, I would like to save the content of the extracted layer as a new PDF and/or output it as XML. I have attached an example PDF. It would be great if you could show a solution in a small code example.

My development and system environment: I develop on a Windows 10 machine under Laragon (localhost) with PHP. Attachment: example_pdf.pdf

Thanks for your help in advance.

fahadadeel commented 9 months ago

@pknabe

You want to extract content from a specific level or layer of a PDF and then manipulate it (hide individual layers, or save the content as a new PDF or as XML). The Smalot\PdfParser library for PHP is good at extracting text, images, and other basic elements from PDFs, but it may not natively support working with individual layers of a PDF document.

As far as I know, the library focuses on extracting basic elements and might not provide functionality for detailed layer manipulation. Such tasks usually require understanding and altering the PDF's internal structure, which can be complex and is typically outside the scope of basic parsing libraries.
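
To make the structural point concrete: in the PDF specification, layers are optional content groups (OCGs), and page content that belongs to a layer is typically wrapped in a marked-content section of the form `/OC /Name BDC ... EMC`. The following is a minimal Python sketch of that idea only. It is not Smalot\PdfParser code and not a working extractor. `SAMPLE_STREAM`, `LAYER_NAMES`, `text_by_layer` and `to_xml` are hypothetical names invented for this illustration, and the sketch assumes an already-decompressed content stream, simple literal strings, and a property-name-to-layer mapping that has been looked up by other means.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, already-decompressed page content stream. "/OC /MC0 BDC"
# opens an optional-content (layer) section and "EMC" closes it.
SAMPLE_STREAM = """
/OC /MC0 BDC
BT /F1 12 Tf 72 720 Td (Base map) Tj ET
EMC
/OC /MC1 BDC
BT /F1 12 Tf 72 700 Td (Annotations) Tj ET
EMC
BT /F1 12 Tf 72 680 Td (Always visible) Tj ET
"""

# Hypothetical mapping from property names to layer names. In a real file
# this comes from the page's /Resources /Properties dictionary and the OCGs.
LAYER_NAMES = {"MC0": "Background", "MC1": "Notes"}

# Simplified tokenizer: a literal string without nested parentheses, or any
# run of non-space characters. Real content streams need a full PDF lexer.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")


def text_by_layer(stream, layer_names):
    """Group the operands of Tj (show text) operators by enclosing layer."""
    layers = {}
    stack = []  # one entry per open marked-content section (layer name or None)
    seen = []   # tokens read so far; operands precede their operator
    for tok in TOKEN.findall(stream):
        if tok in ("BDC", "BMC"):
            layer = None
            if tok == "BDC" and len(seen) >= 2 and seen[-2] == "/OC":
                key = seen[-1].lstrip("/")
                layer = layer_names.get(key, key)
            stack.append(layer)
        elif tok == "EMC" and stack:
            stack.pop()
        elif tok == "Tj" and seen:
            # Innermost enclosing layer, or a placeholder for unlayered text.
            current = next((n for n in reversed(stack) if n), "(no layer)")
            layers.setdefault(current, []).append(seen[-1][1:-1])
        seen.append(tok)
    return layers


def to_xml(layers):
    """Serialize the per-layer text as a small XML document."""
    root = ET.Element("layers")
    for name, texts in layers.items():
        node = ET.SubElement(root, "layer", name=name)
        for text in texts:
            ET.SubElement(node, "text").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    layers = text_by_layer(SAMPLE_STREAM, LAYER_NAMES)
    print(layers["Notes"])
    print(to_xml(layers))
```

Even this toy version has to track nested marked-content sections and resolve property names, which is the kind of structural work a text-oriented parser does not do for you. Hiding a layer is a separate matter again: visibility is normally controlled through the document catalog's `/OCProperties` configuration rather than in the page content, so it requires a library that can modify and rewrite the PDF.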

However, if you have found a solution or a workaround that fits within the scope of PHP and Smalot\PdfParser, it would be great if you could share it with the community. Thanks.

pknabe commented 8 months ago

Hello Fahad, even though I didn't really expect any feedback after such a long time, thank you very much for your answer. In any case, the situation is now clear, and we will have to rely on a different solution in this project. Best regards, and thank you again for your effort. Paulo
