Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files
MIT License

[Bug]: `Extract Pages` produce output pdf of same size #1480

Open KAGEYAM4 opened 3 months ago

KAGEYAM4 commented 3 months ago

The Problem

I have a 34 MB PDF that contains 256 pages. I extracted a single page from it, and the output PDF was almost the same size as the original. I tested this on two PDFs: for one, the output was the same size as the input; for the other, it was about half the size (17 MB from 31 MB).

Version of Stirling-PDF

0.26.1

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

http://localhost:8080/extract-page

Docker Configuration

version: '3.3'
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest-ultra-lite
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata #Required for extra OCR languages
#      - ./extraConfigs:/configs
#      - ./customFiles:/customFiles/
#      - ./logs:/logs/
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB

Relevant Log Output

No response

Additional Information

To work around this, I used http://localhost:8080/split-pdfs

Browsers Affected

No response

No Duplicate of the Issue

Frooodle commented 1 month ago

After doing more research on this, it seems quite common, due to fonts, metadata, and so on. I would be curious to see how other tools perform, to see whether we need to make any changes on our side.
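As a rough illustration of why fonts and other resources can keep a one-page output large, here is a toy model in Python. It is not Stirling-PDF or PDFBox code, and the object names and byte sizes are invented for the example. It assumes an extracted page carries every object reachable from it, so a page that points at a resource dictionary shared by the whole document drags along the images and fonts of all the other pages.

```python
from typing import Dict, List, Set, Tuple

# Toy model only: object ids and sizes are hypothetical, not read from a real PDF.
# Each entry maps an object id to (size in bytes, ids of objects it references).
PdfObjects = Dict[str, Tuple[int, List[str]]]


def reachable_size(objects: PdfObjects, root: str) -> int:
    """Sum the sizes of all objects reachable from `root`, counting each once."""
    seen: Set[str] = set()
    stack = [root]
    total = 0
    while stack:
        obj_id = stack.pop()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        size, refs = objects[obj_id]
        total += size
        stack.extend(refs)
    return total


def build_document(shared_resources: bool, n_pages: int = 4) -> PdfObjects:
    """Build a fake document with one image per page and one common font."""
    objects: PdfObjects = {"font": (50_000, [])}
    if shared_resources:
        # One resource dictionary for the whole document, listing every image.
        all_images = [f"image{i}" for i in range(n_pages)]
        objects["resources"] = (100, all_images + ["font"])
    for i in range(n_pages):
        objects[f"image{i}"] = (1_000_000, [])
        objects[f"content{i}"] = (2_000, [])
        if shared_resources:
            objects[f"page{i}"] = (200, [f"content{i}", "resources"])
        else:
            # Each page has its own resource dictionary with only what it uses.
            objects[f"res{i}"] = (100, [f"image{i}", "font"])
            objects[f"page{i}"] = (200, [f"content{i}", f"res{i}"])
    return objects


shared_size = reachable_size(build_document(shared_resources=True), "page0")
per_page_size = reachable_size(build_document(shared_resources=False), "page0")
print(f"shared resources:   {shared_size} bytes for one extracted page")
print(f"per-page resources: {per_page_size} bytes for one extracted page")
```

In this model the single extracted page is roughly four times larger when resources are shared, even though the page content itself is identical. Whether that is what happens with the PDFs in this report would need to be checked against the actual files.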