anilabhadatta / educative.io_scraper

Educative.io Course Downloader developed using Python and Selenium. Refer Readme.md for setup instructions.
MIT License
162 stars 54 forks

Reduce Downloaded File Size for PDF #114

Open sabz90 opened 5 months ago

sabz90 commented 5 months ago

Is your feature request related to a problem? Please describe.
The topics downloaded as HTML and PDF are very large: up to 4-6 MB per topic for HTML and 1-2 MB for PDF. A whole course comes to 1.5-2 GB in HTML and 400-600 MB in PDF.

Describe the solution you'd like
Smaller file sizes.

Describe alternatives you've considered
smallpdf.com and similar tools work, but using them is a pain and they have limitations.
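For anyone who wants a local alternative to web-based compressors, here is a minimal sketch that shells out to Ghostscript to recompress a PDF that has already been downloaded. It is not part of the scraper, and nothing in this thread confirms how much it would shrink these particular files. The helper names `build_gs_command` and `compress_pdf` are hypothetical, and the sketch assumes a `gs` executable is on the PATH (on Windows the binary is usually named `gswin64c` instead).

```python
import shutil
import subprocess


def build_gs_command(src, dst, quality="ebook"):
    """Build a Ghostscript command line that rewrites src into dst.

    quality is one of Ghostscript's PDFSETTINGS presets, for example
    "screen", "ebook", "printer" or "prepress".
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_pdf(src, dst, quality="ebook"):
    """Run Ghostscript on src and write the recompressed PDF to dst."""
    # Assumption: Ghostscript is installed and reachable as "gs".
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    subprocess.run(build_gs_command(src, dst, quality), check=True)
```

Calling `compress_pdf("topic.pdf", "topic_small.pdf")` would write a recompressed copy next to the original; the file names here are placeholders. The presets mostly downsample embedded images, so a page that is mainly text may shrink very little.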

anilabhadatta commented 5 months ago

@sabz90 The size cannot be lowered, as that would break the DOM elements. HTML generation runs single-file as a script, and optimisations for a smaller file size are already applied.

image

Nothing more can be removed from this.

PDFs are generated either with the CDP command or via screenshot-to-HTML. If you need a smaller file size, consider creating PDFs instead of HTML.
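To illustrate the CDP route mentioned above, here is a minimal sketch of printing the current page to PDF from Selenium using the Chrome DevTools Protocol command `Page.printToPDF`. This is not the project's actual code: the function name `save_page_as_pdf` and its default options are assumptions made for illustration. `driver` is assumed to be a Chromium-based Selenium WebDriver, which exposes `execute_cdp_cmd`.

```python
import base64
from pathlib import Path


def save_page_as_pdf(driver, out_path, print_options=None):
    """Print the driver's current page to PDF via CDP; return the size in bytes."""
    # Assumption: these defaults are illustrative, not the scraper's settings.
    options = {"printBackground": True}
    options.update(print_options or {})

    # Page.printToPDF returns a dict whose "data" field is the base64-encoded PDF.
    result = driver.execute_cdp_cmd("Page.printToPDF", options)
    pdf_bytes = base64.b64decode(result["data"])

    out = Path(out_path)
    out.write_bytes(pdf_bytes)
    return out.stat().st_size
```

Because the function returns the written size, it is easy to compare settings on a single topic, for example `save_page_as_pdf(driver, "topic.pdf", {"printBackground": False})`. Dropping background graphics may reduce the size on some pages, but it also changes how the page looks, so treat it as something to test rather than a guaranteed saving.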