EvotecIT / PSWritePDF

PowerShell Module to create, edit, split, merge PDF files on Windows / Linux and MacOS
GNU Affero General Public License v3.0

Ordered Output of Split Pages #25

Open TheOwl57 opened 3 years ago

TheOwl57 commented 3 years ago

Awesome module which I have used to sort through large PDF files at incredible speeds.

First time posting anything on GitHub, so I hope this is acceptable.

The only issue I have is that when splitting documents with a large number of pages, the [CustomSplitter] class names each file after its page number with no padding. This can make it hard to read through the split files in the correct order.
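To illustrate the ordering problem, here is a minimal sketch using made-up file names (the `Page` prefix is an assumption for the example, not necessarily what the module produces). Sorting the names as strings puts `Page10.pdf` ahead of `Page2.pdf`, while zero-padded names sort in page order.

```powershell
# Hypothetical file names, for illustration only.
$unpadded = 1..12 | ForEach-Object { "Page$_.pdf" }
$padded   = 1..12 | ForEach-Object { 'Page{0:D2}.pdf' -f $_ }

# String sorting compares character by character, so "Page10" lands before "Page2".
$unpaddedSorted = $unpadded | Sort-Object
$paddedSorted   = $padded | Sort-Object

$unpaddedSorted[0..3]   # Page1.pdf, Page10.pdf, Page11.pdf, Page12.pdf
$paddedSorted[0..3]     # Page01.pdf, Page02.pdf, Page03.pdf, Page04.pdf
```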

Suggest expanding the file name to include leading zeros. I have successfully been able to modify the [CustomSplitter] Class to do this with the below code:

class CustomSplitter : iText.Kernel.Utils.PdfSplitter {
    [int] $_order
    [string] $_destinationFolder
    [string] $_outputName

    CustomSplitter([iText.Kernel.Pdf.PdfDocument] $pdfDocument, [string] $destinationFolder, [string] $OutputName) : base($pdfDocument) {
        $this._destinationFolder = $destinationFolder
        $this._order = 1
        $this._outputName = $OutputName
    }

    [iText.Kernel.Pdf.PdfWriter] GetNextPdfWriter([iText.Kernel.Utils.PageRange] $documentPageRange) {
        $Name = -join ($this._outputName, $this._order.ToString("D4"), ".pdf")
        $Path = [IO.Path]::Combine($this._destinationFolder, $Name)
        $this._order++
        return [iText.Kernel.Pdf.PdfWriter]::new($Path)
    }
}
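As a quick check of the `"D4"` format specifier used in `GetNextPdfWriter` above, the following sketch shows how it behaves on a few sample numbers. The `Output` prefix is a placeholder chosen for the example.

```powershell
# "D4" pads an integer with leading zeros to at least four digits.
$first  = (1).ToString("D4")       # 0001
$middle = (250).ToString("D4")     # 0250
$last   = (9999).ToString("D4")    # 9999

# Larger numbers are not truncated; they simply use more digits,
# so names past page 9999 would no longer line up when sorted as strings.
$over   = (10000).ToString("D4")   # 10000

# Same name construction as in the class, with a placeholder prefix.
$name = -join ('Output', $first, '.pdf')   # Output0001.pdf
```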

"$this._order = 1" as a start for page 1. "$this._order.ToString("D4")" will handle files that are up to 9999 pages long, so shouldn't push the limits too often. "$this._order++" to increment to the next page number.

Ideally, if I had time, I would expand this to check the total number of pages before splitting and adjust the number of leading zeros accordingly, so that the naming convention adapts to each file.

I have tested this with both 0.0.10 and 0.0.17.

Thanks again for the module.

PrzemyslawKlys commented 3 years ago

This seems like a nice idea. Using Get-PDFDetails, one could get the number of pages and, based on that, add leading zeros to keep the naming convention nice and tidy.

$NumberOfPages = 10000
$number = 100
([string]$number).PadLeft($NumberOfPages.ToString().length,'0')

PrzemyslawKlys commented 3 years ago

@TheOwl57 would you consider making a PR?

TheOwl57 commented 3 years ago

Sorry, I'm very new to GitHub and still trying to figure it out, but yes, I would be happy to create a PR. I have gone further and have some ideas on how to work out the padding on the fly. Something like:

$Reader = [iText.Kernel.Pdf.PdfReader]::New($File)
$PDFLength = ([iText.Kernel.Pdf.PdfDocument]::new($Reader).GetNumberOfPages()).ToString().Length
$Order.ToString("D$($PDFLength)")
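The padding idea above can be sketched without iText by passing the page count in as a plain number. `Get-PaddedPageName` and its parameters are hypothetical names invented for this example and are not part of PSWritePDF; in the splitter itself the count would come from `GetNumberOfPages()` as in the snippet above.

```powershell
# Hypothetical helper, not part of PSWritePDF: builds a zero-padded file name
# whose width matches the number of digits in the total page count.
function Get-PaddedPageName {
    param(
        [Parameter(Mandatory)] [string] $Prefix,
        [Parameter(Mandatory)] [int] $Order,
        [Parameter(Mandatory)] [int] $TotalPages
    )
    # Number of digits in the page count, e.g. 250 pages -> width 3.
    $width = $TotalPages.ToString().Length
    return -join ($Prefix, $Order.ToString("D$width"), '.pdf')
}

$short = Get-PaddedPageName -Prefix 'Split' -Order 7 -TotalPages 250     # Split007.pdf
$long  = Get-PaddedPageName -Prefix 'Split' -Order 7 -TotalPages 12000   # Split00007.pdf
```

Deriving the width from the page count keeps names as short as possible while still sorting correctly, at the cost of names having different lengths for different source files.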

PrzemyslawKlys commented 3 years ago

The easiest way to "manage PR" is to follow what I've written in https://github.com/EvotecIT/PSWritePDF/issues/12 and do it from GitHub GUI.

However, I would encourage you to "learn" GitHub a bit, as it will come in useful in the future. Let me know if you would be able to make that PR.

rpascolo commented 1 year ago

This is what I use (PSWritePDF.psm1)

class CustomSplitter : iText.Kernel.Utils.PdfSplitter {
    [int] $_order
    [string] $_destinationFolder
    [string] $_outputName
    [string] $_Mask

    CustomSplitter([iText.Kernel.Pdf.PdfDocument] $pdfDocument, [string] $destinationFolder, [string] $OutputName) : base($pdfDocument) {
        $this._destinationFolder = $destinationFolder
        $this._order = 1 # start at 1 instead of 0
        $this._outputName = $OutputName
        # Mask of zeros, one per digit of the total page count (e.g. "000" for 250 pages)
        $this._Mask = ("0" * ($pdfDocument.GetNumberOfPages()).ToString().Length)
    }

    [iText.Kernel.Pdf.PdfWriter] GetNextPdfWriter([iText.Kernel.Utils.PageRange] $documentPageRange) {
        $Name = -join ($this._outputName, $this._order.ToString($this._Mask), ".pdf")
        $this._order++
        $Path = [IO.Path]::Combine($this._destinationFolder, $Name)
        return [iText.Kernel.Pdf.PdfWriter]::new($Path)