[enhance] Auto rename to have other rename functionality like pdfgrep

Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files

https://stirlingpdf.com

MIT License


[enhance] Auto rename to have other rename functionality like pdfgrep #330

Open Frooodle opened 1 year ago

Frooodle commented 1 year ago

Use name of line that contains x
Use name of text that is between x and y
Use name of line number x
Use name of text before/after x (for example, given "name: Anthony S", the name could be the text after 'name:')
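To make the four options concrete, here is a minimal Java sketch that works on text already extracted from the PDF. The class and method names (RenameOptions, lineContaining, textBetween, lineNumber, textAfter, textBefore) are hypothetical and are not part of Stirling-PDF; the sketch also assumes "first match wins" and 1-based line numbers, which the issue does not specify.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helpers sketching the proposed naming options; not Stirling-PDF code. */
final class RenameOptions {

    /** Use the first line that contains x. */
    static Optional<String> lineContaining(List<String> lines, String x) {
        for (String line : lines) {
            if (line.contains(x)) {
                return Optional.of(line.trim());
            }
        }
        return Optional.empty();
    }

    /** Use the text between the first x and the next y that follows it. */
    static Optional<String> textBetween(String text, String x, String y) {
        int start = text.indexOf(x);
        if (start < 0) {
            return Optional.empty();
        }
        start += x.length();
        int end = text.indexOf(y, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start, end).trim());
    }

    /** Use the line at line number n (assumed 1-based). */
    static Optional<String> lineNumber(List<String> lines, int n) {
        if (n < 1 || n > lines.size()) {
            return Optional.empty();
        }
        return Optional.of(lines.get(n - 1).trim());
    }

    /** Use the text after x, up to the end of the line where x first appears. */
    static Optional<String> textAfter(List<String> lines, String x) {
        for (String line : lines) {
            int i = line.indexOf(x);
            if (i >= 0) {
                return Optional.of(line.substring(i + x.length()).trim());
            }
        }
        return Optional.empty();
    }

    /** Use the text before x, from the start of the line where x first appears. */
    static Optional<String> textBefore(List<String> lines, String x) {
        for (String line : lines) {
            int i = line.indexOf(x);
            if (i >= 0) {
                return Optional.of(line.substring(0, i).trim());
            }
        }
        return Optional.empty();
    }
}
```

With the lines "Payslip", "name: Anthony S" and "Date: 31/03/2022", calling textAfter(lines, "name:") gives "Anthony S", which matches the example in the last option.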

souviksenapati commented 10 months ago

Can you elaborate on what you are trying to achieve?

Frooodle commented 10 months ago

The existing auto rename functionality renames based on the top x lines by looking at which text has the largest font.

This could be enhanced by letting you say "check x lines".

Or rename based on a regex y, etc.

So if I wanted a doc to be renamed based on the company name in the PDF,

I could use a regex like Company: ([a-z]+) or something.
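As a rough illustration of the regex idea, the sketch below uses java.util.regex to return the first capture group of the first match in the extracted text. RegexRename and firstGroup are hypothetical names, not existing Stirling-PDF code, and "name comes from capture group 1" is an assumption.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derive a name from a user-supplied regex. */
final class RegexRename {

    /** Returns capture group 1 of the first match, or empty if there is none. */
    static Optional<String> firstGroup(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        // The regex must define at least one group, and that group must have matched.
        if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }
}
```

Note that Company: ([a-z]+) only matches lowercase letters, so "Company: acme" yields "acme" while "Company: Acme" yields no match at all. A pattern such as Company: ([A-Za-z]+) would be needed for capitalised names.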

Frooodle commented 10 months ago

Or the other options I listed above as well, since regex can be complex for some people.

TomTinking commented 7 months ago

I have a use case here to illustrate how to get a first stab at this. Take a PDF of a payslip from work. If you are lucky, it will have a sensible filename when you download it. However, I recently downloaded a bunch of slips and they just had numbers in the file name that didn't seem to relate to any sort of time or date, e.g. "Payslip_55683545.pdf". Today, if I use rename PDF, the file name ends up very long and unusable, as the PDF file has very little metadata. Ideally I would like to pdfgrep for the phrase "Date", match the date format that follows (in my case it's "Date: 31/03/2022"), and use that value in my renaming script. So the desired outcome might be "Payslip_31032022.pdf", for example. Looking through my payslips for a specific month's slip is then easier, just by filename.

I'm sure this applies to many documents you can download. (Bank statements often end up with weird filenames.)
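The payslip case could look roughly like the sketch below. It assumes the PDF text has already been extracted into a string and that the date always appears as "Date: dd/mm/yyyy"; PayslipRename and newName are hypothetical names used only for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the payslip use case; assumes a "Date: dd/mm/yyyy" line. */
final class PayslipRename {

    // "Date:" followed by optional whitespace and a dd/mm/yyyy date.
    private static final Pattern DATE =
            Pattern.compile("Date:\\s*(\\d{2})/(\\d{2})/(\\d{4})");

    /** Builds prefix_ddmmyyyy.pdf from the first date found, or empty if none. */
    static Optional<String> newName(String pdfText, String prefix) {
        Matcher m = DATE.matcher(pdfText);
        if (!m.find()) {
            return Optional.empty();
        }
        // Drop the slashes, since "/" is not usable in a file name.
        return Optional.of(prefix + "_" + m.group(1) + m.group(2) + m.group(3) + ".pdf");
    }
}
```

For text containing "Date: 31/03/2022" and the prefix "Payslip", this produces "Payslip_31032022.pdf". Emitting the groups in year, month, day order instead would make the files sort chronologically, if that matters.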

tanseer123 commented 3 months ago

Hi @Frooodle,

I'd like to work on this enhancement. The proposed features sound great, and I'd be excited to contribute to improving the auto rename functionality. I'll start by exploring the existing codebase and planning out the implementation for the new renaming options.

Please let me know if there are any specific guidelines or additional information I should consider before getting started.

Thanks!

Frooodle commented 3 months ago

Please have a go! If you need help, Discord would be the best place to reach out. Sadly, we do not have any exact guides for developers other than our general contributing.md.

tanseer123 commented 3 months ago

Hi @Frooodle,

Thank you for the approval and the guidance. I'll start working on the implementation as planned.

However, before I dive in, could you provide some pointers on where in the codebase the current auto rename functionality is implemented? Any specific files or functions I should focus on initially would be very helpful.

I'll join the Discord channel for further questions and discussions as well.

Thanks again for the opportunity to contribute and for your assistance!

Frooodle commented 3 months ago

https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/resources/templates/misc/auto-rename.html

https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/java/stirling/software/SPDF/controller/api/misc/AutoRenameController.java

Frooodle commented 3 months ago

Honestly, it should only touch those files. Anything else you need you should find by following references from that Java class; for example, to add extra params you need to edit ExtractHeaderRequest.java, etc.

tanseer123 commented 3 months ago

Hi @Frooodle,

Thank you for providing the links and the guidance. I’ll start by reviewing auto-rename.html and AutoRenameController.java to understand the current implementation and identify where to make the changes.

I’ll also check out ExtractHeaderRequest.java and other relevant classes to ensure all necessary modifications are covered.

If I have any specific questions or need further clarification as I work through this, I'll reach out on Discord.

Thanks again for your support!

tanseer123 commented 3 months ago

Hi @Frooodle,

I have completed the changes and raised a pull request for the enhancement.

The updated version does the following:

It first attempts to find a filename using the keyword-based method.
If keywords are specified, it looks for entire lines containing the keyword or text after the keyword.
If no suitable filename is found using the keyword, it falls back to the largest font method.
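The flow described above can be sketched as follows. This is not the code from the pull request: AutoRenameSketch, TextLine, byKeyword, byLargestFont and suggestName are hypothetical names, and the PDF text extraction step is replaced by a plain list of lines with font sizes so the example stays self-contained.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of "keyword first, largest font as fallback". */
final class AutoRenameSketch {

    /** Stand-in for one line of extracted PDF text and its font size. */
    static final class TextLine {
        final String text;
        final float fontSize;

        TextLine(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /** Keyword method: whole line containing the keyword, or only the text after it. */
    static Optional<String> byKeyword(List<TextLine> lines, String keyword, boolean wholeLine) {
        if (keyword == null || keyword.isEmpty()) {
            return Optional.empty();
        }
        for (TextLine line : lines) {
            int i = line.text.indexOf(keyword);
            if (i < 0) {
                continue;
            }
            String candidate = wholeLine ? line.text : line.text.substring(i + keyword.length());
            candidate = candidate.trim();
            if (!candidate.isEmpty()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Fallback: the non-blank line with the largest font size (first one wins on ties). */
    static Optional<String> byLargestFont(List<TextLine> lines) {
        TextLine best = null;
        for (TextLine line : lines) {
            if (line.text.trim().isEmpty()) {
                continue;
            }
            if (best == null || line.fontSize > best.fontSize) {
                best = line;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(best.text.trim());
    }

    /** Try the keyword method first, then fall back to the largest font method. */
    static Optional<String> suggestName(List<TextLine> lines, String keyword, boolean wholeLine) {
        Optional<String> name = byKeyword(lines, keyword, wholeLine);
        return name.isPresent() ? name : byLargestFont(lines);
    }
}
```

Given the lines "ACME Payroll" at size 18 and "name: Anthony S" at size 10, the keyword "name:" yields "Anthony S", while a keyword that is absent from the text falls back to "ACME Payroll".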

You can review the pull request https://github.com/Stirling-Tools/Stirling-PDF/pull/1604.

Please let me know if there are any further changes or improvements needed. I am happy to make adjustments as required.

Thanks again for your guidance and support!

Best regards, @tanseer123