MicheleCotrufo / pdf-renamer

A python tool to automatically rename the pdf files of scientific publications by looking up the publication metadata on the web.

Additional options for underscores/hyphens and truncation of title by word count #5

Closed: harnoorsaini closed this issue 2 years ago

harnoorsaini commented 2 years ago

Hi!

Awesome little tool, thanks! If I run the command on an example PDF, I end up with something like:

Choi et al.-Comparative study of glenoid version and inclination using two-dimensional images from computed tomography and three-dimensional reconstructed bone models-2020.pdf

This is a bit tedious, and when I skim a list of PDFs, the first 4-5 words of the title are usually enough. So it would be nicer to have something like:

Choi-et-al-Comparative-study-of-glenoid-version-2020.pdf

Or even

Choi-ComparativeStudyOfGlenoidVersion-2020.pdf

I tried to dig into the code but couldn't really get far... It would be great to add some options, e.g.:

--separator="-" (where an empty value would mean concatenated CamelCase style)
--max_title_words="6"

Thanks heaps!

Harry

MicheleCotrufo commented 2 years ago

Thanks for your feedback! This looks like an interesting suggestion, and it should be quite easy to implement. I will give it a try in the next few days.

harnoorsaini commented 2 years ago

Hi! Any updates on this? Or could you point me in the right direction? Cheers, Harry

MicheleCotrufo commented 2 years ago

Sorry, I've been a bit busy over the past weeks. I just uploaded a new version of the script which should allow you to do that. Install it via pip install pdf-renamer==1.0rc6

I added two optional command-line arguments:

-max_words_title MAX_WORDS_TITLE
                        Sets the maximum number of words from the paper title to use for the filename (default=20).
-case CASE            
                        Possible values are 'camel', 'snake', 'kebab', 'none' (default=none).
                        If different from 'none', converts each tag string into either 'camel' (e.g., LoremIpsumDolorSitAmet), 'snake' (e.g., Lorem_ipsum_dolor_sit_amet), or 'kebab' case (e.g., Lorem-ipsum-dolor-sit-amet).
                        Note: this will not affect any punctuation symbol or space contained in the filename format by the user.
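As a rough illustration of what these two options do to a single tag string, here is a minimal Python sketch. The function name `convert_case` is hypothetical and this is not pdf-renamer's actual code; it only reproduces the examples given in the help text above, with `max_words` standing in for -max_words_title (so it would only be passed when converting the title).

```python
def convert_case(text: str, case: str = "none", max_words: int = 20) -> str:
    """Keep the first max_words words of a tag string and join them in the given case.

    Hypothetical helper for illustration only, not pdf-renamer's implementation.
    """
    words = text.split()[:max_words]
    if case == "camel":
        # Upper-case the first letter of every word, leave the rest unchanged
        return "".join(w[:1].upper() + w[1:] for w in words)
    if case == "snake":
        return "_".join(words)
    if case == "kebab":
        return "-".join(words)
    if case == "none":
        # Words are re-joined with single spaces
        return " ".join(words)
    raise ValueError(f"unknown case: {case!r}")


for case in ("camel", "snake", "kebab", "none"):
    print(case, convert_case("Lorem ipsum dolor sit amet", case))
# camel LoremIpsumDolorSitAmet
# snake Lorem_ipsum_dolor_sit_amet
# kebab Lorem-ipsum-dolor-sit-amet
# none Lorem ipsum dolor sit amet
```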

For both these options you can set the default values by adding -sd, so that you don't have to specify them every time.
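One common way for a CLI to remember defaults like this is to keep them in a small settings file that is read at startup. The sketch below assumes a JSON file and hypothetical helper names (`load_settings`, `save_settings`); it does not describe where or how pdf-renamer actually stores its settings.

```python
import json
from pathlib import Path

# Built-in defaults, matching the values stated in the help text above
DEFAULTS = {"max_words_title": 20, "case": "none"}


def load_settings(path: Path) -> dict:
    """Return the built-in defaults, overridden by any values saved in path."""
    settings = dict(DEFAULTS)
    if path.exists():
        settings.update(json.loads(path.read_text()))
    return settings


def save_settings(path: Path, **changes) -> None:
    """Persist changed values so later runs use them as defaults (the role -sd plays)."""
    settings = load_settings(path)
    settings.update(changes)
    path.write_text(json.dumps(settings, indent=2))
```

For example, `save_settings(Path("settings.json"), case="kebab")` would make `load_settings(Path("settings.json"))["case"]` return `"kebab"` on every later run, while `max_words_title` keeps its built-in value.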

Note that the -case CASE option affects only the string of each tag (e.g. title, authors, etc.) separately; any punctuation in the format string that you specify is not affected. So, for example, to get Choi-et-al-Comparative-study-of-glenoid-version-2020.pdf, you also need to make sure that the filename format uses '-' as separator, for example by running

pdfrenamer file.pdf -f "{Aetal}-{T}-{Year}" -case kebab -max_words_title 5
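To see why the separator has to appear in the format string itself, here is a minimal sketch of per-tag conversion followed by substitution with Python's `str.format`. The function `fill_format` is hypothetical, and dropping periods from tag values (so that "Choi et al." becomes "Choi-et-al") is an assumption made to match the target filename, not documented pdf-renamer behavior.

```python
def fill_format(fmt: str, tags: dict, max_words_title: int = 20) -> str:
    """Apply kebab case to each tag value separately, then substitute into fmt.

    Hypothetical helper for illustration only.
    """
    converted = {}
    for name, value in tags.items():
        # Assumption: periods inside tag values are dropped ("al." -> "al")
        words = value.replace(".", "").split()
        if name == "T":
            # Only the title is truncated, as with -max_words_title
            words = words[:max_words_title]
        converted[name] = "-".join(words)
    # Literal characters in fmt (here the "-" between tags) are copied unchanged
    return fmt.format(**converted) + ".pdf"


tags = {
    "Aetal": "Choi et al.",
    "T": "Comparative study of glenoid version and inclination using two-dimensional images",
    "Year": "2020",
}
print(fill_format("{Aetal}-{T}-{Year}", tags, max_words_title=5))
# Choi-et-al-Comparative-study-of-glenoid-version-2020.pdf
```

With a format such as "{Aetal} {T} {Year}", the converted tags would still be joined by the literal spaces taken from the format string.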

Please let me know if you find any bug or other issue!

harnoorsaini commented 2 years ago

Hi! Thanks for this. However, I tried pip install pdf-renamer==1.0rc6 and it did update, but it does not recognize the options -case and -max_words_title...

MicheleCotrufo commented 2 years ago

That's weird. Are you sure that the installed version is 1.0rc6? You can check via pip list. Maybe also try uninstalling and reinstalling it; sometimes pip messes things up... You can also type pdfrenamer --h and check that the new options appear in the list of commands.
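The same version check can also be done from inside Python with the standard-library `importlib.metadata` module (Python 3.8+), which reports what the current interpreter actually sees; if several Python environments are installed, this may differ from what a given pip reports. The helper name `installed_version` is hypothetical; the distribution name "pdf-renamer" is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed pdf-renamer version, or None if it is not installed
print(installed_version("pdf-renamer"))
```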

harnoorsaini commented 2 years ago

It is weird: I tried it again this morning and now it works. Maybe it needed some time to ripen? 😜 The options work really well, thanks! 🙏 I have an idea to add a list of words to skip in the title, e.g., {"The", "a", "in", "for"}, but I think I should be able to figure that out. I might send you a pull request once it's done 😄
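The skip-word idea could look roughly like this. `title_words` and its default skip list are hypothetical and not part of pdf-renamer as of this thread. Filtering before truncating means skipped words do not use up the -max_words_title budget, and comparing in lower case makes "The" and "the" both match.

```python
def title_words(title: str, max_words: int = 20,
                skip: frozenset = frozenset({"the", "a", "in", "for"})) -> list:
    """Drop skip words (case-insensitively), then keep the first max_words words."""
    kept = [w for w in title.split() if w.lower() not in skip]
    return kept[:max_words]


words = title_words("A method for measuring glenoid version in the shoulder", max_words=4)
print("-".join(words))
# method-measuring-glenoid-version
```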

BTW, if I want to get started on a CLI of my own (in Python), could you point me in the right direction? Thanks!
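One common starting point for a Python CLI is the standard-library `argparse` module. The skeleton below is a sketch, not how pdf-renamer is built: the program name `mytool` is hypothetical, and the two options simply mirror the ones discussed in this thread (argparse accepts single-dash long names such as -case).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical program name
        description="Rename PDF files (illustrative skeleton).",
    )
    parser.add_argument("path", help="PDF file or folder to process")
    parser.add_argument("-max_words_title", type=int, default=20,
                        help="maximum number of title words (default=%(default)s)")
    parser.add_argument("-case", choices=["camel", "snake", "kebab", "none"],
                        default="none",
                        help="case applied to each tag (default=%(default)s)")
    return parser


def main(argv=None) -> int:
    # argv=None makes argparse read sys.argv[1:]; passing a list eases testing
    args = build_parser().parse_args(argv)
    print(f"Would process {args.path} with "
          f"max_words_title={args.max_words_title}, case={args.case}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Saved as mytool.py, running python mytool.py file.pdf -case kebab would print "Would process file.pdf with max_words_title=20, case=kebab", and python mytool.py -h lists the options automatically. To expose it as a command like pdfrenamer after pip install, packaging tools let you declare a console-script entry point (for example under [project.scripts] in pyproject.toml).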