Check if already renamed

pinpss commented 2 years ago

Hello, Thank you for this package. It is really useful indeed. I just wanted to check if it is possible to have an option that checks whether the current filename already follows the naming format and, if so, skips to the next file.

MicheleCotrufo commented 2 years ago

Thanks for your feedback!

If I understand correctly, your suggestion is that the script should be able to recognize (even before looking up the pdf info online) if the current filename is already in a format compatible with the format that pdf-renamer is using. For example, if the format in pdf-renamer is "{YYYY} - {Aetal} - {J} - {T}", and the script comes across a file called 1234 - Mickey et al. - Journal of Whatever - title, it should skip the file without even trying to find its info. Correct?

This is definitely doable, but there would be a high rate of false positives. For example, if the current filename is instead 1234 - Mickey et al. - title - Journal of Whatever, it would be impossible for the script to realize that the 3rd part of the filename is the title and the 4th part is the journal (and not vice versa) without actually looking up the file info.
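To make the false-positive problem concrete, here is a minimal sketch of a purely name-based check. It is not pdf-renamer's code: the helper names `format_to_regex` and `looks_renamed` are hypothetical, and treating every placeholder other than {YYYY} as free text is an assumption made only for illustration.

```python
import re

def format_to_regex(fmt: str) -> re.Pattern:
    """Hypothetical helper: turn a format such as
    "{YYYY} - {Aetal} - {J} - {T}" into a regular expression."""
    pattern = ""
    # Split into placeholders ({YYYY}, {J}, ...) and literal separators.
    for part in re.split(r"(\{\w+\})", fmt):
        if re.fullmatch(r"\{\w+\}", part):
            name = part[1:-1]
            if name == "YYYY":
                # Assumption: the year placeholder is exactly four digits.
                pattern += rf"(?P<{name}>\d{{4}})"
            else:
                # Assumption: any other placeholder is arbitrary text.
                pattern += rf"(?P<{name}>.+?)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)

def looks_renamed(stem: str, fmt: str) -> bool:
    """True if the filename (without extension) fits the format's shape."""
    return format_to_regex(fmt).fullmatch(stem) is not None

fmt = "{YYYY} - {Aetal} - {J} - {T}"
correct = "1234 - Mickey et al. - Journal of Whatever - title"
swapped = "1234 - Mickey et al. - title - Journal of Whatever"

# Both names fit the shape of the format, so the swapped one
# would be wrongly skipped by a name-only check.
print(looks_renamed(correct, fmt), looks_renamed(swapped, fmt))
```

The check can only see the shape of the name, not what each part means, which is why the swapped filename passes just as well as the correct one.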

I guess that your issue is that you have a folder with a lot of pdf files already renamed, and when you add a new one you want to avoid waiting for the script to process all the files in the folder.

This can probably be done in an easier way. I can add a "tag" to the pdf file to store the last format used to rename it via pdf-renamer. In later runs, pdf-renamer can then check for the existence of this tag in each pdf file, compare it with the current format, and skip the file if they match.
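A rough sketch of the tag idea, assuming the pdf metadata can be treated as a plain dictionary. The key name `/pdfrenamer_format` and the helpers `should_skip` and `mark_renamed` are hypothetical, chosen only for this example; the key actually written by pdf-renamer may differ.

```python
# Hypothetical metadata key; the real key used by pdf-renamer may differ.
FORMAT_TAG = "/pdfrenamer_format"

def should_skip(metadata: dict, current_format: str) -> bool:
    """Skip only if the file carries the tag AND it matches the current format."""
    return metadata.get(FORMAT_TAG) == current_format

def mark_renamed(metadata: dict, current_format: str) -> dict:
    """Return a copy of the metadata with the format tag recorded."""
    updated = dict(metadata)
    updated[FORMAT_TAG] = current_format
    return updated

fmt = "{YYYY} - {Aetal} - {J} - {T}"
untagged = {"/Title": "Some paper"}
tagged = mark_renamed(untagged, fmt)

print(should_skip(untagged, fmt))               # no tag yet: process the file
print(should_skip(tagged, fmt))                 # same format: skip
print(should_skip(tagged, "{Aetal} - {T}"))     # format changed: process again
```

Comparing the stored format with the current one, rather than only checking that a tag exists, means that changing the naming format causes the files to be renamed again.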

I'll try to implement this in the next few days!

pinpss commented 2 years ago

Yes, your approach seems more straightforward indeed. Thank you!

MicheleCotrufo commented 2 years ago

I tried a first basic implementation of this functionality, which should work (but I did not have time to run many tests). Can you try to install the new version via

pip install pdf-renamer==1.0rc7

And try it? Since the previous version did not add any tag to files, you will have to run it at least once on your files (even if they were already renamed). After that, in subsequent runs pdf-renamer will skip files that have already been renamed with the same format, and it will only rename "new files". If you want to override this behavior (and force the renaming of all files) you can add the optional argument "-fr" to the command line invocation.
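The behavior across successive runs can be simulated roughly as follows. This is only a model, not the package's implementation: the folder is a dictionary of in-memory metadata, the tag key and the `run` function are hypothetical, and the `force` parameter stands in for the "-fr" option.

```python
from typing import Dict, List

FORMAT_TAG = "/pdfrenamer_format"  # hypothetical key, for illustration only

def run(files: Dict[str, dict], fmt: str, force: bool = False) -> List[str]:
    """Simulate one pass over a folder; return the names that were processed."""
    processed = []
    for name, metadata in files.items():
        if not force and metadata.get(FORMAT_TAG) == fmt:
            continue  # already renamed with this format: skip
        # A real run would look up the publication info and rename the file here.
        metadata[FORMAT_TAG] = fmt
        processed.append(name)
    return processed

fmt = "{YYYY} - {Aetal} - {J} - {T}"
folder = {"a.pdf": {}, "b.pdf": {}}

print(run(folder, fmt))              # first run: both files are processed and tagged
print(run(folder, fmt))              # second run: nothing left to do
folder["c.pdf"] = {}
print(run(folder, fmt))              # only the new file is processed
print(run(folder, fmt, force=True))  # forcing processes every file again
```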

Let me know if it works and if you find any bugs.

Thanks!

pinpss commented 2 years ago

It works like a charm! It's a great improvement for file management, thanks.

Just one thing, the pdfs that could not be renamed on the first round are searched again each time, which is normal because they have no name tag. So, for simplification purposes I created a subfolder and ran pdfrenamer separately.

Thanks!

MicheleCotrufo commented 2 years ago

> Just one thing, the pdfs that could not be renamed on the first round are searched again each time, which is normal because they have no name tag. So, for simplification purposes I created a subfolder and ran pdfrenamer separately.

Do you mean files for which it could not automatically find a valid DOI/identifier, and which thus cannot be renamed? Assuming that these are still valid scientific publications, you can use the workaround described here https://github.com/MicheleCotrufo/pdf2doi#manually-associate-the-correct-identifier-to-a-file-from-command-line (pdf2doi is another library I wrote, which is used by pdf-renamer). If you find the DOI of the publication yourself and then add it to the pdf file tags, pdf-renamer should be able to rename the file correctly the next time it runs on it.

But this is a good point: I will add some code to pdf-renamer so that it recognizes files that were already analyzed before and skips them.

MicheleCotrufo / pdf-renamer

Check if already renamed #8