MicheleCotrufo / pdf-renamer

A python tool to automatically rename the pdf files of scientific publications by looking up the publication metadata on the web.

Picking up wrong publication #17

Open vasutweaks opened 7 months ago

vasutweaks commented 7 months ago

I am trying to rename the following publication:
https://www.swpc.noaa.gov/sites/default/files/images/u30/Murphy%2C%20A.H.%2C%20and%20E.S.%20Epstein%2C%201989.pdf

with the command:
pdfrenamer -fr -max_words_title 6 -f "{Aetal}-{YYYY}-{T}" Murphy,\ A.H.,\ and\ E.S.\ Epstein,\ 1989.pdf

It picked up an entirely different paper and renamed the file accordingly:
Summaries of changes done: Murphy, A.H., and E.S. Epstein, 1989.pdf ---> Wilks-2001-A skill score based on economic.pdf

The paper it picked is available at https://rmets.onlinelibrary.wiley.com/doi/pdf/10.1017/S1350482701002092

I have come across this problem multiple times.

MicheleCotrufo commented 6 months ago

Yeah, this is an unfortunate issue with many old publications. It happens because the PDF was not crafted very well, so the script can't find the DOI of the paper. It then starts making "educated guesses" based on the text inside the PDF file, and in this case the guess is wrong. Unfortunately, there isn't an easy way to address this.
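To illustrate why a missing DOI matters, here is a minimal sketch of a DOI lookup in extracted text. It is not the actual pdf2doi code: the regular expression, the `find_doi` helper, and the sample strings are assumptions made for illustration. When nothing DOI-like is found, a tool has only weaker signals (such as a title guess) to fall back on.

```python
import re

# A commonly used pattern for modern DOIs (an assumption, not taken from pdf2doi).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    return match.group(0).rstrip(".,;)")


# Hypothetical text snippets: one carries a DOI, the other (like many old scans) does not.
with_doi = "doi:10.1017/S1350482701002092."
without_doi = "Murphy, A.H., and E.S. Epstein, 1989"

print(find_doi(with_doi))     # a DOI string is found
print(find_doi(without_doi))  # None: only guessing is left
```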

The only workaround is to identify the DOI of the paper yourself and associate it with the PDF file using this method: https://github.com/MicheleCotrufo/pdf2doi?tab=readme-ov-file#manually-associate-the-correct-identifier-to-a-file Then, if you run pdf-renamer again on the same file, it should work well.

vasutweaks commented 3 months ago

Please add an option to say yes/no to a proposed rename. Many guesses are wrong, so there is no point in renaming a file to a wrong guess.

MicheleCotrufo commented 3 months ago

Hi, this is an interesting suggestion; however, I am not sure what the best way to implement it would be. If it asked to confirm the proposed rename for every paper, it would slow down the process too much (especially when renaming a bunch of papers in a folder).

I could add a command line option (something like '-confirm') that asks for confirmation before renaming each file.
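A rough sketch of how such an opt-in flag could behave is below. It is only an illustration, not pdf-renamer code: the `-confirm` flag does not exist yet, and `build_parser`, `confirm_rename`, and `should_rename` are hypothetical names. The flag is off by default, so batch renames stay non-interactive unless the user asks for prompts.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of an opt-in confirmation flag")
    # Hypothetical flag; off by default so renaming a whole folder is not slowed down.
    parser.add_argument("-confirm", action="store_true",
                        help="ask for confirmation before each rename")
    return parser


def confirm_rename(old_name, new_name, ask=input):
    """Ask the user to accept or reject a proposed rename; anything but y/yes means no."""
    answer = ask(f"Rename '{old_name}' ---> '{new_name}'? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def should_rename(old_name, new_name, confirm, ask=input):
    """Rename unconditionally unless confirmation was requested."""
    if not confirm:
        return True
    return confirm_rename(old_name, new_name, ask)


# Example wiring (the actual rename step is left out of this sketch):
# args = build_parser().parse_args()
# if should_rename(old, new, args.confirm):
#     os.rename(old, new)
```

Passing `ask` as a parameter keeps the prompt easy to test without a terminal.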