MicheleCotrufo / pdf-renamer

A Python tool to automatically rename the PDF files of scientific publications by looking up the publication metadata on the web.

File name too long #6

Open btywoniuk opened 2 years ago

btywoniuk commented 2 years ago

When I run: ~/Library/Python/3.10/bin/renamepdf -f "{YYYY}-{MM} - {J} - {T} - {A3etal}" -max_length_filename 150 PhysRevB.105.195301.pdf

I'm getting:

[pdf-renamer] ................
[pdf-renamer] File: PhysRevB.105.195301.pdf
[pdf-renamer] Calling the pdf2doi library to retrieve identifier and info of this file.
[pdf2doi]: File: PhysRevB.105.195301.pdf
[pdf2doi]: Looking for a valid identifier in the document infos...
[pdf2doi]: Validating the possible DOI 10.1103/PhysRevB.105.195301 via a query to dx.doi.org...
[pdf2doi]: The DOI 10.1103/PhysRevB.105.195301 is validated by dx.doi.org. A bibtex entry was also created.
[pdf2doi]: A valid DOI was found in the document info labelled '/doi'.
[pdf-renamer] Found an identifier for this file: 10.1103/PhysRevB.105.195301 (DOI).
[pdf-renamer] Found the following info:
[pdf-renamer]   journal = "Physical Review B"
                title = "Magnetic field dependent two-photon absorption properties in monolayer
$\less$mml:math xmlns:mml="http://www.w3.org/1998/Math/{MathML}"$\greater$$\less$mml:msub$\greater$$\less$mml:mi$\greater${MoS}$\less$/mml:mi$\greater$$\less$mml:mn$\greater$2$\less$/mml:mn$\greater$$\less$/mml:msub$\greater$$\less$/mml:math$\greater$"
                author = "Rui Gong and Chang Zhou and Xiaobo Feng"
                number = "19"
                volume = "105"
                publisher = "American Physical Society ({APS})"
                month = "may"
                year = "2022"
                url = "https://doi.org/10.1103%2Fphysrevb.105.195301"
                doi = "10.1103/physrevb.105.195301"
                ENTRYTYPE = "article"
                ID = "Gong_2022"
[pdf-renamer] The new filename is ./2022-05 - Physical Review B - Magnetic field dependent two-photon absorption properties in monolayer  $less$mml - math xmlns - mml=http - www.w3.org1998MathMathML$greater$$less$mml - msub$greater$$less$mml - mi$greater$MoS$less$mml - mi$greater$$less$mml - mn$greater$2$less$mml - mn$greater$$less$mml - msub$greater$$less$mml - math$greater$ - Gong, Zhou, Feng.pdf
Traceback (most recent call last):
  File "/Users/bartek/Library/Python/3.10/bin/renamepdf", line 8, in <module>
    sys.exit(main())
  File "/Users/bartek/Library/Python/3.10/lib/python/site-packages/pdfrenamer/pdfrenamer.py", line 170, in main
    results = rename(target=args.path,
  File "/Users/bartek/Library/Python/3.10/lib/python/site-packages/pdfrenamer/pdfrenamer.py", line 130, in rename
    os.rename(filename,NewPath)
OSError: [Errno 63] File name too long: 'PhysRevB.105.195301.pdf' -> './2022-05 - Physical Review B - Magnetic field dependent two-photon absorption properties in monolayer  $less$mml - math xmlns - mml=http - www.w3.org1998MathMathML$greater$$less$mml - msub$greater$$less$mml - mi$greater$MoS$less$mml - mi$greater$$less$mml - mn$greater$2$less$mml - mn$greater$$less$mml - msub$greater$$less$mml - math$greater$ - Gong, Zhou, Feng.pdf'

MicheleCotrufo commented 2 years ago

Thank you for reporting this. Which OS do you use, and which version of pdf-renamer do you have?

I think this is due to two separate problems. First, it looks like the BibTeX info for this paper is somehow corrupted. pdf-renamer uses pdf2bib, which in turn sends queries to dx.doi.org to get the BibTeX info associated with a DOI. For some reason, the title entry of the BibTeX info is polluted with some weird LaTeX code (it looks like escaped MathML markup):

title = "Magnetic field dependent two-photon absorption properties in monolayer
$\less$mml:math xmlns:mml="http://www.w3.org/1998/Math/{MathML}"$\greater$$\less$mml:msub$\greater$$\less$mml:mi$\greater${MoS}$\less$/mml:mi$\greater$$\less$mml:mn$\greater$2$\less$/mml:mn$\greater$$\less$/mml:msub$\greater$$\less$/mml:math$\greater$"

Luckily, pdf-renamer has an internal routine to get rid of the weird $ stuff. I tested it on my computer and the file gets successfully renamed to 2022-05 - Physical Review B - Magnetic field dependent two-photon absorption properties in monolayer mmlmath xmlnsmml=httpwww.w3.org1998MathMathMLmmlm.pdf. The title still looks ugly, but again this is due to the content of the BibTeX info, and there's not much that can be done to fix it automatically without making big assumptions about what a valid title is.
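
To illustrate the kind of assumption involved, here is a minimal sketch (not the routine pdf-renamer actually uses) that recovers a readable title when the pollution is MathML escaped as `$\less$...$\greater$`, as in this entry. The function name `strip_escaped_mathml` is hypothetical, and the approach assumes that everything between the escaped angle brackets is markup that can be dropped.

```python
import re


def strip_escaped_mathml(title: str) -> str:
    """Hypothetical helper: drop MathML tags escaped as $\\less$...$\\greater$."""
    # Undo the escaping so the tags become ordinary <...> markup.
    text = title.replace(r"$\less$", "<").replace(r"$\greater$", ">")
    # Remove every tag, keeping only the text between tags.
    text = re.sub(r"<[^>]*>", "", text)
    # Remove the BibTeX protective braces, e.g. {MoS} -> MoS.
    text = text.replace("{", "").replace("}", "")
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join(text.split())


# The title entry reported in this issue.
raw_title = (
    "Magnetic field dependent two-photon absorption properties in monolayer\n"
    r'$\less$mml:math xmlns:mml="http://www.w3.org/1998/Math/{MathML}"$\greater$'
    r"$\less$mml:msub$\greater$$\less$mml:mi$\greater${MoS}$\less$/mml:mi$\greater$"
    r"$\less$mml:mn$\greater$2$\less$/mml:mn$\greater$$\less$/mml:msub$\greater$"
    r"$\less$/mml:math$\greater$"
)

print(strip_escaped_mathml(raw_title))
# Magnetic field dependent two-photon absorption properties in monolayer MoS2
```

This works for this particular entry, but it flattens the subscript (MoS2 instead of MoS with a subscript 2) and would also mangle any title that legitimately contains `$\less$` or `$\greater$`, which is why it is only a sketch.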

It looks like, on your computer, (1) the $ parts are not removed, and (2) the filename does not get shortened to a maximum of 150 characters. Since your OS does not accept such a long filename, it raises that error. I don't see why the routines that strip the $ parts and shorten the filename would not work on your computer, unless you are using an older version of the script?
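
As a rough sketch of what shortening to a maximum length means here (again, not the actual pdf-renamer code), the hypothetical helper `shorten_filename` below truncates the name while keeping the extension. It assumes the limit counts characters and includes the extension; the real -max_length_filename option may count differently.

```python
import os


def shorten_filename(filename: str, max_length: int) -> str:
    """Hypothetical helper: cut a filename to max_length characters, keeping the extension."""
    if len(filename) <= max_length:
        return filename
    stem, ext = os.path.splitext(filename)
    # Number of characters left for the stem once the extension is reserved.
    keep = max_length - len(ext)
    if keep < 1:
        raise ValueError("max_length is too small to keep the extension")
    # Trim trailing whitespace so the name does not end in ' .pdf'.
    return stem[:keep].rstrip() + ext


long_name = "2022-05 - Physical Review B - " + "x" * 300 + ".pdf"
short_name = shorten_filename(long_name, 150)
print(len(short_name), short_name.endswith(".pdf"))
```

Applying a step like this just before the os.rename call in the traceback above would keep the new name within the requested length.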