MicheleCotrufo / pdf2bib

A python library/command-line tool to quickly and automatically generate BibTeX data starting from the pdf file of a scientific publication.

Adjusting file verification when Pathlib used #15

Closed Jdogzz closed 2 months ago

Jdogzz commented 3 months ago

Issue:

File type verification fails when using Pathlib to construct paths and passing that file to pdf2bib.

Expected behavior:

Providing a path to the file using Pathlib should work.

Actual behavior:

The following error is thrown:

 File "/home/myusername/gitrepos/projectname/.devenv/state/venv/lib/python3.11/site-packages/pdf2bib/main.py", line 89, in pdf2bib
    if not (filename.lower()).endswith('.pdf'):
            ^^^^^^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute 'lower'. Did you mean: 'owner'?

Minimal example to reproduce behavior:

[myusername@mycomputer:~/gitrepos/myproject]$ python
Python 3.11.8 (main, Feb  6 2024, 21:21:21) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import Path
>>> mypdf=Path('watching/s41567-024-02510-3.pdf')
>>> import pdf2bib
>>> pdf2bib.pdf2bib(mypdf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myusername/gitrepos/projectname/.devenv/state/venv/lib/python3.11/site-packages/pdf2bib/main.py", line 89, in pdf2bib
    if not (filename.lower()).endswith('.pdf'):
            ^^^^^^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute 'lower'. Did you mean: 'owner'?

Additional info:

Here's a link to the line being considered:

https://github.com/MicheleCotrufo/pdf2bib/blob/f86a053cf9487f9fc9a2406e4bfe7acdb7f2629c/pdf2bib/main.py#L89

I'll be working on a workaround in my own code (probably just wrapping my path in str when passing it to pdf2bib), but it may be worth investigating whether a surgical change is due, like changing the above line to

if not (str(filename).lower()).endswith('.pdf'):

or if it would be worthwhile to use Pathlib instead of os.path in the pdf2bib project. In that case, there are calls like this that could be used when doing the check (although still requiring a lowercase string comparison like in your current code):

https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffix
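For illustration, the two checks discussed above could look roughly like this. The helper names `is_pdf_str` and `is_pdf_suffix` are hypothetical and not part of pdf2bib; this is only a sketch of the extension comparison, not a patch against the actual function.

```python
from pathlib import Path


def is_pdf_str(filename):
    """Surgical variant: coerce to str first, so str and Path inputs both work."""
    return str(filename).lower().endswith('.pdf')


def is_pdf_suffix(filename):
    """Pathlib variant: compare the lowercased final suffix (dot included)."""
    return Path(filename).suffix.lower() == '.pdf'


mypdf = Path('watching/s41567-024-02510-3.pdf')
print(is_pdf_str(mypdf), is_pdf_suffix(mypdf))
```

Neither helper touches the filesystem; both only inspect the name, just as the current check on line 89 does.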

MicheleCotrufo commented 3 months ago

Thanks for bringing this up. Yes, the input variable of the function pdf2bib needs to be a string in the current implementation. You are very welcome to submit a PR with your code.

Probably the simplest way would be to add, at the very beginning of the function, code similar to what you suggested, in order to convert a (potential) Pathlib object into a string.
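A minimal sketch of that conversion step, assuming it runs before any string methods are called on the argument. The helper name `normalize_filename` is hypothetical and does not exist in pdf2bib. It uses the standard-library `os.fspath` rather than `str()`; a plain `str(filename)` would also work for `Path` objects.

```python
import os
from pathlib import Path


def normalize_filename(filename):
    """Return a plain path for str or os.PathLike input (e.g. pathlib.Path).

    os.fspath leaves str unchanged, calls __fspath__ on path-like objects,
    and raises TypeError for anything else.
    """
    return os.fspath(filename)


# After normalization, string methods such as .lower() are available.
filename = normalize_filename(Path('watching/s41567-024-02510-3.pdf'))
print(filename.lower().endswith('.pdf'))
```

One difference from `str()`: `os.fspath` rejects objects that are not paths (for example an integer) instead of silently stringifying them.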