MicheleCotrufo / pdf2bib

A Python library/command-line tool to quickly and automatically generate BibTeX data starting from the PDF file of a scientific publication.

Possible improvements for books #5

Open rogpld opened 1 year ago

rogpld commented 1 year ago

Hi, thanks for the awesome work.

For books it might be a good idea to check for ISBNs, since they are usually embedded in the metadata or easily recovered from the text.

The ISBN makes it easier to retrieve the BibTeX entry directly. For example,

https://lead.to/amazon/com/?key=9780470245996&si=bo&bn=&la=en&cu=usd&op=bt&so=re#first

MicheleCotrufo commented 1 year ago

Thank you for your feedback! It sounds like an interesting and probably easy-to-implement change. I will see if I can work on it within a week or two. To keep internal consistency I will make most of the changes in the library pdf2doi, which analyzes a PDF and looks for identifiers and other info.

hgfernan commented 1 year ago

Since the Amazon API is too restrictive on frequency of use, a good alternative could be an ISBN to BibTeX converter.

MicheleCotrufo commented 1 year ago

Over the last few months I made a quick attempt at implementing this. The main challenge (at least in the few PDF files that I tested) is that extracting the ISBN automatically seems a bit harder than extracting the DOI. I will look more into it soon!
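One reason ISBN extraction tends to be noisier than DOI extraction is that an ISBN is just a run of 10 or 13 digits, so plain text contains many false candidates. A common way to filter them is to validate the standard ISBN check digit. The following is only a minimal sketch of that idea, not code from pdf2doi: the names `ISBN_CANDIDATE`, `is_valid_isbn10`, `is_valid_isbn13` and `find_isbns` are hypothetical, and it assumes the PDF text has already been extracted into a string.

```python
import re

# Hypothetical candidate pattern: 10 or 13 digits, optionally separated by
# hyphens or whitespace, not embedded in a longer run of digits.
ISBN_CANDIDATE = re.compile(
    r"(?<!\d)((?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx])(?!\d)"
)


def is_valid_isbn10(s):
    """Check the ISBN-10 check digit (weights 10..1, sum divisible by 11)."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """Check the ISBN-13 check digit (weights 1,3,1,3,..., sum divisible by 10)."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return the checksum-valid ISBNs found in text, without separators."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        digits = re.sub(r"[-\s]", "", match.group(1)).upper()
        if (is_valid_isbn13(digits) or is_valid_isbn10(digits)) and digits not in found:
            found.append(digits)
    return found


sample = "ISBN 978-0-470-24599-6 and also 0-306-40615-2, but not 1234567890."
print(find_isbns(sample))
```

The checksum only rejects random digit runs; it cannot tell whether a valid ISBN belongs to the book itself or to a title cited in the text, so some heuristic on where the match appears (metadata, first pages) would probably still be needed.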

Jdogzz commented 3 months ago

@MicheleCotrufo Any testing/help that can be offered with your in-progress efforts? I likewise would find this useful in the event a DOI is not issued but an ISBN is.

MicheleCotrufo commented 2 months ago

@Jdogzz in the past I tried to add this functionality to the pdf2doi library (see e.g. the commented code at line 163 here https://github.com/MicheleCotrufo/pdf2doi/blob/master/pdf2doi/finders.py ) but I wasn't successful, and unfortunately I did not have much free time to work on it then, nor do I now. If you want to try to implement it, that would be great! I would prefer to add the functionality to pdf2doi instead of pdf2bib, in order to keep the existing logical separation between the two libraries (i.e. pdf2doi associates an identifier with a given PDF, while pdf2bib uses that identifier to build a BibTeX entry).