MicheleCotrufo / pdf2bib

A python library/command-line tool to quickly and automatically generate BibTeX data starting from the pdf file of a scientific publication.

Subfolders #4

Closed harnoorsaini closed 1 year ago

harnoorsaini commented 2 years ago

Hi, first thanks for this cool little utility. I am trying to switch over to a simpler and lightweight management system for my literature. I want to split up my pdfs into subfolders, and use your utility to create one master bib file (or even selectively choose which folders to use). I guess I can go through each subfolder, run the pdf2bib command, and then stitch the files together. But I was wondering if this was easy to integrate within your utility itself. Thanks! 👍

MicheleCotrufo commented 2 years ago

Hi, that is definitely possible. In fact, you should take a look at the code of another utility I made, pdfrenamer https://github.com/MicheleCotrufo/pdf-renamer. This utility can scan a folder (and its subfolders, if requested), and it uses pdf2bib to retrieve the BibTeX info of all pdf files. It then uses this info to rename the pdf files.

The function pdfrenamer.rename, defined here https://github.com/MicheleCotrufo/pdf-renamer/blob/master/pdfrenamer/main.py#L17, returns a list of dictionaries. Each dictionary corresponds to one pdf file, and the element result['bibtex'] contains the BibTeX info for that file.

Thus, a simple way to accomplish your task could be to use pdfrenamer like this:

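import pdfrenamer

main_folder_path = 'path/to/your/pdfs'  # placeholder: replace with your top-level folder
pdfrenamer.config.set('check_subfolders', True)  # also scan subfolders
results = pdfrenamer.rename(main_folder_path)
for result in results:
    # do something with result['bibtex'], for example print it
    print(result['bibtex'])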

However, this approach is overkill, because it would also rename the pdf files every time you run the code.

A second, more efficient way would be to copy the code of the function pdfrenamer.rename and tweak it a bit so that it only retrieves the BibTeX info without renaming the files. It might be enough to comment out all lines from 129 to 163 here https://github.com/MicheleCotrufo/pdf-renamer/blob/master/pdfrenamer/main.py , but I would need to run some tests.
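As a rough illustration of that second idea, here is a minimal sketch that walks one or more folders, collects a BibTeX entry for each pdf, and joins the entries into one string, without renaming anything. It is not code from pdfrenamer: the helper names collect_bibtex and extract_with_pdf2bib are made up for this example, and the sketch assumes that pdf2bib.pdf2bib(path) returns a dictionary with a 'bibtex' key, like the dictionaries described above. Please check that against the version you have installed.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def collect_bibtex(folders: Iterable[str],
                   extract: Callable[[str], Optional[str]],
                   recursive: bool = True) -> str:
    """Return a single BibTeX string for every pdf found under `folders`.

    `extract` is any function mapping a pdf path to a BibTeX entry
    (or None when nothing was found for that file).
    """
    pattern = "**/*.pdf" if recursive else "*.pdf"
    entries = []
    for folder in folders:
        for pdf in sorted(Path(folder).glob(pattern)):
            bibtex = extract(str(pdf))
            if bibtex:  # skip files for which no entry was found
                entries.append(bibtex.strip())
    return "\n\n".join(entries) + ("\n" if entries else "")


def extract_with_pdf2bib(path: str) -> Optional[str]:
    """Hypothetical adapter around pdf2bib (imported lazily)."""
    import pdf2bib
    # Assumption: the result is a dict with a 'bibtex' key, which may be None.
    result = pdf2bib.pdf2bib(path)
    return result.get("bibtex") if result else None
```

Calling collect_bibtex(['papers/optics', 'papers/acoustics'], extract_with_pdf2bib) and writing the returned string to master.bib would give one master file for the selected folders (the folder names here are only examples). Passing the extractor in as an argument keeps the folder walking separate from the lookup, so the walking part can be tried out with a dummy function before running real queries.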

Good luck!

harnoorsaini commented 2 years ago

Hi, thanks for the quick reply! I will have a go and let you know what I come up with. 😄