chrisgrieser / pdf-annotation-extractor-alfred

Alfred Workflow to extract annotations from PDF files.
MIT License

PDF Annotation Extractor


An Alfred workflow that extracts PDF annotations as a Markdown file. It is designed primarily for scientific papers, but also works with non-academic PDF files.

Automatically determines correct page numbers, inserts them as Pandoc citations, merges highlights across page breaks, prepends a YAML header with bibliographic information, and more.

Installation

Requirements for the PDF

PDF Annotation Extractor works on any PDF that has valid annotations saved in the PDF file. Some PDF readers like Skim or Zotero 6 do not store annotations in the PDF itself by default.

This workflow automatically determines the citekey based on the filename of your PDF file.

Automatic citekey identification

[!TIP] You can achieve such a filename pattern with automatic renaming rules of most reference managers, for example with the ZotFile plugin for Zotero or the AutoFile feature of BibDesk.
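To illustrate the idea of deriving a citekey from a filename, here is a minimal sketch. It assumes a hypothetical naming convention in which the filename starts with the citekey, optionally followed by an underscore and further text (for example `Grieser2023_Some-Title.pdf`). That convention, the function name `citekey_from_filename`, and the regular expression are illustrative assumptions, not the workflow's actual implementation.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMPTION: filenames look like "<citekey>_<anything>.pdf" or "<citekey>.pdf".
# This pattern is hypothetical and only serves to illustrate the idea.
CITEKEY_PATTERN = re.compile(r"^([^_\s]+)(?:_.*)?$")


def citekey_from_filename(pdf_path: str) -> Optional[str]:
    """Return the leading citekey of a PDF filename, or None if none is found."""
    stem = Path(pdf_path).stem  # filename without directory and extension
    match = CITEKEY_PATTERN.match(stem)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(citekey_from_filename("/papers/Grieser2023_PDF-Annotation.pdf"))
    print(citekey_from_filename("my paper.pdf"))
```

Under these assumptions, a filename that contains whitespace before any underscore yields `None`, which a caller could treat as "no citekey found" and handle by asking the user.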

Usage

Basics

Use the hotkey to trigger the Annotation Extraction on the PDF file currently selected in Finder. The hotkey also works when triggered from PDF Expert or Highlights. Alternatively, use the anno keyword to search for PDFs and select one.

Annotation Types extracted

Automatic Page Number Identification

Instead of the PDF page numbers, this workflow retrieves information about the real page numbers from the BibTeX library and inserts them. If there is no page data in the BibTeX entry (for example, for monographs), you are prompted to enter the page number manually.
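The mapping from PDF page index to printed page number can be sketched as a simple offset. The example below is an illustration under stated assumptions, not the workflow's actual code: it assumes the BibTeX `pages` field starts with an Arabic numeral (such as `213--241`), that the first PDF page corresponds to that first printed page, and that pages are contiguous. All function names are hypothetical. The `[@citekey, p. N]` form follows Pandoc's citation syntax.

```python
import re
from typing import Optional


def first_page_from_bibtex(pages_field: Optional[str]) -> Optional[int]:
    """Return the first page of a BibTeX `pages` value such as '213--241'.

    Returns None when the field is missing or does not start with digits,
    in which case a caller would have to ask the user for the page number.
    """
    if not pages_field:
        return None
    match = re.match(r"\s*(\d+)", pages_field)
    return int(match.group(1)) if match else None


def real_page(pdf_page: int, first_page: int) -> int:
    """Map a 1-based PDF page index to the printed page number.

    ASSUMPTION: PDF page 1 is the page printed as `first_page`.
    """
    return pdf_page + first_page - 1


def pandoc_citation(citekey: str, page: int) -> str:
    """Format a Pandoc citation with a page locator."""
    return f"[@{citekey}, p. {page}]"


if __name__ == "__main__":
    first = first_page_from_bibtex("213--241")
    if first is not None:
        print(pandoc_citation("Grieser2023", real_page(3, first)))
```

With a `pages` value of `213--241`, an annotation on the third PDF page would be attributed to printed page 215 under these assumptions.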

Annotation Codes

Insert the following codes at the beginning of an annotation to invoke special actions on that annotation. Annotation codes do not apply to strikethroughs.

[!TIP] You can run the Alfred command acode to display a cheat sheet of all annotation codes.

Extracting Images

Troubleshooting

[!NOTE] As a fallback, you can use pdfannots as the extraction engine, since a different PDF engine sometimes fixes issues. This requires installing pdfannots via pip3 install pdfannots and switching the fallback engine in the settings. Note that pdfannots does not support image extraction and its extraction quality is slightly worse, so pdfannots2json is generally the better choice.

Cite this software project

If you want to mention this software project in an academic publication, please cite it as:

Grieser, C. (2023). PDF Annotation Extractor [Computer software]. 
https://github.com/chrisgrieser/pdf-annotation-extractor-alfred

For other citation styles, use the following metadata:

Credits

About the Developer

In my day job, I am a sociologist studying the social mechanisms underlying the digital economy. For my PhD project, I investigate the governance of the app economy and how software ecosystems manage the tension between innovation and compatibility. If you are interested in this subject, feel free to get in touch.
