An Alfred workflow to extract PDF annotations as a Markdown file. Primarily for scientific papers, but can also be used for non-academic PDF files.
Automatically determines correct page numbers, inserts them as Pandoc citations, merges highlights across page breaks, prepends a YAML header with bibliographic information, and more.
Install `pdfannots2json` by running the following command in your terminal:
brew install mgmeyers/pdfannots2json/pdfannots2json
PDF Annotation Extractor works on any PDF that has valid annotations saved in the PDF file. Some PDF readers like Skim or Zotero 6 do not store annotations in the PDF itself by default.
This workflow automatically determines the citekey based on the filename of your PDF file.
If the citekey is found in your BibTeX library, PDF Annotation Extractor prepends a YAML header to the annotations and automatically inserts the citekey with the correct page numbers using the Pandoc citations syntax. If it is not found, PDF Annotation Extractor extracts the annotations without a YAML header and uses the PDF page numbers as page numbers.

- The filename of the PDF MUST begin with the citekey (without the `@`).
- The citekey MUST NOT contain underscores (`_`).
- The citekey MAY be followed by an underscore and further text, such as `{citekey}_{title}.pdf`. It MUST NOT be followed by anything else, since then the citekey would not be found.
- Example: With the filename `Grieser2023_Interdependent Technologies.pdf`, the identified citekey is `Grieser2023`.

> [!TIP]
> You can achieve such a filename pattern with automatic renaming rules of most reference managers, for example with the ZotFile plugin for Zotero or the AutoFile feature of BibDesk.
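The filename rule can be illustrated with a minimal Python sketch. It assumes the citekey is everything in the filename before the first underscore; `citekey_from_filename` is a hypothetical helper for illustration, not the workflow's actual code.

```python
from pathlib import Path


def citekey_from_filename(pdf_path: str) -> str:
    """Return the part of the PDF filename that precedes the first underscore.

    Assumption: the filename starts with the citekey and the citekey itself
    contains no underscore.
    """
    stem = Path(pdf_path).stem  # filename without directory and ".pdf"
    return stem.split("_", 1)[0]


print(citekey_from_filename("Grieser2023_Interdependent Technologies.pdf"))
# prints: Grieser2023
```

A filename without an underscore, such as `Grieser2023.pdf`, yields the whole stem as the citekey under the same assumption.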
Use the hotkey to trigger the annotation extraction on the PDF file currently selected in Finder. The hotkey also works when triggered from PDF Expert or Highlights. Alternatively, use the `anno` keyword to search for PDFs and select one.
Annotation Types extracted
Reminders.app as a task due today in the default list

Instead of the PDF page numbers, this workflow retrieves information about the real page numbers from the BibTeX library and inserts them. If there is no page data in the BibTeX entry (for example, monographs), you are prompted to enter the page number manually.
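As a rough sketch of the page-number logic, the real page can be modeled as a constant offset from the PDF page. This is an assumption made for illustration, not the workflow's confirmed implementation; `real_page` and `pandoc_citation` are hypothetical names, and the exact citation format the workflow emits may differ.

```python
def real_page(pdf_page: int, first_page: int) -> int:
    """Map a 1-based PDF page to the real (printed) page number.

    first_page is the real page number of the first PDF page, taken from the
    BibTeX entry or entered manually (assumed constant offset).
    """
    return first_page + pdf_page - 1


def pandoc_citation(citekey: str, page: int) -> str:
    """Format a Pandoc citation with a page locator."""
    return f"[@{citekey}, p. {page}]"


# A journal article whose first page is 213: PDF page 3 is real page 215.
print(pandoc_citation("Grieser2023", real_page(3, first_page=213)))
# prints: [@Grieser2023, p. 215]

# A manually entered first page of -10 maps PDF page 12 to real page 1.
print(real_page(12, first_page=-10))
# prints: 1
```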
The real page number 1 often occurs later in the PDF. If that is the case, you must enter a negative page number, reflecting the true page number the first PDF page would have. Example: Your PDF is a book that has a foreword with roman page numbers; real page number 1 is PDF page number 12. If you continued the numbering backwards, the first PDF page would have page number -10, so you enter the value -10 when prompted for a page number.

Insert the following codes at the beginning of an annotation to invoke special actions on that annotation. Annotation codes do not apply to strikethroughs.
- `+`: Merge this highlight with the previous highlight or underline. Works for annotations on the same PDF page (= skipping text in between) and for annotations across two pages.
- `? foo` (free comments): Turns "foo" into a Question Callout (`> [!QUESTION]`) and moves it up. (Callouts are Obsidian-specific syntax.)
- `##`: Turns highlighted text into a heading that is added at that location. The number of `#` determines the heading level. If the annotation is a free comment, the text following the `#` is used as heading instead. (The space after the `#` is required.)
- `=`: Adds highlighted text as tags to the YAML frontmatter. If the annotation is a free comment, uses the text after the `=`. In both cases, the annotation is removed afterward.
- `_`: A copy of the annotation is sent to Reminders.app as a task due today (default list).

> [!TIP]
> You can run the Alfred command `acode` to display a cheat sheet of all annotation codes.
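To make the prefix rules concrete, here is a minimal Python sketch that classifies an annotation by its leading code. It is an illustration under simplifying assumptions: it looks only at the text, ignores the annotation type (highlight, free comment, strikethrough), and uses the hypothetical function `parse_annotation_code` with made-up action names.

```python
import re


def parse_annotation_code(text: str) -> tuple[str, str]:
    """Return (action, remaining_text) for an annotation's leading code."""
    text = text.strip()
    if text.startswith("+"):
        return ("merge", text[1:].strip())
    if text.startswith("?"):
        return ("question", text[1:].strip())
    # One or more "#" followed by a required space mark a heading.
    heading = re.match(r"(#+) (.*)", text)
    if heading:
        level = len(heading.group(1))
        return (f"heading{level}", heading.group(2).strip())
    if text.startswith("="):
        return ("tags", text[1:].strip())
    if text.startswith("_"):
        return ("reminder", text[1:].strip())
    return ("none", text)


print(parse_annotation_code("? foo"))       # prints: ('question', 'foo')
print(parse_annotation_code("## Methods"))  # prints: ('heading2', 'Methods')
```

Because the space after the `#` is required, an annotation such as `#nospace` is treated as regular text in this sketch.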
- Images are saved in the `attachments` sub-folder of the output folder, and named `{citekey}_image{n}.png`.
- Images are embedded with the `![[ ]]` syntax, for example `![[filename.png|foobar]]`.
- Any `rectangle` type annotation in the PDF is extracted as image.
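The naming and embedding pattern can be sketched in a few lines of Python. The helper `image_embed` and its `label` parameter are hypothetical; the sketch only assumes the `{citekey}_image{n}.png` naming scheme and the `![[ ]]` embed syntax described above.

```python
def image_embed(citekey: str, n: int, label: str = "") -> str:
    """Build the embed string for the n-th extracted image of a document."""
    filename = f"{citekey}_image{n}.png"
    # The optional part after "|" mirrors the "foobar" in the example above.
    return f"![[{filename}|{label}]]" if label else f"![[{filename}]]"


print(image_embed("Grieser2023", 1))
# prints: ![[Grieser2023_image1.png]]
print(image_embed("Grieser2023", 2, "foobar"))
# prints: ![[Grieser2023_image2.png|foobar]]
```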
Update `pdfannots2json` by running `brew upgrade pdfannots2json` in your terminal.

> [!NOTE]
> As a fallback, you can use `pdfannots` as extraction engine, as a different PDF engine sometimes fixes issues. This requires installing `pdfannots` via `pip3 install pdfannots`, and switching the fallback engine in the settings. Note that `pdfannots` does not support image extraction and the extraction quality is slightly worse, so generally you want to use `pdfannots2json`.
If you want to mention this software project in an academic publication, please cite it as:
Grieser, C. (2023). PDF Annotation Extractor [Computer software].
https://github.com/chrisgrieser/pdf-annotation-extractor-alfred
For other citation styles, use the following metadata:
pdfannots anymore).

In my day job, I am a sociologist studying the social mechanisms underlying the digital economy. For my PhD project, I investigate the governance of the app economy and how software ecosystems manage the tension between innovation and compatibility. If you are interested in this subject, feel free to get in touch.