Xunius / Menotexport

Python solution to export annotations from your Mendeley library.
GNU General Public License v3.0

Menotexport

Menotexport (Mendeley-Note-Export) extracts and exports highlights, notes and PDFs from your Mendeley database

IMPORTANT NOTE:

It seems that, starting with version 1.19, Mendeley encrypts the database file from which this tool retrieves your highlights, notes and all document metadata.

Some relevant info on this:

I have little experience with SQLite encryption and decryption, so any suggestions would be greatly appreciated, including advice on the potential legal issues of distributing a tool that bypasses their encryption.

Also if you encounter an error when trying to run this tool on your local database file:

# <Menotexport>: Failed to recoganize the given database file.
file is not a database

then, until we figure out an easy way to bypass this, please consider backing up your database file and trying a version of Mendeley older than 1.19. Sorry for the trouble.
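A quick way to tell whether a database file is still a plain SQLite file is to look at its first 16 bytes, which in an unencrypted SQLite 3 file are always the string "SQLite format 3" followed by a NUL byte. Below is a minimal sketch of such a check, written for Python 3. The function name is my own illustration and is not part of Menotexport; a file that fails the check is either encrypted or not an SQLite file at all.

```python
import os

# Every unencrypted SQLite 3 file starts with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_plain_sqlite(path):
    """Return True if the file starts with the standard SQLite 3 header."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

Running this on a copy of your database before and after a Mendeley upgrade shows whether the upgrade encrypted the file.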

Update 2020-10-09: Here is a method suggested by a user to decrypt the sqlite database file: https://eighty-twenty.org/2018/06/13/mendeley-encrypted-db. See also this issue report.

Update 2021-02-25: Here are some tips for migrating to Zotero, thanks to input from pboley.

More rants

I thought I messed up my sync, but no, it appears to be a much larger scale issue with Mendeley: lots of people are losing their PDFs after syncing. See this twitter and their support page.

I was trying their re-activation workaround, but guess what, they say that sync setting is deprecated in my version (1.17.10) and I need to upgrade to the latest, which encrypts your LOCAL data. WTF Mendeley?!

Lesson learnt: back up your data regularly, and better still, ditch Mendeley.

What does this do?

Menotexport is a simple Python solution to extract and export the annotations (highlighted text, sticky notes and notes) you made in Mendeley's built-in PDF reader, bulk-export PDFs with annotations, and bulk-export metadata with annotations to a .bib or .ris file.

Mendeley is a desktop and web program for managing and sharing research papers. It offers free desktop clients for Windows, OS X and Linux. But the software is not open source, and their support team has been really slow in responding to customers' feature requests, some of which have been requested by many users for YEARS. This tool aims to solve the following:

1. Bulk export annotated PDFs.

Annotations (highlights and notes) made inside Mendeley are not saved directly onto the relevant PDFs but to a separate database file. Therefore these annotations cannot be viewed in any PDF reader other than Mendeley itself.
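If you are curious about what that separate database file contains, Python's built-in sqlite3 module can list its tables. This is only a sketch for an unencrypted database file; the function name is made up for illustration, and it deliberately makes no assumption about Mendeley's table names.

```python
import sqlite3


def list_tables(dbfile):
    """Return the names of all tables in an unencrypted SQLite file."""
    conn = sqlite3.connect(dbfile)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Work on a backup copy of the database rather than the live file while Mendeley is running.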

The native but awkward way to export a PDF with its annotations is to open that PDF in the Mendeley PDF reader and go to Files -> Export PDF with annotations. However, to export all your collections, this has to be repeated manually for each individual PDF in your library. To make it worse, annotations exported in this manner are saved as static text and are not editable.

This tool can bulk export all PDFs with annotations while keeping your Mendeley folder structure, and the annotations are readable and editable in other PDF software. PDFs with no annotations are simply copied to the target location, so you have the complete library structure.

2. Extract annotation texts.

To extract text from the highlights, sticky notes and notes ("General notes" in the right-hand sidebar) in a PDF without copying and pasting them one by one, some software offers an automated solution:

Skim on OS X can produce a summary of all annotations.

Some versions of Foxit Reader can do that (on Windows, but not on Linux; not sure about Mac).

Pro versions of Adobe Reader may have that too.

Most of the PDF readers in Linux do not have that functionality. (Please let me know if you find one).

This tool can extract the text from the highlights and notes in your Mendeley documents to a plain text file, and format the information in a sensible structure using Markdown syntax.
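To give an idea of what "a sensible structure using Markdown" can look like, here is a small sketch that renders a document title and its annotations as Markdown. The record layout (dicts with 'kind', 'page' and 'text' keys) and the function name are assumptions made for this example; the tool's actual output format may differ.

```python
def annotations_to_markdown(title, annotations):
    """Render a document title and its annotations as Markdown text.

    `annotations` is a list of dicts with hypothetical keys:
    'kind' ('highlight' or 'note'), 'page' (int) and 'text' (str).
    """
    lines = ["# {}".format(title), ""]
    for item in sorted(annotations, key=lambda a: a["page"]):
        if item["kind"] == "highlight":
            # Quote highlighted passages so they stand apart from notes.
            lines.append("> {} (page {})".format(item["text"], item["page"]))
        else:
            lines.append("- Note (page {}): {}".format(item["page"], item["text"]))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Sorting by page keeps highlights and notes in reading order, which makes the exported text file easier to skim.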

3. Export library to .bib file.

Exporting to .bib is natively supported in Mendeley, by going to Tools -> Options -> Bibtex. There you can choose to export the whole library to a single file or to one file per folder. However, your annotations won't be included. This tool helps you pack as much information as possible into a .bib file, which might be helpful for people who want to migrate to another reference management tool without losing too much of the effort they put into Mendeley.

Fields that are exported to a .bib entry (as long as they are present in your Mendeley document record):

- citationkey
- authors
- year
- title
- publication
- volume
- issue
- pages
- doi
- abstract
- arxivId
- chapter
- country
- city
- edition
- institution
- isbn
- issn
- month
- day
- publisher
- series
- type
- keywords
- read            # Read or not
- favourite       # Marked as Favourite or starred in Mendeley
- tags            # Tags added to a document
- file            # Location of the attached PDF on local disk
- folder          # Folder name in the Mendeley library
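As an illustration of the "as long as they are present" rule, the sketch below assembles one BibTeX entry from a list of (field, value) pairs and skips the empty ones. It is a simplified example, not the code used by this tool: the function name is hypothetical, and it does not escape special LaTeX characters or map Mendeley field names to BibTeX ones.

```python
def make_bib_entry(entry_type, citationkey, fields):
    """Build one BibTeX entry, skipping fields that are empty or missing.

    `fields` is a list of (name, value) pairs in the desired output order.
    """
    lines = ["@{}{{{},".format(entry_type, citationkey)]
    for name, value in fields:
        if value is None or value == "":
            continue  # only export fields present in the record
        lines.append("  {} = {{{}}},".format(name, value))
    lines.append("}")
    return "\n".join(lines)
```

For example, make_bib_entry("article", "Smith2010", [("title", "A study"), ("doi", "")]) produces an entry with a title line and no doi line.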

Some other features

1. Export preserves sub-folder structures

Mendeley supports nested folders, and this tool handles them properly: the exported PDFs and their corresponding "file" entries in the exported .bib (or .ris) file reflect the folder structure (empty folders are omitted, nested folders are processed recursively).

You are allowed to create folders with the same name in Mendeley, as long as they have different parent folders. If you did so, they are labelled differently in the GUI version: e.g. "folderA", "folder1/folderA" and "folder2/folderA" distinguish three folders that are all named "folderA".
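The labelling idea can be sketched as follows: walk up from each folder to its top-level ancestor and join the names with "/". The input layout, a list of (folder_id, name, parent_id) tuples with parent_id set to None for top-level folders, is an assumption made for this example and not necessarily how Mendeley stores folders.

```python
def folder_paths(folders):
    """Map each folder id to its full path, e.g. 'folder1/folderA'.

    `folders` is a list of (folder_id, name, parent_id) tuples, where
    parent_id is None for top-level folders (a hypothetical layout).
    Assumes the parent links contain no cycles.
    """
    by_id = {fid: (name, parent) for fid, name, parent in folders}

    def path_of(fid):
        name, parent = by_id[fid]
        if parent is None:
            return name
        # Recurse towards the top-level folder, then append this name.
        return path_of(parent) + "/" + name

    return {fid: path_of(fid) for fid in by_id}
```

Because the full path is unique even when folder names repeat, it can double as the sub-directory in which exported PDFs are placed.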

2. Highlight colors in Mendeley

Mendeley 1.16.1 introduced 7 more highlight colors; these are replicated in the exported PDFs.

3. Extra utility to help improve accuracy in highlight extraction

Two text extraction utilities (pdfminer and pdftotext) are used to extract highlighted text from PDFs. Installing pdftotext (see the installation details below) enables output that is better than the default pdfminer results. The cost is an extra dependency to satisfy and a slight drop in execution speed. This feature is optional: if you don't care about highlight extraction or don't have pdftotext available on the system, the tool falls back to the pdfminer-only solution.
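A fallback of this kind can be sketched in a few lines. The example below is written for Python 3.3+ (shutil.which does not exist in Python 2.7, which this tool targets) and the function names are my own, so treat it as an illustration rather than the tool's actual code. The second function builds a pdftotext command that extracts text from one rectangle of one page, using poppler's -f/-l (page range) and -x/-y/-W/-H (crop area) options, which take integer values.

```python
import shutil


def choose_extractor():
    """Return 'pdftotext' if the executable is on PATH, else 'pdfminer'."""
    return "pdftotext" if shutil.which("pdftotext") else "pdfminer"


def pdftotext_region_command(pdf_path, page, x, y, width, height):
    """Build a pdftotext command that extracts text from one rectangle.

    -f/-l select the page; -x/-y give the top-left corner of the crop
    area and -W/-H its size. The trailing '-' sends text to stdout.
    """
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),
        "-x", str(x), "-y", str(y),
        "-W", str(width), "-H", str(height),
        pdf_path, "-",
    ]
```

The returned list can be passed to subprocess.check_output when choose_extractor() reports that pdftotext is available.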

4. Zotero-ready output format

Use the "-z" flag (command-line version), or toggle the "For import to Zotero" option (GUI version) to re-format the exported .bib file, making it suitable to import into Zotero. Therefore to migrate over to Zotero, specify "Export PDFs", "Extract highlights", "Extract notes" and "Export to .bib" (by giving a "-pmnb" flag), process a folder or the entire Mendeley library, then point the "import" function of Zotero to the exported .bib file. Document entries with meta-data, notes (highlighted texts + notes), tags and the attached PDFs (if they exist) will be added.

5. Export to .ris format.

Export metadata and annotations to a .ris file. If the -z flag is given, the output can be properly recognized by Zotero, and a migration to Zotero via the .ris route can be done with the -pnmrz options.
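For readers unfamiliar with the .ris format: a record is a sequence of lines, each made of a two-letter tag, two spaces, a hyphen, a space and the value, starting with a TY (type) line and ending with an ER line. The sketch below writes such a record; the function name is hypothetical and this is not the exporter used by this tool.

```python
def make_ris_record(ref_type, fields):
    """Build one RIS record from (tag, value) pairs.

    Each line is a two-letter tag, two spaces, a hyphen, a space,
    then the value. Empty values are skipped.
    """
    lines = ["TY  - {}".format(ref_type)]
    for tag, value in fields:
        if value:
            lines.append("{}  - {}".format(tag, value))
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)
```

Notes and extracted highlights would typically go into a notes tag such as N1, one line per note.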

6. Custom template formatting for exported annotations.

Annotations (notes+highlights) can be formatted in the way you like.

In the lib folder there is a file annotation_template.py which contains a working example template to format the output of the exported annotations.

To use custom template:

Currently, these variables are available in building a template:

Put any of them in curly brackets to use them, e.g. {title}. NOTE: no spaces inside the brackets. More instructions can be found in the template file.
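The curly-bracket syntax is the same as Python's str.format placeholders. Assuming the template is filled in that way (I am illustrating the idea here, not quoting annotation_template.py), a template and its rendering could look like the sketch below. The variable names other than {title} are hypothetical.

```python
# Hypothetical template; only {title} is taken from the text above.
TEMPLATE = "## {title}\n\n{author} ({year})\n\n{annotation}\n"


def render(template, record):
    """Fill {name} placeholders in `template` from the `record` dict."""
    return template.format(**record)
```

With str.format, a placeholder written as "{ title }" is looked up under a key that includes the spaces and fails with a KeyError, which is one reason to avoid spaces inside the brackets.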

For deeper modification of the output formatting, you can hack into this file: /lib/exportannotation.py.

Installation

1. Install via conda.

For command line or GUI usage on Linux (64-bit), installing with conda is recommended:

conda create -n menotexport python=2.7
source activate menotexport
conda install -c guangzhi menotexport

For the installation of conda (Anaconda or a lighter-weight version: Miniconda), see their official site.

2. Pre-built binary GUI for Windows

For Windows 7 and Windows 10 (64-bit) (version 1.4, updated on 08-July-2017), download menotexport-gui-win7-win10.zip from Google Drive: https://drive.google.com/open?id=0B8wpnLHH0j1hTTM5cTE2TXg2b1k, unpack, then launch menotexport-gui.exe.

Version 1.4.4 (uploaded 10-Nov-2017, not fully tested yet, please provide feedback on whether this works correctly): https://drive.google.com/open?id=1rd-mOKspare4bkKWEMmm-2uwH04p-sIq.

Version 1.5.1 (uploaded 04-Sept-2018, not fully tested): https://drive.google.com/open?id=1v-f2Gfryzy__RUkF9c0aD1GXTuBJPpyv

3. Install the dependencies and use source code

If all above approaches fail:

Failed to recoganize the given database file.
file is encrypted or is not a database

Then download the latest version of sqlite3 from here, and copy the sqlite3.dll file to the DLLs folder in your python installation directory. Note that if you have your python environment set up using Anaconda, be sure to copy to the DLLs folder in the specific env folder.
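To check which SQLite library your Python installation actually picks up, before and after replacing the DLL, you can ask the sqlite3 module. A small sketch (the helper names are my own):

```python
import sqlite3


def linked_sqlite_version():
    """Return the version string of the SQLite library Python is using."""
    return sqlite3.sqlite_version


def sqlite_at_least(major, minor):
    """Return True if the linked SQLite library is at least major.minor."""
    return sqlite3.sqlite_version_info[:2] >= (major, minor)
```

Run it with the same interpreter (and the same conda env) that you use to launch menotexport, otherwise you may be checking a different DLL.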

Usage

NOTE: If you obtained this tool before 2016-04-15, I've made some changes that make it behave differently.

Command line

python menotexport.py [-h] [-p] [-m] [-n] [-b] [-r] [-s] [-z] [-f folder] dbfile outputdir

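where

- -h: show the help message.
- -p: bulk export all PDFs (with annotations if they have any).
- -m: extract highlighted text.
- -n: extract notes.
- -b: export metadata and annotations to a .bib file.
- -r: export metadata and annotations to a .ris file.
- -s: save extracted text to a separate txt file for each PDF (otherwise everything is saved to a single file).
- -z: re-format the exported .bib and/or .ris file so it is suitable for import into Zotero.
- -f folder: process only the given Mendeley folder (otherwise process all folders in the library).
- dbfile: the Mendeley database file.
- outputdir: the folder to save outputs to.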

Example:

To bulk export, extract and save to separate txt files:

python menotexport.py -pmns <dbfile> <outputdir>

To bulk export all PDFs and extract all annotations in Mendeley folder "Tropical_Cyclones" and save extracted annotations to a single file:

python menotexport.py -pmn -f "Tropical_Cyclones" <dbfile> <outputdir>

To bulk export all PDFs, extract all annotations in all Mendeley folders, and save annotations + meta-data to .bib file in a suitable format to import into Zotero:

python menotexport.py -pmnbz <dbfile> <outputdir>
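If you want to wrap the tool in your own script, the usage line above maps naturally onto Python's argparse module. The parser below is only a sketch that mirrors the documented flags; it is not the parser inside menotexport.py, and the attribute names it produces (args.p, args.folder and so on) are my own choice.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented usage line (a sketch)."""
    parser = argparse.ArgumentParser(prog="menotexport.py")
    for flag in "pmnbrsz":
        # Each single-letter option is a simple on/off switch.
        parser.add_argument("-" + flag, action="store_true")
    parser.add_argument("-f", dest="folder", default=None)
    parser.add_argument("dbfile")
    parser.add_argument("outputdir")
    return parser
```

argparse accepts combined short switches, so "-pmn" is parsed the same way as "-p -m -n", matching the examples above.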

GUI

Launch menotexport-gui.py (or menotexport-gui.exe), select the Mendeley database file and an output folder. Select the actions to perform (see above), then start.

Caveats and further notes

Dependencies

Developed in Python 2.7. NOT compatible with Python 3+ yet.

  1. It requires the following Python packages:

    • PyPDF2
    • sqlite3
    • pdfminer (NOTE: version 2014+ is needed; the one in the Ubuntu repository was out of date at the time of writing, so please check. If you get an error of "ImportError: No module named pdfdocument", you probably have an older version.)
    • BeautifulSoup4
  2. (Optional but recommended) For better performance in highlight extraction, it further requires the pdftotext software.

    • Linux: pdftotext comes with most popular Linux distros. In case you need to install it:

      sudo apt-get install poppler-utils
    • Windows: Download the poppler package from here, unpack to any folder, then add the path to the pdftotext.exe file (e.g. D:\Downloads\poppler-0.44_x86\poppler-0.44\bin) to your PATH environment variable. How to do this depends on your Windows version; please search online. NOTE that the pdftotext in the xpdf package for Windows does not work: it doesn't have coordinate-based portion extraction.

It further incorporates (with minor adjustments) the pdfannotation.py file from the Menextract2pdf project.

It further incorporates (with no adjustments) the pylatexenc module from the pylatexenc project.

Platform/OS

The software is tested on Linux and on Windows 7 and 10. It should also run on Mac.

Versions

Licence

The script is distributed under the GPLv3. The pdfannotation.py file is under the LGPLv3. pylatexenc is under the MIT license.

Related projects