This repository is archived and will not receive future updates or support.
As of February 2022, this software still works, provided you install the correct dependencies.
All code in this repository is made available under the MIT license, so please feel free to fork / copy / modify it for your own use according to the license terms. Thank you to the contributors who have helped improve glossika-to-anki over the years.
The original README follows below.
Generate Anki decks from Glossika PDFs and audio files
glossika-to-anki is a set of Python 3 scripts to generate Anki flashcards using the PDFs and audio from the Glossika language program.
glossika-to-anki provides three main utilities:
glossika_extract_pdf.py - Generate a TSV file of English and target language phrases from Glossika PDFs
glossika_split_audio.py - Split GMS-C audio files into individual mp3s for each phrase
generate_anki.py - Create an Anki deck by combining each phrase and its corresponding audio into a separate Anki note / card
Python 3 - Install with brew install python or apt install python.
pdftotext - Converts the Glossika PDFs to text so that the phrases can be extracted with regex.
On Windows, download Xpdf tools and copy pdftotext.exe to a folder on the path (e.g. the Python folder). If you installed Python with the Windows installer, the default path should be under C:\Program Files or C:\Users\your_name\AppData\Local. Alternatively, you may be able to run which python or where python from a command prompt to find where the Python executable is located.
On MacOS brew cask install pdftotext; on Linux apt install poppler-utils
mp3splt - Splits the GMS-C audio files into individual mp3s for each phrase. On MacOS brew install mp3splt; on Linux apt install mp3splt
genanki - Install with pip install genanki or pip3 install genanki
GMS-C audio files named like ENZS-F1-GMS-C-0001.mp3. Note that the ENZS prefix is for Mandarin and varies by language.
Clone or download the repository.
git clone git@github.com:emesterhazy/glossika-to-anki.git
cd glossika-to-anki/glossika-to-anki
Run each script and follow the prompts to copy the Glossika files into the source folder that is created.
python glossika_extract_pdf.py
python glossika_split_audio.py
python generate_anki.py
Import your new Anki deck!
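If one of the scripts fails right away, a missing dependency is a likely cause. The sketch below is not part of the repository; it is a minimal preflight check, assuming the external tools are installed under the names pdftotext and mp3splt and that genanki is installed for the same Python interpreter you use to run the scripts. The function name check_dependencies is hypothetical.

```python
import importlib.util
import shutil

# Names taken from the requirements listed in this README (assumed to match
# the executable and module names on your system).
REQUIRED_TOOLS = ("pdftotext", "mp3splt")
REQUIRED_MODULES = ("genanki",)


def check_dependencies(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return a dict mapping each dependency name to True if it was found."""
    found = {}
    for tool in tools:
        # shutil.which returns the executable's full path, or None if absent.
        found[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module cannot be located.
        found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the same python command you use for the scripts, so that the genanki check reflects the right interpreter.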
Only supports the v2 Glossika PDFs, not the older non-searchable PDFs. PDFs with copy protection must have it removed before sentences can be extracted.
No support for extracting IPA.
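To get a rough idea of whether a PDF is one of the searchable ones, you can check how much text pdftotext returns for it. The following is a sketch under stated assumptions, not how glossika_extract_pdf.py works: the helper names pdf_to_text and looks_searchable are hypothetical, and the min_chars threshold is an arbitrary value chosen for illustration.

```python
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text.

    Passing "-" as the output file sends the text to stdout. Raises
    subprocess.CalledProcessError if pdftotext exits with an error.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def looks_searchable(text, min_chars=200):
    """Heuristic: a searchable PDF should yield a reasonable amount of text.

    min_chars is an assumed threshold for this sketch, not a documented value.
    """
    # Ignore whitespace and form feeds so near-empty output does not count.
    visible = "".join(text.split())
    return len(visible) >= min_chars
```

For example, looks_searchable(pdf_to_text("some_file.pdf")) returning False suggests the PDF contains scanned images rather than text, in which case the extraction script is unlikely to find any phrases.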
Pull requests are welcome. If you would like to add support for the v1 Glossika PDFs or make changes that require new dependencies please open an issue first to discuss.