audio-language-trainer

Caveat

A personal project - not intended for plug-and-play release. You are welcome to use what is here, but it will not work out of the box (see the Setup section below).

Motivation

There is often a gap after completing a basic language course (like Section 1 on DuoLingo, or a Foundation Michel Thomas course). The usual advice is to start reading magazines and watching TV, but this is far too hard. I wanted something that exposed me to longer dialogue and also reinforced what I had learnt (or was still learning): a 'stepping stone' towards watching a TV programme, or going abroad and being more comfortable in the middle of a group where people are conversing. They won't pause after 3 or 4 words, they just keep talking amongst each other!

This is designed to fill that gap.

Introduction

Given a vocab list and a target language, it produces practice material for you in a listen, speak, check format.

It does this by generating several episodes of an engaging storyline (tailored to your vocabulary), told as a two-person dialogue between Alex and Sam. This gives you long-form listening practice, which is lacking from DuoLingo. Long dialogue can be a bit overwhelming at first, so it also provides mini-phrases (using the same vocabulary) that start simple and become more complex. These can be exported as Anki flash cards for ongoing practice. After listening to and speaking the practice phrases, you are equipped to understand the longer dialogue.

The practice phrases aren't just snippets of the dialogue, they are new phrases using the same vocabulary.

Inputs

You need to update or provide your own English vocab list (a JSON file) and a grammar concepts file (also JSON), e.g. listing which tenses you want to practise. You then update the config file with your target language, a name for the story (e.g. swedish_1) and a one-line description. The description nudges the LLM towards the type of story to write (e.g. a camping trip in the mountains).

Outputs

A story is generated by an LLM and comes in several parts:

Exposition
Rising action
Climax
Falling action
Resolution

It randomly samples from your vocab list, weighted by vocab usage (unfamiliar words are prioritised). This means you can get a wide variety of stories, including some rather random situations, such as someone taking a clock on a camping trip (it is trying its best to use the sampled vocab together with the story theme!). It will also likely introduce some new words.
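A minimal sketch of usage-weighted sampling, to make the idea concrete. This is not the project's actual code: the function name `sample_vocab`, the `usage_counts` dictionary and the `1 / (1 + count)` weighting are all assumptions chosen for illustration.

```python
import random


def sample_vocab(usage_counts, k, rng=None):
    """Pick up to k distinct words, favouring words with low usage counts.

    usage_counts is a hypothetical {word: times_heard} mapping.
    """
    if rng is None:
        rng = random.Random()
    remaining = dict(usage_counts)
    chosen = []
    for _ in range(min(k, len(remaining))):
        words = list(remaining)
        # Assumed weighting: a word never heard (count 0) gets weight 1.0,
        # and the weight shrinks as the count grows.
        weights = [1.0 / (1 + remaining[w]) for w in words]
        pick = rng.choices(words, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


usage = {"clock": 0, "tent": 5, "mountain": 2, "coffee": 9}
print(sample_vocab(usage, k=2))
```

Because the draw is random, familiar words can still appear; they are just less likely than words you have not yet heard.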

All data (the story outline, the dialogue etc) are saved in a JSON document for inspection.

M4A audio files

We generate an .m4a file for each story part (this format allows for synced lyrics), so if you play the files through a player such as Oto Music, you can see the text of the audio as it plays (like closed captions).

The sequence we follow for the stories in the audio file is:

1. The whole dialogue in the target language (for motivation - it is unlikely you will understand this initially)
2. Practice phrases (English audio - pause for you to speak - target audio at normal speed - target audio at slow speed)
3. Double-speed whole dialogue in the target language (x 12 repeats). This is based on research suggesting that it improves language parsing (the ability to separate out words). The idea here is to just relax and listen; after several repeats you will find your brain separates out the words
4. Normal-speed whole dialogue in the target language - you should feel a sense of achievement here, as this will be far easier to understand than when you started out at step 1
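The ordering above can be sketched as a simple playlist plan. This only illustrates the sequence; it does not generate any audio, and the `Segment` type, the `build_sequence` function and the label strings are hypothetical names rather than anything taken from the project code.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    content: str          # "dialogue", or the text of a practice phrase
    voice: str            # "target", "english" or "pause"
    speed: Optional[str]  # "normal", "slow", "double", or None for a pause


def build_sequence(phrases: List[str], fast_repeats: int = 12) -> List[Segment]:
    """Return the ordered segments for one story part."""
    # Whole dialogue once, for motivation.
    plan = [Segment("dialogue", "target", "normal")]
    # Each practice phrase: English prompt, pause, then target twice.
    for phrase in phrases:
        plan.append(Segment(phrase, "english", "normal"))
        plan.append(Segment(phrase, "pause", None))
        plan.append(Segment(phrase, "target", "normal"))
        plan.append(Segment(phrase, "target", "slow"))
    # Double-speed dialogue, repeated.
    plan.extend([Segment("dialogue", "target", "double")] * fast_repeats)
    # Finish with the dialogue at normal speed.
    plan.append(Segment("dialogue", "target", "normal"))
    return plan


for segment in build_sequence(["Where is the tent?"], fast_repeats=2):
    print(segment)
```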

An audio file is generated for each story part as well as a single large file with everything.

Anki Web flash card deck

We generate an *.apkg file of the practice phrases (this is a flash-card format for Anki Web). You can import this file, which contains all the audio. There are 3 'flavours' of card: speaking, listening and reading. These just vary what you are exposed to on each side of the card, e.g. for speaking practice you get the English text as a prompt, while for reading you get no audio, just the target text.
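As a rough illustration of the three flavours, the sketch below lays out what might appear on each side of a card. Only the speaking prompt (English text) and the reading prompt (target text, no audio) come from the description above; the listening layout, the contents of the back of each card, and the names `make_cards` and `audio_file` are assumptions.

```python
def make_cards(english, target, audio_file):
    """Return a front/back layout for each of the three card flavours."""
    return {
        # Speaking: English text as the prompt; you say the target phrase.
        "speaking": {"front": [english], "back": [target, audio_file]},
        # Listening (assumed layout): target audio only as the prompt.
        "listening": {"front": [audio_file], "back": [target, english]},
        # Reading: target text with no audio as the prompt.
        "reading": {"front": [target], "back": [english]},
    }


cards = make_cards("Where is the tent?", "Var är tältet?", "phrase_001.mp3")
for flavour, sides in cards.items():
    print(flavour, sides)
```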

PDF File

A booklet containing the English and target text for each part of the story and each practice phrase.

How it 'learns'

No AI here, but we do run a small spaCy language model over any dialogue that is generated, which updates your personalised vocab list. Subsequent stories will then prioritise sampling words you haven't yet heard, so as you continue to use it you should find that you work through your vocabulary backlog.
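A minimal sketch of the bookkeeping involved. To keep it dependency-free, a naive regex tokeniser stands in for the spaCy model, so unlike the real pipeline it does no lemmatisation (e.g. "tents" would not count towards "tent"). The function name `update_usage` and the `{word: count}` structure are assumptions.

```python
import re
from collections import Counter


def update_usage(usage_counts, dialogue_text):
    """Return a new {word: count} dict with counts bumped for words heard."""
    # Naive stand-in for spaCy: lowercase and split on letter runs.
    tokens = re.findall(r"[a-z']+", dialogue_text.lower())
    seen = Counter(tokens)
    updated = dict(usage_counts)
    for word in updated:
        updated[word] += seen.get(word, 0)
    return updated


usage = {"tent": 0, "clock": 1, "coffee": 0}
usage = update_usage(usage, "The tent is by the clock. A big tent!")
print(usage)
```

Words whose count stays at zero are the ones a usage-weighted sampler would favour in the next story.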

Setup

Use the notebook, which guides you through the process. Be aware that there is substantial setup required if you want to run the code yourself: Google Cloud APIs and FFMPEG (for audio generation) both need configuring. If you want an example lesson, happy to oblige!

Google Cloud:

You will need an account and a project (note its ID and number), and you will need to know which regions to use for the LLM.

Google APIs used:

Text to Speech
Translate
Vertex AI (and authorise Anthropic's Claude 3.5 Sonnet in the Model Garden)

Client:

Install FFMPEG for audio generation, and fonts for the PDF generation to work.
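Before running the notebook, a quick check like the following can confirm the ffmpeg binary is reachable. This is a convenience sketch, not part of the project; it assumes the executable is called `ffmpeg` and is expected on your PATH, and `find_ffmpeg` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH - install it before generating audio")
else:
    print(f"ffmpeg found at {path}")
```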

Acknowledgements

https://www.saysomethingin.com/en/home/ - heavily inspired by the approach of this company, DuoLingo and the Michel Thomas method.

Andy7475 / audio-language-trainer
