AMT-Transcripts

The Transcription project for the Art + Music + Technology podcast

Want to read the transcriptions? Give this a try:

http://darwingrosse.com/AMT/transcriptions.html

AMT-Transcripts Transcription App

Transcriptions for the AMT Podcast are completed using the https://rev.ai service, which produces a JSON file with timestamp, punctuation, spacing and timing for every word in the transcribed document. But like all auto-transcribing tools, it misses a lot of stuff. For example, MIDI is often transcribed as "mini", or "Mitty" or even "Mindy". So it still has to be cleaned up manually.

The tr-parse application runs through the JSON document, identifies poorly transcribed words, labels the speaker, formats the text into manageable HTML paragraphs, and wraps the HTML in some boilerplate. It can also optionally create a document that provides direct access to the audio file for verifying the words (which is proving to be a great timesaver, see it in action).
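To illustrate the idea of identifying poorly transcribed words, here is a minimal sketch. It is not the actual tr-parse.js code. The input shape (`monologues` containing `elements` with `type`, `value`, `ts` and `confidence`) is an assumption about the rev.ai JSON, and the names `SUSPECTS` and `flagSuspectWords` as well as the 0.6 confidence threshold are hypothetical.

```javascript
// Hypothetical sketch, not the tr-parse.js implementation.
// Assumed input shape:
// { monologues: [ { speaker, elements: [ { type, value, ts, confidence } ] } ] }

// Known mis-transcriptions, taken from the MIDI example above.
const SUSPECTS = new Map([
  ['mini', 'MIDI'],
  ['mitty', 'MIDI'],
  ['mindy', 'MIDI'],
]);

// Collect words that are either on the suspect list or below a confidence threshold.
function flagSuspectWords(transcript, minConfidence = 0.6) {
  const flagged = [];
  for (const monologue of transcript.monologues) {
    for (const el of monologue.elements) {
      if (el.type !== 'text') continue; // skip punctuation and spacing elements
      const suggestion = SUSPECTS.get(el.value.toLowerCase()) || null;
      const lowConfidence =
        typeof el.confidence === 'number' && el.confidence < minConfidence;
      if (suggestion || lowConfidence) {
        flagged.push({
          word: el.value,
          ts: el.ts,
          speaker: monologue.speaker,
          suggestion,
        });
      }
    }
  }
  return flagged;
}

// Made-up sample data for illustration only.
const sample = {
  monologues: [{
    speaker: 0,
    elements: [
      { type: 'text', value: 'The', ts: 6.1, confidence: 0.99 },
      { type: 'punct', value: ' ' },
      { type: 'text', value: 'Mitty', ts: 6.3, confidence: 0.41 },
      { type: 'punct', value: ' ' },
      { type: 'text', value: 'cable', ts: 6.8, confidence: 0.95 },
    ],
  }],
};

console.log(flagSuspectWords(sample));
```

A flagged word still needs a human decision: "mini" is sometimes the right word, so the sketch only suggests a replacement instead of applying it.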

Huge thanks to Bernhard Wagner, who took a minimal Node.js program and is turning it into a powerhouse!

Getting Started

These instructions will get you a copy of the project up and running on your local machine for cleaning up automatic transcriptions.

Prerequisites

git clone https://github.com/darwingrosse/AMT-Transcripts.git
cd AMT-Transcripts/App
npm install

Invoking the program

cd AMT-Transcripts/App
./tr-parse.js --help
generate plain html from json, filtering out cruft

Usage:
  tr-parse.js plain <json> -s <speaker>... -r <release-date>
  tr-parse.js audio <json> -s <speaker>... -r <release-date> -o <audio-offset> [ -a <audio-file> ]
  tr-parse.js --help | -h
  tr-parse.js --version | -v

Commands:
  plain                             Generates html only.
  audio                             Generates html with audio embedded.

Options:
  --speaker -s <speaker>...         speaker(s) 
                                    NOTE: Each speaker needs to be preceded by '-s' (or '--speaker')
  --release-date -r <release-date>  e.g. 'November 13, 2019'
  --help -h                         Display help
  --version -v                      Show version number
  --audio-offset -o <audio-offset>  only valid and required for 'audio' command. Offset when speech
                                    starts in seconds (float, e.g. "-o 6.1").
  --audio-file -a <audio-file>      only valid and optional for 'audio' command.
                                    If not provided, the respective audio file
                                    is searched for in ../AUDIO according to
                                    episode number retrieved from json file name.

To generate an HTML file with the ability to play the audio podcast:

./tr-parse.js audio transcript-0005.json -s Darwin -s 'Barry Moon' -r 'November 10, 2013' -a path_to_podcast_audio/Podcast_005_BMoon.mp3 -o 6.1

audio: This invokes the audio subcommand
transcript-0005.json: The name of the json source file.
-s: (repeated) speakers in order of their appearance (usually two).
-r: release date of the podcast
-a: path to the audio file of the podcast
-o: offset where the spoken word part begins in seconds

Note: The audio play/stop interface is minimal: Clicking on a word anywhere starts playing the podcast from that point on. But clicking also toggles between playing and pausing. You'll get the hang of it. You can also toggle play/pause by pressing the space bar. The audio files of the podcast are available here. The background of the rendered html file is yellowish when it is connected to the audio podcast. The text with lighter background is clickable and the podcast will be played from there. Also, the cursor changes to a hand when hovering above a word from where the podcast can be started.

Note: the audio subcommand and its options -a and -o are to be used only while fixing the transcriptions. When done with fixing, the HTML should be generated using the plain command devoid of the two audio options -a, -o, e.g.:

./tr-parse.js plain transcript-0005.json -s Darwin -s 'Barry Moon' -r 'November 10, 2013'

The generated HTML file will have the same stem as the provided JSON file, but with the .html extension, and it will be stored in the HTML directory, e.g. HTML/transcript-0005.html. If the audio subcommand is used, the generated HTML file will have the same name with _audio appended before the extension, e.g. HTML/transcript-0005_audio.html. The generated HTML/transcript-0005_audio.html has a default style that makes it easy to distinguish from the plain HTML generated by the plain subcommand. The audio default style can be overridden by providing a CSS file named ../HTML/j2h.css. There's an example file, ../HTML/j2h_example.css.
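The naming rule described above can be sketched as follows. This is an illustration, not the code in tr-parse.js: the function name `outputPath` and the default directory argument `'HTML'` are assumptions made for the example.

```javascript
const path = require('path');

// Hypothetical helper: derive the output file name from the JSON file name.
// 'plain' keeps the stem as-is, 'audio' appends '_audio' before the extension.
function outputPath(jsonFile, command, htmlDir = 'HTML') {
  const stem = path.basename(jsonFile, path.extname(jsonFile));
  const suffix = command === 'audio' ? '_audio' : '';
  // posix join keeps forward slashes, matching the examples in this README
  return path.posix.join(htmlDir, `${stem}${suffix}.html`);
}

console.log(outputPath('transcript-0005.json', 'plain'));
console.log(outputPath('transcript-0005.json', 'audio'));
```

Keeping the two outputs under different names means a working copy with audio controls does not overwrite the finished plain transcript.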

Note: If no audio file is passed via the -a option, the audio subcommand will look for an audio file in the folder ../AUDIO based on the episode number, e.g.

./tr-parse.js audio transcript-0005.json -s Darwin -s 'Barry Moon' -r 'November 10, 2013' -o 6.1

will look for an mp3 file in ../AUDIO/ that has 0005 or 005 in its file name. For example, for episode 0005, Podcast_005_BMoon.mp3 will be found if it is present in the ../AUDIO/ directory. You need to populate ../AUDIO/ yourself from the Art + Music + Technology podcast.
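The lookup described above could work roughly like this. It is a sketch under assumptions, not the actual implementation: `episodeNumber` and `findAudio` are hypothetical names, and the function takes a list of file names so that it can be tried without touching the file system.

```javascript
// Hypothetical helper: pull the trailing digits out of e.g. 'transcript-0005.json'.
function episodeNumber(jsonFile) {
  const match = /(\d+)\.json$/.exec(jsonFile);
  return match ? match[1] : null;
}

// Hypothetical helper: pick the first mp3 whose name contains the episode
// number, either as written ('0005') or with one leading zero dropped ('005').
// Note: a plain substring match can also hit unrelated numbers (e.g. '1005').
function findAudio(jsonFile, audioFiles) {
  const episode = episodeNumber(jsonFile);
  if (episode === null) return null;
  const candidates = [episode, episode.replace(/^0/, '')];
  const hit = audioFiles.find(
    (name) =>
      name.toLowerCase().endsWith('.mp3') &&
      candidates.some((c) => name.includes(c))
  );
  return hit || null;
}

const files = ['Podcast_004_Example.mp3', 'Podcast_005_BMoon.mp3', 'notes.txt'];
console.log(findAudio('transcript-0005.json', files));
```

In practice the list of names could come from `fs.readdirSync('../AUDIO')`, which returns the file names in that directory.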

Repo Contents

/App - Contains the conversion application (written in Node.js) used as a transcription aid.
/HTML - Contains the transcribed and completed HTML for the podcast episodes.
/JSON - Contains the raw JSON files created by the https://rev.ai conversion system.
/AUDIO - Container for downloaded episodes from the Art + Music + Technology podcast. You need to download them yourself.

Run Tests

npm test

Coding Style Tests

TBD

Built With

docopt - For command line options
glob - For file name pattern matching
jest - For testing
shellwords - For testing

Contributing

TBD

Versioning

TBD

Authors

Darwin Grosse - initiator and main author
Bernhard Wagner - contributions to code and transcriptions

License

Acknowledgments

Billie Thompson for the README template.
