HoukasaurusRex / rules-as-written-podcast

Static site for the Rules as Written podcast

[FEAT]: Automate Transcription Pipeline #18

Open HoukasaurusRex opened 3 years ago

HoukasaurusRex commented 3 years ago

Infrastructure

Sample Architecture

JavaScript SDK

Create CDK repo for infrastructure

DynamoDB

Store podcast episode lists and transcription meta info.

Columns

Lambdas

AWS Transcribe

Pricing

The bulk of the cost comes from AWS Transcribe. DynamoDB may add further cost depending on storage, reads, and writes per month.

Estimate
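
Since Transcribe dominates the bill, a rough estimate can be sketched from episode lengths alone. The function name, the per-minute rate, and the 15-second minimum below are assumptions for illustration (the rate is roughly the first-tier standard batch price at the time of writing), so check the current AWS pricing page before relying on the numbers. DynamoDB costs are not included.

```typescript
// Assumed rate in USD per minute for standard batch transcription (first pricing tier).
// This is an illustrative default, not a value taken from this issue.
const ASSUMED_USD_PER_MINUTE = 0.024;

// Assumption: billing is per second with a 15-second minimum per request.
const MIN_BILLED_SECONDS = 15;

// Hypothetical helper: estimates the Transcribe cost for a list of episode durations.
function estimateTranscribeCostUsd(
  episodeDurationsSeconds: number[],
  usdPerMinute: number = ASSUMED_USD_PER_MINUTE
): number {
  const billedSeconds = episodeDurationsSeconds
    // Round each episode up to whole seconds and apply the per-request minimum.
    .map((seconds) => Math.max(Math.ceil(seconds), MIN_BILLED_SECONDS))
    .reduce((sum, seconds) => sum + seconds, 0);
  return (billedSeconds / 60) * usdPerMinute;
}
```

Under these assumptions, ten one-hour episodes are 600 billed minutes, or about 14.40 USD.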

HoukasaurusRex commented 3 years ago

Example API request

{
    "TranscriptionJobName": "raw-ep1-what-is-dd-test",
    "LanguageCode": "en-US",
    "MediaSampleRateHertz": 44100,
    "MediaFormat": "mp3",
    "Media": {
        "MediaFileUri": "s3://raw-transcription-test/raw-ep1-what-is-dd.mp3"
    }
}
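
A request like the one above could be derived from the S3 object that triggers the pipeline. The sketch below is one possible shape, not the project's implementation: `buildTranscriptionRequest` and its parameters are hypothetical names, and it leaves out `MediaSampleRateHertz`, which is optional in the StartTranscriptionJob API.

```typescript
// Subset of the StartTranscriptionJob request fields used in the example request.
interface StartTranscriptionJobParams {
  TranscriptionJobName: string;
  LanguageCode: string;
  MediaFormat: string;
  Media: { MediaFileUri: string };
}

// Hypothetical helper: builds request parameters from an S3 bucket and object key.
function buildTranscriptionRequest(
  bucket: string,
  key: string,
  jobSuffix: string = "",
  languageCode: string = "en-US"
): StartTranscriptionJobParams {
  // Use the file name (without any folder prefix) as the basis for the job name.
  const fileName = key.split("/").pop() ?? key;
  const dot = fileName.lastIndexOf(".");
  const baseName = dot > 0 ? fileName.slice(0, dot) : fileName;
  const extension = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";

  // Job names may only contain letters, digits, '.', '_' and '-'.
  const jobName = (baseName + jobSuffix).replace(/[^0-9a-zA-Z._-]/g, "-");

  return {
    TranscriptionJobName: jobName,
    LanguageCode: languageCode,
    MediaFormat: extension,
    Media: { MediaFileUri: `s3://${bucket}/${key}` },
  };
}
```

Calling `buildTranscriptionRequest("raw-transcription-test", "raw-ep1-what-is-dd.mp3", "-test")` yields the same job name, media format, and media URI as the example request. The result could then be passed to the SDK's start-job call.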

Example API response

{
    "TranscriptionJob": {
        "TranscriptionJobName": "raw-ep1-what-is-dd-test",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "en-US",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "s3://raw-transcription-test/raw-ep1-what-is-dd.mp3"
        },
        "Transcript": {},
        "StartTime": "2021-07-10T10:57:51.298Z",
        "CreationTime": "2021-07-10T10:57:51.222Z",
        "Settings": {
            "ChannelIdentification": false,
            "ShowAlternatives": false
        }
    }
}
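
The response above is still `IN_PROGRESS` and its `Transcript` is empty, so a Lambda would need to decide whether to wait, fetch the transcript, or record a failure. A minimal sketch of that decision follows. The `JobOutcome` type and `interpretJob` function are hypothetical names, and only the response fields needed here are modeled.

```typescript
// Job statuses defined by the Transcribe API.
type TranscriptionJobStatus = "QUEUED" | "IN_PROGRESS" | "FAILED" | "COMPLETED";

// Subset of the job response relevant to deciding the next step.
interface TranscriptionJobResponse {
  TranscriptionJob: {
    TranscriptionJobName: string;
    TranscriptionJobStatus: TranscriptionJobStatus;
    Transcript?: { TranscriptFileUri?: string };
    FailureReason?: string;
  };
}

// Hypothetical result type for the pipeline's next step.
type JobOutcome =
  | { kind: "pending" }
  | { kind: "done"; transcriptUri: string }
  | { kind: "failed"; reason: string };

function interpretJob(response: TranscriptionJobResponse): JobOutcome {
  const job = response.TranscriptionJob;
  switch (job.TranscriptionJobStatus) {
    case "COMPLETED": {
      const uri = job.Transcript?.TranscriptFileUri;
      // Treat a completed job without a transcript location as a failure.
      return uri
        ? { kind: "done", transcriptUri: uri }
        : { kind: "failed", reason: "Completed without a transcript URI" };
    }
    case "FAILED":
      return { kind: "failed", reason: job.FailureReason ?? "Unknown failure" };
    default:
      // QUEUED and IN_PROGRESS both mean the caller should check again later.
      return { kind: "pending" };
  }
}
```

For the example response this returns `{ kind: "pending" }`.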

HoukasaurusRex commented 3 years ago

Convert AWS Transcribe JSON to SRT (package / code)
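
If no existing package fits, the conversion is small enough to write by hand. The sketch below assumes the standard Transcribe output shape (`results.items` with string `start_time`/`end_time` and an `alternatives` array). The function names and the cue-splitting rule (break after sentence-ending punctuation or after a maximum word count) are illustrative choices, not something specified in this issue.

```typescript
// Subset of the Transcribe output JSON that the conversion needs.
interface TranscribeItem {
  type: "pronunciation" | "punctuation";
  start_time?: string; // absent on punctuation items
  end_time?: string;
  alternatives: { content: string }[];
}

interface TranscribeOutput {
  results: { items: TranscribeItem[] };
}

interface Cue {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(totalMs % 1000, 3)}`;
}

// Groups words into cues and renders them as SRT text.
function transcribeToSrt(output: TranscribeOutput, maxWordsPerCue: number = 10): string {
  const cues: Cue[] = [];
  let current: Cue | null = null;
  let wordCount = 0;

  for (const item of output.results.items) {
    const content = item.alternatives[0]?.content ?? "";

    if (item.type === "punctuation") {
      if (current) {
        // Attach punctuation to the previous word; end the cue at a sentence boundary.
        current.text += content;
        if (/[.?!]/.test(content)) {
          cues.push(current);
          current = null;
          wordCount = 0;
        }
      }
      continue;
    }

    // Close the cue once it has reached the word limit.
    if (current && wordCount >= maxWordsPerCue) {
      cues.push(current);
      current = null;
      wordCount = 0;
    }

    const start = Number(item.start_time);
    const end = Number(item.end_time);
    if (current) {
      current.text += " " + content;
      current.end = end;
    } else {
      current = { start, end, text: content };
    }
    wordCount += 1;
  }
  if (current) cues.push(current);

  // SRT cues are numbered from 1 and separated by a blank line.
  return cues
    .map((cue, i) => `${i + 1}\n${toSrtTimestamp(cue.start)} --> ${toSrtTimestamp(cue.end)}\n${cue.text}\n`)
    .join("\n");
}
```

The word limit keeps long sentences from becoming a single on-screen cue. A character limit or a maximum cue duration would work equally well as the splitting rule.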