VikeLabs / lecshare

A React application implemented in TypeScript, aimed at making lectures more accessible for students with auditory, visual, or learning impairments and at improving learning outcomes for students in lecture-based courses.
https://vikelabs.github.io/lecshare/
GNU General Public License v3.0

Change transcription ingestion format to AWS Transcribe version #79

Closed by aomi 4 years ago

malcolmseyd commented 4 years ago

Done, see Vikelabs/lecshare-api@7b5a377 and Vikelabs/lecshare-api@824ec8b

Amazon's JSON file contains a lot of data we don't need, so I made our output format different from it. I did this by unmarshalling into an intermediate struct and then copying only the values I wanted into a second struct. The new GraphQL output looks like this:

{
  "transcripts": [
    "we generate a new schedule during the following. First we go into the schedule view, which is the current view, and we generate a new schedule done on the button below. Here we select our time. Period or time? Interval. We hit generating you schedule. But first, we must select and resolve these issues. Conflicts we have was down below here We slept, Lisa, to fill in the position where the system is unable to automatically find a worker. Once the conflicts have been resolved, no conflicts found we can hit, generate schedule. And then yes, we want to generate the schedule. And then after this, we're done."
  ],
  "words": [
    {
      "word": "we",
      "type": "pronunciation",
      "starttime": "1.46",
      "endtime": "1.59"
    },
    {
      "word": "generate",
      "type": "pronunciation",
      "starttime": "1.59",
      "endtime": "1.9"
    },
    {
      "word": "a",
      "type": "pronunciation",
      "starttime": "1.91",
      "endtime": "1.98"
    }
  ]
}
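
For anyone who wants to see the shape of that two-step trim without reading the commits, here is a rough TypeScript sketch of the same idea: parse into a wide type that mirrors Amazon's file, then copy the wanted values into a narrow one. It is not the code from the linked commits. The wide type assumes the usual AWS Transcribe layout (`results.transcripts[].transcript`, `results.items[]` with `start_time`, `end_time`, `alternatives[].content`, and `type`), and `trimTranscribeOutput` plus all the type names are made up for illustration.

```typescript
// Wide type: only the parts of the AWS Transcribe output that this sketch reads.
interface TranscribeItem {
  type: string; // "pronunciation" or "punctuation"
  start_time?: string; // absent on punctuation items
  end_time?: string; // absent on punctuation items
  alternatives: { confidence?: string; content: string }[];
}

interface TranscribeOutput {
  jobName?: string;
  status?: string;
  results: {
    transcripts: { transcript: string }[];
    items: TranscribeItem[];
  };
}

// Narrow type: the trimmed shape shown in the example output above.
interface Word {
  word: string;
  type: string;
  starttime: string | null;
  endtime: string | null;
}

interface Transcription {
  transcripts: string[];
  words: Word[];
}

// Hypothetical helper: parse the raw JSON, then copy over only the wanted values.
function trimTranscribeOutput(rawJson: string): Transcription {
  const raw = JSON.parse(rawJson) as TranscribeOutput;
  return {
    transcripts: raw.results.transcripts.map((t) => t.transcript),
    words: raw.results.items.map((item) => ({
      // Assumption of this sketch: the first alternative is the word text.
      word: item.alternatives[0]?.content ?? "",
      type: item.type,
      // Missing timestamps become explicit nulls in the output.
      starttime: item.start_time ?? null,
      endtime: item.end_time ?? null,
    })),
  };
}
```

Everything else in Amazon's file (job name, status, confidence scores) is simply never copied, which is what keeps the output small.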

One thing to note about the words list: AWS Transcribe counts punctuation marks as words, but they have no start/end time, so those fields can be null. Frontend people should look out for this:

{
  "word": ",",
  "type": "punctuation",
  "starttime": null,
  "endtime": null
}
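
On the frontend, one way to guard against those nulls is to filter them out before doing anything time-based. The sketch below is only a suggestion and assumes the times stay as numeric strings, as in the sample output. `Word` mirrors the output shape; `TimedWord`, `timedWords`, and `wordAt` are hypothetical names, not existing lecshare code.

```typescript
// Shape of one entry in "words", as in the sample output.
interface Word {
  word: string;
  type: string;
  starttime: string | null;
  endtime: string | null;
}

// A word that is guaranteed to have both timestamps, parsed to seconds.
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: drop entries without timestamps (punctuation) and parse the rest.
function timedWords(words: Word[]): TimedWord[] {
  const result: TimedWord[] = [];
  for (const w of words) {
    // The explicit null check lets TypeScript narrow both fields to string.
    if (w.starttime === null || w.endtime === null) continue;
    result.push({ word: w.word, start: Number(w.starttime), end: Number(w.endtime) });
  }
  return result;
}

// Hypothetical helper: find the word being spoken at a given playback time.
function wordAt(words: Word[], seconds: number): TimedWord | undefined {
  return timedWords(words).find((w) => seconds >= w.start && seconds < w.end);
}
```

Typing the fields as `string | null` rather than plain `string` means the compiler (with `strictNullChecks` on) flags any code path that forgets the check. Punctuation is still available in the original list for rendering the transcript text.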