Backend Processing - GitHub Issues

dahifi commented 8 months ago

Implement an endpoint or service that can take the raw output of the ASR pipeline and transform it into the CSV format required by the VA, with likelihood scores for speaker identification.

The development and testing of this transformation service could take several days, considering the need to accurately reflect the diarization data.

dahifi commented 8 months ago

This will be implemented as a Google Cloud Function.

dahifi commented 8 months ago

To fulfill the client's requirements, we need to process the ASR JSON file to extract the necessary information and format it as specified. Let's break down the tasks to achieve this:

Word-for-word Transcription

Parse the JSON file to extract the "text" field from each segment.
Concatenate these texts to form the full transcription.
Format the transcription to comply with the VA standard and encapsulate it in a JSON or HL7 file.
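
A minimal sketch of the first two steps, for discussion only. It assumes the ASR output keeps its segments under a top-level "segments" key (only the per-segment "text" field is confirmed above), and the `build_transcript` / `wrap_transcript` names and the `{"transcription": ...}` envelope are placeholders, not the VA-mandated layout.

```python
import json


def build_transcript(asr: dict) -> str:
    """Join the "text" field of every segment into one transcript string."""
    # Assumption: segments are listed under a top-level "segments" key.
    parts = (seg.get("text", "").strip() for seg in asr.get("segments", []))
    # Skip empty segments so we do not emit double spaces.
    return " ".join(p for p in parts if p)


def wrap_transcript(transcript: str) -> str:
    """Serialize the transcript in a placeholder JSON envelope."""
    # Hypothetical envelope; the real VA-compliant schema is not defined here.
    return json.dumps({"transcription": transcript}, ensure_ascii=False, indent=2)


# Tiny made-up input, just to exercise the functions.
sample = {
    "segments": [
        {"text": " Good morning."},
        {"text": "How are you feeling? "},
        {"text": ""},
    ]
}
transcript = build_transcript(sample)
payload = wrap_transcript(transcript)
```

Keeping the concatenation separate from the envelope means the HL7 variant can reuse `build_transcript` unchanged once that format is pinned down.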

Speaker Likelihood Score Vector

Iterate through each word in the "words" array of every segment.
For each word, generate a vector of length 10, where each element corresponds to one speaker. The element for the speaker assigned to that word holds the word's likelihood score; all other elements are 0.
Create a CSV file with the specified columns: index, word, and the likelihood scores for up to 10 speakers.
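
A rough sketch of how these steps could look, under several assumptions that need checking against the real ASR output: each word entry carries "word", "score", and "speaker" keys, speaker labels look like "SPEAKER_03", segments sit under a top-level "segments" key, and the index starts at 0. The column names `speaker_0` … `speaker_9` and all function names are placeholders.

```python
import csv
import io

MAX_SPEAKERS = 10  # vector length given in the requirements


def speaker_index(label):
    """Map a label such as "SPEAKER_03" to 3; return None if it does not parse."""
    # Assumption: labels end in "_<number>".
    try:
        return int(label.rsplit("_", 1)[1])
    except (AttributeError, IndexError, ValueError):
        return None


def word_rows(asr: dict) -> list:
    """Build one row per word: [index, word, score_0, ..., score_9]."""
    rows = []
    for seg in asr.get("segments", []):
        for w in seg.get("words", []):
            vector = [0.0] * MAX_SPEAKERS
            # Fall back to the segment-level speaker if the word has none.
            idx = speaker_index(w.get("speaker", seg.get("speaker")))
            if idx is not None and 0 <= idx < MAX_SPEAKERS:
                vector[idx] = float(w.get("score", 0.0))
            rows.append([len(rows), w.get("word", "")] + vector)
    return rows


def to_csv(rows: list) -> str:
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    header = ["index", "word"] + [f"speaker_{i}" for i in range(MAX_SPEAKERS)]
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Tiny made-up input, just to exercise the functions.
sample = {
    "segments": [
        {
            "speaker": "SPEAKER_00",
            "words": [
                {"word": "Good", "score": 0.9, "speaker": "SPEAKER_00"},
                {"word": "morning", "score": 0.8, "speaker": "SPEAKER_01"},
            ],
        }
    ]
}
rows = word_rows(sample)
csv_text = to_csv(rows)
```

Words whose speaker label is missing or falls outside the 10 slots end up with an all-zero vector here; whether that is acceptable or should be flagged is an open question.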

Clinical Encounter Summary

Generate a concise summary based on the transcription. This might involve identifying key points, decisions, or actions discussed during the encounter.
Format the summary in plain text and encapsulate it in a JSON or HL7 file as required.

To start, we can create a Python script to process the JSON file and generate the required outputs. The script will include functions for parsing the JSON, generating the word-for-word transcription and speaker likelihood score vector, and summarizing the encounter.

Here's a high-level overview of the tasks we need to perform:

- [ ] Parse the ASR JSON file to extract the details needed for transcription and speaker information.
- [ ] Generate a word-for-word transcription of the encounter and format it as a VA-standard-compliant JSON or HL7 file.
- [ ] Create a CSV file with the speaker likelihood score vector for each word in the transcription.
- [ ] Summarize the encounter based on the transcription and format the summary as plain text in a JSON or HL7 file.

dahifi commented 8 months ago

Opted to do the work on the client side.

VACOTechSprint / ambient-transcription

Backend Processing #8
