Context
This is right after the session configuration phase. Here we attempt to explain our chosen topic to the agent.
Requirements
UI
Transcribe speech using Azure Speech Service (currently used) via React Speech Recognition, or switch to Whisper for better transcripts
Periodically send transcripts to the agent in a way that feels as natural as possible. This could be debounce-based (e.g., wait until the user has stopped talking for 10 s before sending). An extra idea: use a GPT-3.5 instance as a transcript manager that autocorrects bad transcripts and decides whether the user is still speaking before sending to a backend function
(Nicolo) Convoscope debounces on both the frontend and the backend, which I think is confusing; they only do it to process entities faster
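The debounce idea above can be sketched as a small frontend helper. This is a minimal sketch, not the implementation: `makeDebouncedSender`, the `Send` callback, and the 10 s default are assumptions illustrating the "wait until the user stops talking" behavior; the callback would presumably POST to the backend function described below.

```typescript
// Hypothetical debounced transcript sender (all names are illustrative).
type Send = (transcript: string) => void;

// Returns a function that buffers the latest transcript and only calls
// `send` once no new transcript has arrived for `waitMs` (10 s per the
// idea above) — i.e., the user appears to have stopped talking.
function makeDebouncedSender(send: Send, waitMs = 10_000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let latest = "";
  return (transcript: string) => {
    latest = transcript;            // keep only the most recent transcript
    if (timer) clearTimeout(timer); // user is still talking: restart the wait
    timer = setTimeout(() => send(latest), waitMs);
  };
}
```

A GPT-3.5 "transcript manager" could slot in as the `send` callback, cleaning the transcript before forwarding it to the backend.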
Backend
The backend function could be api/process_speech or similar; it would return a response in the voice of a student
Currently, it responds with the student's response, an expert's opinion critiquing the user's explanation, and an emotion (I thought this could be used to show the student's expression in real time)
The backend function looks up the thread_id from the session_id and interacts with OpenAI's Assistants API
Save transcripts and responses to the database so we can display a line-by-line critique on the post-session screen
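The backend steps above can be sketched as a handler shape. Everything here is hypothetical: the `processSpeech` name, the session-to-thread map, and the response fields mirror the requirements; the actual Assistants API call is passed in as a stub so only the contract is shown.

```typescript
// Shape of the api/process_speech response, per the requirements above.
interface ProcessSpeechResponse {
  studentResponse: string; // the agent replying as a student
  expertOpinion: string;   // critique of the learner's explanation
  emotion: string;         // e.g. "confused", to render the student's expression
}

// Assumption: the session_id -> thread_id mapping lives in some store;
// a Map stands in for it here.
const threadBySession = new Map<string, string>([["sess-1", "thread-abc"]]);

async function processSpeech(
  sessionId: string,
  transcript: string,
  // Stand-in for the OpenAI Assistants API interaction.
  runAssistant: (threadId: string, text: string) => Promise<ProcessSpeechResponse>,
): Promise<ProcessSpeechResponse> {
  const threadId = threadBySession.get(sessionId);
  if (!threadId) throw new Error(`no thread for session ${sessionId}`);
  const result = await runAssistant(threadId, transcript);
  // Persisting `transcript` and `result` to the database would happen here,
  // so the post-session screen can show a line-by-line critique.
  return result;
}
```

The real version would replace `runAssistant` with calls to the Assistants API (create a message on the thread, run the assistant, read the reply).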