famda opened 1 month ago
Thanks! Yes, it's possible. There's an example in one of the branches if you want to try it, but I haven't added it to the main branch because, when it comes to JSON, everyone has their own schema and a universal one won't cut it. Happy to hear your suggestions, though.
I understand. I think it's just a matter of having a structure in the response, something that can be deserialized. I was also testing this, which is a kind of wrapper API around Whisper. That API lets you choose the format you want to receive (text, JSON, ...) by passing an argument like --output_format [json, srt, text, or whatever].
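As a rough illustration of the `--output_format` idea, here is a minimal Python sketch that renders the same diarized segments as plain text, SRT, or JSON depending on a command-line flag. The segment fields (`speaker`, `start`, `end`, `text`), the sample data, and the functions `to_text`, `to_srt`, and `to_json` are hypothetical; this is not the wrapper API's or this project's actual code.

```python
import argparse
import json

# Hypothetical diarized segments; the real field names depend on the tool.
SEGMENTS = [
    {"speaker": "Speaker 0", "start": 0.0, "end": 5.4, "text": "Hi, my name is Test."},
]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_text(segments):
    # One "Speaker: text" line per segment.
    return "\n".join(f"{s['speaker']}: {s['text']}" for s in segments)


def to_srt(segments):
    # Numbered cues separated by a blank line, as SRT players expect.
    blocks = []
    for i, s in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n"
            f"{s['speaker']}: {s['text']}\n"
        )
    return "\n".join(blocks)


def to_json(segments):
    return json.dumps({"segments": segments}, indent=2, ensure_ascii=False)


# Map each hypothetical format name to its writer function.
WRITERS = {"text": to_text, "srt": to_srt, "json": to_json}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_format", choices=sorted(WRITERS), default="text")
    args = parser.parse_args(argv)
    print(WRITERS[args.output_format](SEGMENTS))


if __name__ == "__main__":
    main()
```

Running it with `--output_format srt` prints one numbered SRT cue per segment. Keeping the writers in a dict means supporting another format is a matter of adding one function and one entry.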
My idea was to have something like this (just a suggestion if it makes sense):
```json
{
  "text": "Hi, my name is Test.",
  "speaker": "Speaker 0",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 5.4,
      "text": "Hi, my name is Test.",
      "tokens": [],
      "temperature": 0.0,
      "avg_logprob": -0.19734466075897217,
      "compression_ratio": 1.7903780068728523,
      "no_speech_prob": 0.1006949171423912,
      "words": [
        {
          "word": " Hi,",
          "start": 0.0,
          "end": 0.64,
          "probability": 0.7109836935997009
        },
        {
          "word": " my",
          "start": 0.88,
          "end": 1.08,
          "probability": 0.9681467413902283
        },
        {
          "word": " name",
          "start": 1.08,
          "end": 1.22,
          "probability": 0.9989060163497925
        },
        {
          "word": " is",
          "start": 1.22,
          "end": 1.38,
          "probability": 0.9960727691650391
        },
        {
          "word": " Test.",
          "start": 1.38,
          "end": 1.62,
          "probability": 0.8055099844932556
        }
      ]
    }
  ],
  "language": "en"
}
```

Here `tokens` is left empty for brevity; it would hold the segment's numeric token IDs.
What do you think of this?
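Since the point is having something that can be deserialized, here is a sketch of how the proposed schema could map onto Python dataclasses. The class and function names are hypothetical, `tokens` is typed as a list of integers on the assumption that it holds Whisper-style token IDs, and the sample is a trimmed copy of the JSON above with a single word.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    word: str
    start: float
    end: float
    probability: float


@dataclass
class Segment:
    id: int
    seek: int
    start: float
    end: float
    text: str
    tokens: List[int]  # assumption: integer token IDs, as in Whisper's output
    temperature: float
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float
    words: List[Word] = field(default_factory=list)


@dataclass
class Transcript:
    text: str
    speaker: str
    language: str
    segments: List[Segment] = field(default_factory=list)


def parse_transcript(raw: str) -> Transcript:
    """Parse the proposed JSON schema into nested dataclasses (hypothetical helper)."""
    data = json.loads(raw)
    segments = []
    for seg in data.get("segments", []):
        words = [Word(**w) for w in seg.get("words", [])]
        # Pass every other key straight to Segment; unknown keys raise TypeError.
        fields = {k: v for k, v in seg.items() if k != "words"}
        segments.append(Segment(**fields, words=words))
    return Transcript(
        text=data["text"],
        speaker=data["speaker"],
        language=data["language"],
        segments=segments,
    )


# Trimmed sample of the proposed schema (one segment, one word).
SAMPLE = """
{"text": "Hi, my name is Test.", "speaker": "Speaker 0", "language": "en",
 "segments": [{"id": 0, "seek": 0, "start": 0.0, "end": 5.4,
   "text": "Hi, my name is Test.", "tokens": [], "temperature": 0.0,
   "avg_logprob": -0.19734466075897217, "compression_ratio": 1.7903780068728523,
   "no_speech_prob": 0.1006949171423912,
   "words": [{"word": " Hi,", "start": 0.0, "end": 0.64,
              "probability": 0.7109836935997009}]}]}
"""

if __name__ == "__main__":
    transcript = parse_transcript(SAMPLE)
    print(transcript.segments[0].words[0].word)
```

Unknown keys make the dataclass constructor raise a `TypeError`, which is a deliberately strict choice; a real parser might ignore or collect extra fields instead.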
Sounds reasonable. I'll work on it when I have the time, or maybe you could open a PR if possible 😁
Hey! Awesome work on this!
Is it possible to transcribe/diarize and get the result as a JSON output file? That would be a nice feature to have.
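One way to get a JSON result file out of a transcribe-and-diarize run is to group consecutive speaker-labelled words into segments and dump them with `json.dump`. This minimal sketch assumes the pipeline yields word dicts with `word`, `start`, `end`, and `speaker` keys; that input shape, the function names, and the `result.json` path are hypothetical. It also puts `speaker` on each segment rather than at the top level, since one file may contain several speakers.

```python
import json
from itertools import groupby
from pathlib import Path


def words_to_segments(words):
    """Group consecutive words with the same speaker into one segment.

    `words` is assumed (hypothetically) to be a list of dicts with
    "word", "start", "end" and "speaker" keys.
    """
    segments = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        segments.append({
            "id": len(segments),
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            # Whisper-style words carry a leading space, so join then strip.
            "text": "".join(w["word"] for w in group).strip(),
            "words": [{k: w[k] for k in ("word", "start", "end")} for w in group],
        })
    return segments


def write_json_result(words, language, path):
    """Build the result dict, write it to `path` as UTF-8 JSON, and return it."""
    segments = words_to_segments(words)
    result = {
        "text": " ".join(s["text"] for s in segments),
        "segments": segments,
        "language": language,
    }
    with Path(path).open("w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2, ensure_ascii=False)
    return result


if __name__ == "__main__":
    example_words = [
        {"word": " Hi,", "start": 0.0, "end": 0.64, "speaker": "Speaker 0"},
        {"word": " my", "start": 0.88, "end": 1.08, "speaker": "Speaker 0"},
        {"word": " Hello.", "start": 2.0, "end": 2.5, "speaker": "Speaker 1"},
    ]
    write_json_result(example_words, "en", "result.json")
```

`itertools.groupby` only merges adjacent items, which is what we want here: if Speaker 0 talks again later, that turn becomes a new segment instead of being merged with the first one.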