JuliaNeuralGraphics / Whisper.jl

MIT License
English #6

Open FredrikKarlssonSpeech opened 3 weeks ago

FredrikKarlssonSpeech commented 3 weeks ago

This may be related to languages other than English being only partially implemented (see https://github.com/JuliaNeuralGraphics/Whisper.jl/issues/5), but in my initial test the entries in the SRT file are very long. See below:

1
00:00:00,019 --> 00:00:21,020
 What gives me is the sense of freedom. To be able to take my horse and ride it in the forest and have fun with it. To get somewhere. But if you are out in the forest, you don't get that far. But with the horse you can get quite far in a short time.

2
00:00:21,020 --> 00:00:30,000

3
00:00:30,020 --> 00:00:50,020
 The water continues to flow out towards the sea, so from here you come straight out towards the sea. It takes two hours to ride down. If you walk it would take twice as long. So it's just this sense of freedom to be able to just ride away and gallop.

4
00:00:51,020 --> 00:01:00,000

5
00:01:00,000 --> 00:01:30,000
 What I think is the most fun is that I, who weighs 50 kg, can get a horse that weighs 650 kg and do exactly what I want. Exact movements. I think that's cool.

I'm not sure I would be able to use the output as it is: sentences are clearly identified within these long sections of speech, but the SRT file does not use those sentence boundaries to split the text into shorter entries.

pxl-th commented 3 weeks ago

I haven't worked much on the SRT output, so it mostly just dumps the text with approximate timestamps.
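
Until the SRT writer is improved, one possible workaround is to post-process the file and split each long entry at sentence boundaries. The snippet below is only a minimal sketch in Python and is not part of Whisper.jl: the helper names (`parse_ts`, `format_ts`, `split_cue`) are hypothetical, and it assumes that time within an entry can be divided in proportion to the character count of each sentence. That is a rough heuristic, so the resulting timestamps are no more accurate than the approximate ones they are derived from.

```python
import re


def parse_ts(ts):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hms, ms = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def format_ts(ms):
    """Convert milliseconds back to an SRT timestamp 'HH:MM:SS,mmm'."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def split_cue(start, end, text):
    """Split one SRT entry into sentence-level entries.

    Assumption: each sentence receives a share of the entry's duration
    proportional to its character count. Returns a list of
    (start, end, sentence) tuples; an entry without text yields [].
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    t0, t1 = parse_ts(start), parse_ts(end)
    total = sum(len(s) for s in sentences)
    cues, seen = [], 0
    for sentence in sentences:
        a = t0 + (t1 - t0) * seen // total
        seen += len(sentence)
        b = t0 + (t1 - t0) * seen // total
        cues.append((format_ts(a), format_ts(b), sentence))
    return cues


if __name__ == "__main__":
    # Entry 5 from the SRT output quoted in this issue.
    text = (
        "What I think is the most fun is that I, who weighs 50 kg, can get "
        "a horse that weighs 650 kg and do exactly what I want. "
        "Exact movements. I think that's cool."
    )
    for start, end, sentence in split_cue("00:01:00,000", "00:01:30,000", text):
        print(f"{start} --> {end}\n{sentence}\n")
```

Applied to entry 5 above, this yields three shorter entries that together cover the same 00:01:00,000 to 00:01:30,000 interval. Empty entries such as 2 and 4 produce no output and would need to be dropped or kept separately.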