alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Add arguments `time_off` and `duration` to transcriber #1533

Open me-kell opened 3 months ago

me-kell commented 3 months ago

Currently the transcriber always processes the whole input file, from beginning to end.

It would be very useful to be able to pass a start time offset and/or a duration to the transcriber.

Here is a proposal for how to do it:

Add the ffmpeg-style arguments time_off and duration in python/vosk/transcriber/cli.py after line 46.

parser.add_argument("--time_off", "-ss", default=None, type=int, help="start time offset")
parser.add_argument("--duration", "-d", default=None, type=int, help="duration")
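
As a quick check of how these two options would parse, here is a minimal stand-alone sketch. The parser below is a hypothetical stand-in, not the one in cli.py, and the assumption that both values are whole seconds comes only from `type=int` above.

```python
import argparse

# Hypothetical stand-alone parser for illustration only;
# the real parser is defined in python/vosk/transcriber/cli.py.
parser = argparse.ArgumentParser(description="sketch of the proposed options")
parser.add_argument("--time_off", "-ss", default=None, type=int,
                    help="start time offset (assumed to be in seconds)")
parser.add_argument("--duration", "-d", default=None, type=int,
                    help="duration (assumed to be in seconds)")

# Both options given: values arrive as integers.
args = parser.parse_args(["-ss", "30", "--duration", "10"])
print(args.time_off, args.duration)

# Neither option given: both stay None, so the whole file would be processed.
defaults = parser.parse_args([])
print(defaults.time_off, defaults.duration)
```

Keeping `None` as the default lets the transcriber tell "not given" apart from an explicit offset of 0.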

Pass the arguments time_off and duration to ffmpeg in function resample_ffmpeg in python/vosk/transcriber/transcriber.py (line 115):

        cmd = shlex.split("ffmpeg -nostdin -loglevel quiet "
                "-i '{}' -ar {} -ac 1 {} {} -f s16le -".format(
                    str(infile),
                    SAMPLE_RATE,
                    # add this: optional start offset, empty when not given
                    f"-ss {self.args.time_off}" if self.args.time_off is not None else "",
                    # and this: optional duration, empty when not given
                    f"-t {self.args.duration}" if self.args.duration is not None else "",
                    ))

The function resample_ffmpeg_async could be adapted similarly.
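
One way to share the logic between the two functions would be a small helper that builds the ffmpeg argument list. This is only a sketch: `build_ffmpeg_cmd` is a hypothetical name, not part of vosk, and the `SAMPLE_RATE` value is assumed here. Building a list instead of a formatted string also avoids the quoting of the input path.

```python
SAMPLE_RATE = 16000  # assumed value for this sketch


def build_ffmpeg_cmd(infile, time_off=None, duration=None, sample_rate=SAMPLE_RATE):
    """Return the ffmpeg argument list (hypothetical helper, not part of vosk)."""
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "quiet",
           "-i", str(infile),
           "-ar", str(sample_rate), "-ac", "1"]
    if time_off is not None:
        # Start offset, placed after -i as in the proposal above.
        cmd += ["-ss", str(time_off)]
    if duration is not None:
        # Limit how much audio is written to the output.
        cmd += ["-t", str(duration)]
    # Raw signed 16-bit little-endian PCM written to stdout.
    cmd += ["-f", "s16le", "-"]
    return cmd


# The path needs no extra quoting because it is passed as a single list element.
full = build_ffmpeg_cmd("my file.wav", time_off=30, duration=10)
whole_file = build_ffmpeg_cmd("my file.wav")
print(full)
```

The resulting list could be passed to `subprocess.Popen(cmd, stdout=subprocess.PIPE)` in the synchronous path and to `asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE)` in the async one.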

nshmyrev commented 3 months ago

Hi, thank you for the proposal! It looks nice, but what is the use case, please? I can't imagine why a user would need to start from a certain offset instead of just processing the whole file.

me-kell commented 3 months ago

Some use cases: