Purfview / whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

Invalid system calls when run under Take Command Console #31

Closed ClaireCJS closed 1 year ago

ClaireCJS commented 1 year ago

I use a lot of different command-lines, and yet, I don't think I've seen this happen before.

Whisper-faster.exe ends up sending commands to the command line under TCC command line, but not under CMD.EXE.

But I abandoned CMD.EXE back when it was command.com in 1988. TCC has been in constant development. So it's not some janky command line, even though most people haven't heard of it. It's really solid. So I'm wondering how this is happening.

There's some very niche incompatibility here because this is not something I've seen in decades of use.

Any idea if we can address it?

whisper-faster.exe --language en --verbose True --device cuda --model large --output_format all "14_The Water Is Wide.mp3"

Standalone Faster-Whisper r134 running on: CUDA

Number of visible GPU devices: 1

Supported compute types by GPU: {'float32', 'int8', 'int8_float16', 'float16'}

[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info] CPU: AuthenticAMD (SSE4.1=true, AVX=true, AVX2=true, AVX512=false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Selected ISA: AVX2
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Use Intel MKL: false
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - SGEMM backend: DNNL (packed: false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - GEMM_S16 backend: none (packed: false)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - GEMM_S8 backend: DNNL (packed: false, u8s8 preferred: true)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info] GPU #0: NVIDIA GeForce RTX 3060 (CC=8.6)
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Allow INT8: true
[2023-07-06 06:48:38.064] [ctranslate2] [thread 13552] [info]  - Allow FP16: true (with Tensor Cores: true)
[2023-07-06 06:48:52.199] [ctranslate2] [thread 13552] [info] Using CUDA allocator: cuda_malloc_async
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info] Loaded model C:\UTIL2\_models\faster-whisper-large-v2 on device cuda:0
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Binary version: 6
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Model specification revision: 3
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Selected compute type: int8

Model loaded in: 15.06 seconds
Estimating duration from bitrate, this may be inaccurate

Processing audio with duration 03:16.650

VAD filter removed 00:41.630 of audio
VAD filter kept the following audio segments: [00:00.000 -> 01:36.324], [02:03.836 -> 03:02.532]

Audio processing finished in: 2.18 seconds

Processing segment at 00:00.000
[2023-07-06 06:48:56.731] [ctranslate2] [thread 95492] [info] Loaded cuBLAS library version 11.11.3
[00:04.180 --> 00:20.460]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 00:20.460
[00:21.420 --> 00:35.640]  Give me a boat that we'll carry to And we both shall row, my love and I
[00:36.080 --> 00:49.460]  There is a ship and she sails the sea She's loaded deep, as deep can be
Processing segment at 00:50.460
TCC: Unknown command "20"
TCC: Unknown command "00:02"
TCC: (Sys) The system cannot find the file specified.
 ""
[00:50.460 --> 01:04.520]  But not as deep as the love I'm in I know not how to sink or swim
[01:04.930 --> 01:18.080]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 01:18.080
TCC: Unknown command "49"
TCC: Unknown command "00:03"
TCC: (Sys) The system cannot find the file specified.
 ""
[01:19.080 --> 01:34.060]  Give me a boat that we'll carry to And we both shall row, my love and I
Processing segment at 01:34.060
TCC: Unknown command "78"
TCC: Unknown command "00:04"
TCC: (Sys) The system cannot find the file specified.
 ""
[01:35.060 --> 02:16.570]  Love is handsome and love is fine Love is a jewel when first it's new
Processing segment at 01:49.060
TCC: Unknown command "94"
TCC: Unknown command "00:04"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:17.270 --> 02:30.730]  But love grows old and in time grows cold And fades away like the summer dew
Processing segment at 02:03.220
TCC: Unknown command "137"
TCC: Unknown command "00:05"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:31.730 --> 02:45.450]  Oh, the water is wide and I can't get o'er Neither have I the wings to fly
Processing segment at 02:17.940
TCC: Unknown command "151"
TCC: Unknown command "00:05"
TCC: (Sys) The system cannot find the file specified.
 ""
[02:45.450 --> 03:00.650]  Give me a boat that we'll carry to And we both shall row, my love and I

Transcription speed: 27.12 audio seconds/s

Operation finished in: 24 seconds

TCC: Unknown command "165"
TCC: Unknown command "00:06"
TCC: (Sys) The system cannot find the file specified.
 ""
TCC: Unknown command "165"
TCC: Unknown command "00:07"
TCC: (Sys) The system cannot find the file specified.
 ""

Visually, here's what it looks like under CMD.EXE -- it works just fine:

image

Yet under TCC.EXE, I get this: image

It's sending the timestamps straight to the command line?!?!

Is this something I could possibly be helped with?

Purfview commented 1 year ago

I remember 4DOS. :) Is TCC better than PowerShell?

Those "TCC: Unknown command" are TITLE commands meant to show a progress bar in a title bar. For example try TITLE this is new title command.

Purfview commented 1 year ago

Is this something I could possibly be helped with?

Yes, I'll check what is going on there with TCC.

ClaireCJS commented 1 year ago

Interesting. I've set window titles to keep myself informed in other situations too, it's a fun idea.

and i kind of despise PowerShell :)

in TCC, it's an internal command, same as CMD, so yea, strange stuff going on for sure :)

image

Purfview commented 1 year ago

Can't reproduce it, works OK for me, no errors:

image

Except that % character doesn't show in the progress bar, even escape doesn't help. I used TCC LE as TCC v30 doesn't work with Windows 7.

Check if title 40% ^| 00:00^<^<11:11 ^| a/s command works for you.

ClaireCJS commented 1 year ago

tangent: Just wait until you get to windows 10. You can install windows terminal, which is suuuuuch a great container for TCC. Suddenly, we can see unicode/emojis in filenames, and there's just much better console options. Incredible ansi support, even support for double-height VT100 ansi text which I'm using for certain messages. It's nice having italics and underlines and blink and strikethrough and 2-line-tall text. Plus it supports multiple tabs and panes within each tab which are great.

Anyway, that command actually doesn't work for me.

image

And I can tell you some reasons why:

1) % is a special character for environment variables. To represent % in TCC you have to use two %s or enclose the title in quotes. So this actually sets the title to "40" for me instead of "40%", but also it still gives the unknown command 00:00

2) Why? Because "^" is the command separator character for me. That may be a bad choice on my part, but it's one i've used since the 1990s.

The solution seems to be to put quotes around it all, which makes "^" safe but still doesn't make "%" work unless it is doubled to "%%":

image

Of course, I worked around this by using CMD.EXE /C to run your program with CMD.EXE ... So there is a workaround. I just hadn't had to use something like that in years.

[The stuff I make is often unrunnable for folks who don't run TCC because I'm so deeply embedded in it. It's kind of a bummer.]
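A hedged sketch of one way a program could avoid this class of problem altogether. It assumes (the thread does not show the implementation) that the progress title is set by handing a `title ...` string to whatever shell is configured, so that shell's separator and `%` rules get applied to it. Calling the Win32 `SetConsoleTitleW` function directly involves no shell parsing at all. The `format_title` layout is illustrative only, modelled on the `40% | 00:00<<11:11 | a/s` test string above; both function names are hypothetical.

```python
import ctypes
import sys


def format_title(percent: int, elapsed: str, remaining: str, speed: float) -> str:
    """Build a progress string; the layout is an assumption, not the tool's real format."""
    return f"{percent}% | {elapsed}<<{remaining} | {speed:.2f} audio seconds/s"


def set_console_title(title: str) -> bool:
    """Set the console title through the Win32 API instead of a shell command.

    No shell is involved, so characters such as %, ^, | and < need no escaping.
    Returns False on non-Windows platforms or if the API call fails.
    """
    if sys.platform != "win32":
        return False
    return bool(ctypes.windll.kernel32.SetConsoleTitleW(title))


if __name__ == "__main__":
    set_console_title(format_title(40, "00:00", "11:11", 27.12))
```

Because the string never passes through CMD, TCC, or PowerShell, the same call behaves identically regardless of which command processor launched the program.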

Purfview commented 1 year ago

Yeah, it's not wise to mess with the default escape character. Anyway, check r134+ update, but there is no universal solution for "%" character. [Btw. double %s didn't make it work in TCC LE]

ClaireCJS commented 1 year ago

Escaping isn't needed if quotes are around an argument.

It's why this situation doesn't come up for me.

That being said, I'd prefer every command line to use \ for escaping. That's how it is in bash and pretty much every unix-originating CLI, and in most programming languages.

Purfview commented 1 year ago

One more thing, I've noticed that int8 is used in your screenshots. Did you set it? I think by default float16 should be used, and it should be faster than int8.

ClaireCJS commented 1 year ago

One more thing, I've noticed that int8 is used in your screenshots. Did you set it? I think by default float16 should be used, and it should be faster than int8.

that was just the beginning of the list of available operations. I actually have CUDA working (and it was so much easier to get working right this time around than last time--not sure why, I think I like Torch 2.x better than Torch 1.x)

From what i've seen (and from things running in 1-2 minutes in GPU mode when previously taking 15 minutes in CPU mode) everything seems to be using fp16 now. :) Thanks!

Purfview commented 1 year ago

Not the list, but there:

[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info]  - Selected compute type: int8

Torch is not in use by Faster-Whisper, it's used only by OpenAI's Whisper.

ClaireCJS commented 1 year ago

Hmm Had to double-check, but i think it's good on the cuda

But definitely not getting through the song's lyrics

image

That was with the default/medium model

with large-v2 it's basically the same:

image

It's not every song that's this bad, but whisper-faster.exe is having such incompleteness compared to whisper.exe with the same model -- at least on my end, that's how it seems.

But whisper-faster has that granularity in the timestamps that whisper.exe lacks. Whisper.exe is not good for making subtitles/karaoke (LRC files) because of how bad the timestamping is. Whisper-faster.exe is excellent for those purposes... but just keeps giving me incomplete transcriptions.

In the end, neither one is giving me the file I want. I know it's possible and I'm sure we'll get there soon. I'm excited to help test this stuff out.

ClaireCJS commented 1 year ago

and not sure if it helps, but here's another example of whisper-faster vs whisper. yea, it was 21 seconds vs 8 minutes, but whisper.exe didn't miss the last verse [pardon the profanity in the lyrics lol]

image

Purfview commented 1 year ago

Thanks for screens, I see that I made an error. I didn't mean to select int8 on cuda, this should be fixed in "r134++".

Btw, you are still using r134 instead of r134+ where your issue is fixed.

Purfview commented 1 year ago

You get different results because Standalone Faster-Whisper is using different defaults than OpenAI's Whisper.

About settings: You don't need to use --verbose True. Set language for faster transcription: --language=en. I think you should disable VAD for music - --vad_filter=False. If you want karaoke subtitles then set --highlight_words=True. Check if --beam_size=5 makes a difference.
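A minimal sketch of how the suggested settings could be assembled from a script. Only flags quoted in this thread are used; the helper names `build_command` and `transcribe` are hypothetical, and the executable is assumed to be on `PATH`. Passing the arguments as a list means no shell parses them, so quoting and escape-character differences between CMD and TCC do not come into play.

```python
import subprocess


def build_command(audio_path, exe="whisper-faster.exe",
                  beam_size=5, highlight_words=False):
    """Return an argument list using only the flags mentioned in the thread."""
    cmd = [
        exe,
        "--language=en",          # skip language detection
        "--vad_filter=False",     # suggested above for music
        f"--beam_size={beam_size}",
    ]
    if highlight_words:
        cmd.append("--highlight_words=True")
    cmd.append(audio_path)
    return cmd


def transcribe(audio_path):
    """Run the executable directly, without going through a command shell."""
    return subprocess.run(build_command(audio_path), check=True)


# Example (requires whisper-faster.exe to be installed):
# transcribe("14_The Water Is Wide.mp3")
```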

ClaireCJS commented 1 year ago

Thanks for screens, I see that I made an error. I didn't mean to select int8 on cuda, this should be fixed in "r134++".

Btw, you are still using r134 instead of r134+ where your issue is fixed.

Thanks! I have no idea how to upgrade actually because i'm unclear what the name is to upgrade it with Pip (i guess it would not make sense for it to be on pip actually) , and when I go to the repository, i only see a readme which doesn't say how to get the files, and i don't see any browseable files like I do with other github repos?

so perhaps... the readme itself might need to be updated with a download link that is more obvious or something?

[I just woke up so my brain isn't fully on yet and i can't remember how I installed it in the first place. Do I have dementia fears? I sure do. I'm getting up there in age and hope I don't get hit with it haha 😅 ]

ClaireCJS commented 1 year ago

You get different results because Standalone Faster-Whisper is using different defaults than OpenAI's Whisper.

There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.

About settings: You don't need to use --verbose True.

But i would like to. To the extent that I believe i submitted a feature request for it (but that might have been for openlrc, i don't remember).

Set language for faster transcription: --language=en.

I've been meaning to. :)

I think you should disable VAD for music - --vad_filter=False.

Hmmm, why do you think that? I'm curious. I would have thought that would mean there is no silence detection, so the resulting file would have no silence in it. I.E. if some one sings a line before a solo, that line will be stuck onscreen until someone else says something, because there will be no word-specific-end-timestamp.

Am I wrong? That was my reason for trying this fork out, to be honest. To make it so words aren't stuck on the screen 100% of a song.

If you want karaoke subtitles then set --highlight_words=True.

I think that adds HTML highlighting to the output, which i will be converting to LRC format (an LRCwriter function is really easy to create if the timestamps are good, though! I modified openai's version to create LRC, it's just that the timestamps bump up against each other, making it ugly)

Check if --beam_size=5 makes a difference.

Hmm, okay. The -h option describes it as "number of beams in beam search, only applicable when temperature is zero", which unfortunately isn't informative to me because i'm ignorant about some of the inner-workings. I'd love it if the help explained why one would modify it and what the results were and what some reasonable values are.

I'll try it out! Thanks!

Purfview commented 1 year ago

Thanks! I have no idea how to upgrade actually because i'm unclear what the name is to upgrade it with Pip...

By "update" I meant - download the new version. All downloads are in Releases: https://github.com/Purfview/whisper-standalone-win/releases

ClaireCJS commented 1 year ago

Thanks! I have no idea how to upgrade actually because i'm unclear what the name is to upgrade it with Pip...

By "update" I meant - download the new version. All downloads are in Releases: https://github.com/Purfview/whisper-standalone-win/releases

Thanks! I blame morning brain on this one, because I was able to find it the first time.😅

By the way, vad_filter false alone seems to improve the results vastly -- and that's without me having upgraded yet.

VAAAAAAAAAASTLY.

And my fears earlier about this making the timestamps bad were unfounded.

I will be stripping out the music characters in my situation; an option to not include the lines that are a single music note might be helpful, but it's easy enough for me to implement on my end! :)

Amazing!!! I think this just reached the point where I can proceed with my personal goals of building something around this! awesome!!! thank you so much!!!

image

MORE SIDE STORY: I use Minilyrics to display lyrics and it makes the LRC file "next" to the mp3 file just like whisper does.

But I also use EvilLyrics and it uses a repo at c:\lyrics\

so I'm going to need to write a piece of code to examine my mp3 and see if it has an lrc in the same folder with it; if it doesn't, extract the artist and title tags and check for a file's presence in c:\lyrics, and only then create an LRC automatically (I don't want AI LRC files superseding human-created ones!)
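The check described above could be sketched roughly like this. Reading the artist and title tags is left out (it would need a tagging library), so they are passed in as arguments. The `Artist - Title.lrc` naming inside the lyrics folder is purely an assumption for illustration, and `needs_ai_lrc` is a hypothetical name.

```python
from pathlib import Path


def needs_ai_lrc(mp3_path, artist, title, lyrics_dir=r"c:\lyrics"):
    """Return True only if no existing LRC is found for this song.

    Checks, in order:
      1. an .lrc file next to the mp3 (same folder, same base name)
      2. an "<artist> - <title>.lrc" file in the shared lyrics folder
         (this naming scheme is an assumption)
    """
    mp3 = Path(mp3_path)
    if mp3.with_suffix(".lrc").exists():
        return False
    repo_file = Path(lyrics_dir) / f"{artist} - {title}.lrc"
    return not repo_file.exists()
```

Only when this returns True would the transcription be run, so a human-made LRC is never overwritten.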

The other case I don't know what to do with yet is the case of having lyrics in TXT format, and wanting the AI to make the LRC. What I'd like to do is send the TXT lyrics in as context to whisper, so that whisper makes fewer mistakes. I don't know if anyone is doing this yet or not. pysync seemed to make an attempt but that project appears to be abandoned in March.

Purfview commented 1 year ago

There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.

whisper-faster.exe --help

But i would like to. To the extent that I believe i submitted a feature request for it...

Definitely not here. There "True" is kinda same as "False" here. "True" here only adds bunch of additional nerd yadda yadda.

I think that adds HTML highlighting to the output...

No, it underscores words in usual srt files.

EDIT: I don't know what LRC files are but if you have python function to output them then I can incorporate it here.

ClaireCJS commented 1 year ago

There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.

whisper-faster.exe --help

oh, oops, lol, my bad. 😅

But i would like to. To the extent that I believe i submitted a feature request for it...

Definitely not here. There "True" is kinda same as "False" here. "True" here only adds bunch of additional nerd yadda yadda.

I like the yadda yadda 😎

But "True is kinda same as false here" is interesting!

I think that adds HTML highlighting to the output...

No, it underscores words in usual srt files.

Oh how interesting. I use subtitles with VLC a lot but I find them distracting. I think highlighting would be super distracting, but it's really really cool that feature exists. I could have used it when watching Trainspotting...

EDIT: I don't know what LRC files are but if you have python function to output them then I can incorporate it here.

They are a fairly widespread (maybe the most used) standard for displaying lyrics in a timed fashion, so that lines are displayed as they are sung. Here's an example:

https://github.com/Purfview/whisper-standalone-win/assets/789591/07084b14-a634-4ea3-ab61-0b97e384bb13

And yes, here is how i modified openai to output lrc. Mine is a very very simple implementation but if the SRT is very well-formed and done well, it should make a comparable LRC.

Requires modifying whisper\utils.py to add this class:

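class WriteLRC(ResultWriter):
    extension: str = "lrc"

    def write_result(self, result: dict, file: TextIO):
        # ResultWriter, format_timestamp and TextIO are already available in whisper\utils.py
        for segment in result["segments"]:
            # strip arrow and music-note artifacts from the lyric text
            text = segment["text"].strip().replace("-->", "").replace("🎵", "")
            start = format_timestamp(segment["start"], always_include_hours=False, decimal_marker=".")
            end = format_timestamp(segment["end"], always_include_hours=False, decimal_marker=".")
            # write lrc lines: the lyric at the start time, then a blank line at the end time
            # (double-quoted f-string: nesting the same quote type fails before Python 3.12)
            print(f"[{start}]{text}\n[{end}]\n", file=file, flush=True)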

As well as adding:

        'lrc': WriteLRC,

to the writers list found in get_writer near the end of the utils.py file:

    writers = {
        'txt': WriteTXT,
        'vtt': WriteVTT,
        'srt': WriteSRT,
        'tsv': WriteTSV,
        'json': WriteJSON,
        'lrc': WriteLRC,            # add this line
    }

ClaireCJS commented 1 year ago

p.s. the actual LRC file looks like this

While SRT files have a start and end timestamp

LRC files do not

but LRC files support blank lines, so the end timestamp with a blank line essentially "stops" the words from displaying

You can see a few of those "blank" lines that stop the words from displaying in the example below

[00:07.31]Fences hold me back from mine, baby
[00:11.86]Hold my hands and hold them tight
[00:15.04]I like my pit
[00:18.67]I want to stay
[00:22.10]That way I can't fall back in again
[00:29.98]Ah!
[00:38.40]
[00:39.58]I wanna scream
[00:43.50]Cut you up like you did me
[00:47.40]Is this all I've ever known?
[00:51.82]Is this all I've ever known?
[00:55.80]You took the sunshine from the days
[00:59.92]Now I live in shadows
[01:03.82]I'm just a dog with no bite
[01:07.84]This is all I've ever known
[01:10.36]Can't relax
[01:14.62]Head stuck in the ground
[01:19.26]Heading where we'll never be found
[01:25.58]I hope you feel
[01:29.60]I hope you feel the guilt I do
[01:33.64]You've got no shame
[01:37.60]You did the damage and I feel the pain
[01:43.10]
[01:50.81]I wanna scream
[01:55.01]Cut you up like you did me
[01:58.62]Is this all I've ever known?
[02:03.26]Is this all I've ever known?
[02:07.06]You took the sunshine from the days
[02:11.04]Now I live in shadows
[02:15.04]I'm just a dog with no bite
[02:18.84]This is all I've ever known
[02:21.58]
[03:10.49]Oh!
[03:12.50]I'm gonna forgive you so I can breathe
[03:16.84]Is this all I've ever known?
[03:19.66]Is this all?
[03:22.52]Is this all?
[03:24.24]You took the sunshine from the days
[03:28.36]Now I live in shadows
[03:31.84]I'm just a dog with no bite
[03:36.30]This is all I've ever known
[03:49.66]
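For reference, the format above can be produced without any Whisper internals. The following is a standalone sketch, not the code that ended up in the tool: `lrc_timestamp` and `segments_to_lrc` are hypothetical names, segments are assumed to be dicts with `start`, `end` (seconds) and `text`, and the `min_gap` threshold for emitting a blank "clear" line is an arbitrary choice.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format seconds as mm:ss.xx (centiseconds), as in the example above."""
    total_cs = round(seconds * 100)
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def segments_to_lrc(segments, min_gap: float = 2.0) -> str:
    """Turn timed segments into LRC text.

    A blank timestamped line is added after a segment only when the next
    segment starts at least `min_gap` seconds later (or there is no next
    segment), so lyrics clear from the screen during instrumental breaks.
    """
    lines = []
    for i, seg in enumerate(segments):
        lines.append(f"[{lrc_timestamp(seg['start'])}]{seg['text'].strip()}")
        is_last = i == len(segments) - 1
        if is_last or segments[i + 1]["start"] - seg["end"] >= min_gap:
            lines.append(f"[{lrc_timestamp(seg['end'])}]")
    return "\n".join(lines) + "\n" if lines else ""
```

Emitting the blank line only across real gaps keeps the file close to the hand-made example, where consecutive lines simply replace each other.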

ClaireCJS commented 1 year ago

So yes, if you could output these i would be SOOOO HAPPY.

I would be running this on about 30,000 songs.

And also if there were a way to keep the lines short.

BBC specifies .srt files shouldn't be wider than 42 characters for example. And for karaoke, shorter lines make more sense.
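As a rough illustration of the line-length idea (not a feature of the tool, and far cruder than what Subtitle Edit does), a long segment could be wrapped at 42 characters with the time range divided in proportion to character count. `split_segment` is a hypothetical helper, and the proportional timing is an assumption that will drift from the real word timings.

```python
import textwrap


def split_segment(start: float, end: float, text: str, width: int = 42):
    """Split one timed segment into lines of at most `width` characters.

    Returns a list of (start, end, text) tuples. Each piece gets a share of
    the original duration proportional to its length in characters.
    """
    chunks = textwrap.wrap(text.strip(), width=width)
    total_chars = sum(len(c) for c in chunks)
    pieces = []
    t = start
    for chunk in chunks:
        duration = (end - start) * len(chunk) / total_chars
        pieces.append((t, t + duration, chunk))
        t += duration
    return pieces
```

With word-level timestamps available, splitting on the actual word times would be more accurate than this proportional guess.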

Purfview commented 1 year ago

And yes, here is how i modified openai to output lrc.

Added it to "r134+++". This "+" versioning doesn't look sane anymore. 😅

ClaireCJS commented 1 year ago

Oh thank you!!!

And hey, it's still saner than how C# is two C++'s put together 🤦🏼‍♀️

Purfview commented 1 year ago

I would be running this on about 30,000 songs.

You can contribute your TCC batch processing there, maybe you'll convert someone to TCC. :)

And also if there were a way to keep the lines short.

Maybe someone will make this work properly. Like Subtitle Edit properly splits.

ClaireCJS commented 1 year ago

Oh cool, I was asking for wildcard+filelist support in another post because I'm kind of in the same situation as theirs. Their code checks if the total length of the filelist is greater than 8000, which I guess is CMD's command line length limit, but TCC has no command line length limit other than your RAM. (I mean, we have RAM. Why not use it to load a long line of text!) I've definitely used a lot of stuff where I had command lines running >32K long each. So yea, I'm gonna let them know their solution would work a bit better under TCC

.....except that this invalid system calls issue with the title prevents it from running correctly.

So I don't know, perhaps can we have an option to disable the window titling? Even though I agree it's a really cool feature. It would allow me to run it without invoking CMD.EXE, and it would allow them to use TCC to get around that silly 8K command-line length limit.
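Independent of which shell is used, a long file list can be kept under a command-line length limit by splitting it into batches. This is a sketch under assumptions: the 8000 figure is the one quoted in this thread, `batch_by_length` is a hypothetical helper, and each path is assumed to cost its own length plus two quotes and a separating space.

```python
def batch_by_length(paths, limit=8000, overhead=0):
    """Group paths so each batch fits within `limit` characters.

    `overhead` is the length of the fixed part of the command line
    (executable name and options). A single path longer than the limit
    still gets a batch of its own.
    """
    batches, current, length = [], [], overhead
    for path in paths:
        cost = len(path) + 3  # two quotes plus one space
        if current and length + cost > limit:
            batches.append(current)
            current, length = [], overhead
        current.append(path)
        length += cost
    if current:
        batches.append(current)
    return batches
```

Each batch can then be passed to a separate invocation of the executable.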

Purfview commented 1 year ago

Their code

That's my code. ;) Can TCC run bat files? I can remove a check for limit then. Btw, a variable length has same limit. Probably same limits in PowerShell too.

.....except that this invalid system calls issue with the title prevents it from running correctly.

Isn't it fixed? You didn't complain so I thought that it's fixed. [that's why I closed the issue...]

ClaireCJS commented 1 year ago

Their code

That's my code. ;)

oops sorry :)

Can TCC run bat files?

Yup. It's basically a fork of COMMAND.COM that Norton Utilities created in 1988 or so, which has changed names a few times (4DOS, then 4NT in the 32 bit days, then TCC in the 64 bit days). Incredibly functional, and in theory 100% compatible with normal bat files. Just a lot of extras.

I can remove a check for limit then. Btw, a variable length has same limit.

Not in TCC, actually. Just checked because I was very curious.

Isn't it fixed? You didn't complain so I thought that it's fixed. [that's why I closed the issue...]

Oh oops, I forgot to check the new version! I gotta go download that right away!

ClaireCJS commented 1 year ago

Aw shoot, i think my lrc code got put in before i made my edit a couple minutes after posting. It needs the hours turned off, and the separator set to . not ,

So sorry! Current LRC output doesn't work

[is 0.6.0 the right version?]


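class WriteLRC(ResultWriter):
    extension: str = "lrc"

    def write_result(self, result: dict, file: TextIO):
        # ResultWriter, format_timestamp and TextIO are already available in whisper\utils.py
        for segment in result["segments"]:
            # strip arrow and music-note artifacts from the lyric text
            text = segment["text"].strip().replace("-->", "").replace("🎵", "")
            # hours turned off, and "." (not ",") as the decimal separator
            start = format_timestamp(segment["start"], always_include_hours=False, decimal_marker=".")
            end = format_timestamp(segment["end"], always_include_hours=False, decimal_marker=".")
            # write lrc lines: the lyric at the start time, then a blank line at the end time
            print(f"[{start}]{text}\n[{end}]\n", file=file, flush=True)

Purfview commented 1 year ago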

I changed it to True in r134.5 version.

is 0.6.0 the right version?

This version is for code in another repo.