ClaireCJS closed this issue 1 year ago.
I remember 4DOS. :) Is TCC better than PowerShell?
Those "TCC: Unknown command" errors come from TITLE commands meant to show a progress bar in the window's title bar.
For example, try this command:
TITLE this is new title
Is this something I could possibly be helped with?
Yes, I'll check what is going on there with TCC.
Interesting. I've set window titles to keep myself informed in other situations too; it's a fun idea.
And I kind of despise PowerShell :)
In TCC, it's an internal command, same as in CMD, so yea, strange stuff is going on for sure :)
Can't reproduce it; it works OK for me, no errors:
Except that the % character doesn't show in the progress bar; even escaping it doesn't help.
I used TCC LE, as TCC v30 doesn't work with Windows 7.
Check if this command works for you:
title 40% ^| 00:00^<^<11:11 ^| a/s
Tangent: Just wait until you get to Windows 10. You can install Windows Terminal, which is suuuuuch a great container for TCC. Suddenly we can see Unicode/emojis in filenames, and there are just much better console options: incredible ANSI support, even support for double-height VT100 ANSI text, which I'm using for certain messages. It's nice having italics, underlines, blink, strikethrough, and 2-line-tall text. Plus it supports multiple tabs, and panes within each tab, which are great.
Anyway, that command actually doesn't work for me.
And I can tell you some reasons why:
1) % is a special character for environment variables. To represent % in TCC, you have to use two %s or enclose the title in quotes. So this actually sets the title to "40" for me instead of "40%", and it also still gives the unknown command error for 00:00.
2) Why? Because "^" is the command separator character for me. That may be a bad choice on my part, but it's one I've used since the 1990s.
The solution seems to be to put quotes around it all, which makes "^" safe but still doesn't make "%" work unless it is doubled to "%%":
Of course, I worked around this by using CMD.EXE /C to run your program under CMD.EXE, so there is a workaround. I just hadn't had to use something like that in years.
[The stuff I make is often unrunnable for folks who don't run TCC because I'm so deeply embedded in it. It's kind of a bummer.]
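The two approaches from this exchange can be sketched in Python: caret-escaping CMD metacharacters (as in the title 40% ^| ... command above), or quoting the whole title and doubling every %. This is only an illustration; caret_escape and quoted_title are hypothetical helper names, and neither approach is universal, since ^ can be remapped in TCC and % handling differs between shells.

```python
# Characters CMD treats specially; ^ is CMD's default escape character.
CMD_METACHARS = "^&|<>()"


def caret_escape(text: str) -> str:
    """Prefix each CMD metacharacter with ^ (breaks if ^ is remapped, as in a custom TCC setup)."""
    return "".join("^" + ch if ch in CMD_METACHARS else ch for ch in text)


def quoted_title(text: str) -> str:
    """Wrap the whole title in double quotes and double every % sign."""
    return '"' + text.replace("%", "%%") + '"'


progress = "40% | 00:00<11:11 | a/s"
print("title " + caret_escape(progress))  # title 40% ^| 00:00^<11:11 ^| a/s
print("title " + quoted_title(progress))  # title "40%% | 00:00<11:11 | a/s"
```

Note that caret-escaping leaves % untouched, which matches the report above that escaping doesn't help with %.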
Yeah, it's not wise to mess with the default escape character. Anyway, check the r134+ update, but there is no universal solution for the "%" character. [Btw, double %s didn't make it work in TCC LE.]
Escaping isn't needed if quotes are around an argument.
That's why this situation doesn't come up for me.
That being said, I'd prefer every command line to use \ for escaping. That's how it is in bash, pretty much every Unix-originating CLI, and most programming languages.
One more thing: I've noticed that int8 is used in your screenshots. Did you set it? I think float16 should be used by default, and it should be faster than int8.
That was just the beginning of the list of available operations. I actually have CUDA working (and it was so much easier to get working this time around than last time; not sure why, but I think I like Torch 2.x better than Torch 1.x).
From what I've seen (and from things running in 1-2 minutes in GPU mode that previously took 15 minutes in CPU mode), everything seems to be using fp16 now. :) Thanks!
Not the list, but there:
[2023-07-06 06:48:53.037] [ctranslate2] [thread 13552] [info] - Selected compute type: int8
Torch is not used by Faster-Whisper; it's used only by OpenAI's Whisper.
Hmm, had to double-check, but I think it's good on the CUDA side.
But it's definitely not getting through the song's lyrics.
That was with the default (medium) model.
With large-v2 it's basically the same:
It's not every song that's this bad, but whisper-faster.exe gives much less complete transcriptions than whisper.exe with the same model, at least on my end.
But whisper-faster has a timestamp granularity that whisper.exe lacks. Whisper.exe is not good for making subtitles/karaoke (LRC files) because its timestamping is so bad. Whisper-faster.exe is excellent for those purposes... but it keeps giving me incomplete transcriptions.
In the end, neither one is giving me the file I want. I know it's possible, and I'm sure we'll get there soon. I'm excited to help test this stuff out.
And not sure if it helps, but here's another example of whisper-faster vs whisper. Yea, it was 21 seconds vs 8 minutes, but whisper.exe didn't miss the last verse [pardon the profanity in the lyrics lol].
Thanks for the screenshots; I see that I made an error. I didn't mean to select int8 on CUDA. This should be fixed in "r134++".
Btw, you are still using r134 instead of r134+, where your issue is fixed.
You get different results because Standalone Faster-Whisper is using different defaults than OpenAI's Whisper.
About settings:
You don't need to use --verbose True
Set language for faster transcription: --language=en
I think you should disable VAD for music: --vad_filter=False
If you want karaoke subtitles, then set --highlight_words=True
Check if --beam_size=5 makes a difference.
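For batch runs, the suggested settings could be collected into one argument list, for example to pass to Python's subprocess.run, which with its default shell=False starts the program directly rather than through CMD or TCC. The flags are the ones listed above; whisper_faster_args is a hypothetical helper, and the executable is assumed to be on PATH.

```python
def whisper_faster_args(audio: str, exe: str = "whisper-faster.exe") -> list[str]:
    """Argument list using the settings suggested above (hypothetical helper)."""
    return [
        exe,
        audio,
        "--language=en",           # skip language detection
        "--vad_filter=False",      # VAD off for music
        "--highlight_words=True",  # word highlighting for karaoke-style SRT
        "--beam_size=5",           # worth comparing against the default
    ]


# Usage (Windows), not run here:
#   import subprocess
#   subprocess.run(whisper_faster_args(r"D:\music\song.mp3"), check=True)
```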
Thanks for the screenshots; I see that I made an error. I didn't mean to select int8 on CUDA. This should be fixed in "r134++".
Btw, you are still using r134 instead of r134+ where your issue is fixed.
Thanks! I actually have no idea how to upgrade, because I'm unclear what name to use to upgrade it with pip (I guess it wouldn't make sense for it to be on pip, actually). And when I go to the repository, I only see a readme that doesn't say how to get the files, and I don't see any browsable files like I do with other GitHub repos?
So perhaps the readme itself could use a more obvious download link or something?
[I just woke up, so my brain isn't fully on yet, and I can't remember how I installed it in the first place. Do I have dementia fears? I sure do. I'm getting up there in age and hope I don't get hit with it haha 😅]
You get different results because Standalone Faster-Whisper is using different defaults than OpenAI's Whisper.
There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.
About settings: You don't need to use --verbose True
But I would like to. So much so that I believe I submitted a feature request for it (but that might have been for openlrc, I don't remember).
Set language for faster transcription: --language=en
I've been meaning to. :)
I think you should disable VAD for music: --vad_filter=False
Hmmm, why do you think that? I'm curious. I would have thought that would mean there is no silence detection, so the resulting file would have no silence in it. I.e., if someone sings a line before a solo, that line will be stuck on screen until someone else says something, because there will be no word-specific end timestamp.
Am I wrong? That was my reason for trying this fork out, to be honest: to make it so words aren't stuck on the screen for 100% of a song.
If you want karaoke subtitles, then set --highlight_words=True
I think that adds HTML highlighting to the output, which I will be converting to LRC format (an LRC writer function is really easy to create if the timestamps are good, though! I modified OpenAI's version to create LRC; it's just that the timestamps butt up against each other, making it ugly).
Check if --beam_size=5 makes a difference.
Hmm, okay. The -h option describes it as "number of beams in beam search, only applicable when temperature is zero", which unfortunately isn't informative to me because I'm ignorant about some of the inner workings. I'd love it if the help explained why one would modify it, what the results are, and what some reasonable values are.
I'll try it out! Thanks!
Thanks! I have no idea how to upgrade actually because i'm unclear what the name is to upgrade it with Pip...
By "update" I meant downloading the new version. All downloads are in Releases: https://github.com/Purfview/whisper-standalone-win/releases
Thanks! I blame morning brain on this one, because I was able to find it the first time.😅
By the way, setting vad_filter to False alone seems to improve the results vastly, and that's without me having upgraded yet.
VAAAAAAAAAASTLY.
And my fears earlier about this making the timestamps bad were unfounded.
I will be stripping out the music characters in my situation. An option to leave out lines that are just a single music note might be helpful, but it's easy enough for me to implement on my end! :)
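Dropping the music-note-only lines could look like this minimal sketch. It assumes segments are dicts with a "text" key and that the notes appear as characters such as 🎵 or ♪; MUSIC_SYMBOLS is a guess to extend, and the function names are hypothetical.

```python
# Characters treated as "music notes"; an assumption, extend as needed.
# U+FE0F (emoji variation selector) is included so a styled "🎵" is also caught.
MUSIC_SYMBOLS = set("🎵🎶♪♫♬\ufe0f")


def is_music_only(text: str) -> bool:
    """True if text contains nothing but music symbols and whitespace."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(ch in MUSIC_SYMBOLS for ch in chars)


def drop_music_only(segments: list[dict]) -> list[dict]:
    """Remove segments whose text is only music notes (e.g. instrumental breaks)."""
    return [seg for seg in segments if not is_music_only(seg["text"])]
```

Lines that mix notes and lyrics are kept; stripping the note characters from those is a separate step.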
Amazing!!! I think this just reached the point where I can proceed with my personal goals of building something around this! awesome!!! thank you so much!!!
MORE SIDE STORY: I use MiniLyrics to display lyrics, and it keeps the LRC file "next" to the mp3 file, just like whisper does.
But I also use EvilLyrics, and it uses a repository at c:\lyrics\
So I'm going to need to write a piece of code to examine each mp3 and see if it has an LRC in the same folder. If it doesn't, it should extract the artist and title tags and check for a matching file in c:\lyrics\, and only then create an LRC automatically (I don't want AI LRC files superseding human-created ones!)
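The lookup order described above could be sketched like this. Assumptions: the EvilLyrics repository names its files "Artist - Title.lrc" (check the real naming scheme), artist and title were already read from the MP3 tags with a tag library of your choice, and titles containing characters like ? or / would need sanitizing first. needs_ai_lrc is a hypothetical name.

```python
from pathlib import Path


def needs_ai_lrc(mp3: Path, artist: str, title: str, lyrics_repo: Path) -> bool:
    """Return True only if no existing (human-made) LRC was found for this song."""
    # 1) MiniLyrics-style: an LRC with the same name, next to the MP3.
    if mp3.with_suffix(".lrc").exists():
        return False
    # 2) EvilLyrics-style central repository (assumed "Artist - Title.lrc" naming).
    if (lyrics_repo / f"{artist} - {title}.lrc").exists():
        return False
    # 3) Nothing found: safe to generate an AI LRC.
    return True


# Example: needs_ai_lrc(Path(r"D:\music\song.mp3"), "Artist", "Title", Path(r"c:\lyrics"))
```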
The other case I don't know what to do with yet is having lyrics in TXT format and wanting the AI to make the LRC. What I'd like to do is send the TXT lyrics in as context to whisper, so that whisper makes fewer mistakes. I don't know if anyone is doing this yet. pysync seemed to make an attempt, but that project appears to have been abandoned in March.
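One possible starting point, offered as an assumption rather than a tested recipe: both openai-whisper's transcribe() and the faster-whisper library's WhisperModel.transcribe() accept an initial_prompt string, so the TXT lyrics could be passed there as a hint. Whisper keeps only a limited number of prompt tokens (openai-whisper keeps the most recent ones), so this sketch trims the lyrics to their opening words; lyrics_prompt and the 150-word default are hypothetical choices, not model limits.

```python
def lyrics_prompt(lyrics_txt: str, max_words: int = 150) -> str:
    """Collapse whitespace in TXT lyrics and keep only the first max_words words."""
    words = lyrics_txt.split()
    return " ".join(words[:max_words])


# Hypothetical usage (not run here):
#   openai-whisper:  model.transcribe("song.mp3", initial_prompt=lyrics_prompt(text))
#   faster-whisper:  segments, info = model.transcribe("song.mp3", initial_prompt=lyrics_prompt(text))
```

Keeping the opening words is a deliberate choice: the prompt mostly helps the first window of audio, which is where the song's first lines are.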
There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.
whisper-faster.exe --help
But i would like to. To the extent that I believe i submitted a feature request for it...
Definitely not here. "True" is kinda the same as "False" here; "True" only adds a bunch of additional nerd yadda yadda.
I think that adds HTML highlighting to the output...
No, it underlines words in regular SRT files.
EDIT: I don't know what LRC files are, but if you have a Python function to output them, then I can incorporate it here.
There doesn't seem to be a way to know the defaults without looking at the code, which is why I wanted verbose on, to see stuff easier.
whisper-faster.exe --help
oh, oops, lol, my bad. 😅
But i would like to. To the extent that I believe i submitted a feature request for it...
Definitely not here. "True" is kinda the same as "False" here; "True" only adds a bunch of additional nerd yadda yadda.
I like the yadda yadda 😎
But "True is kinda same as false here" is interesting!
I think that adds HTML highlighting to the output...
No, it underlines words in regular SRT files.
Oh, how interesting. I use subtitles with VLC a lot, but I find them distracting. I think highlighting would be super distracting, but it's really, really cool that the feature exists. I could have used it when watching Trainspotting...
EDIT: I don't know what LRC files are, but if you have a Python function to output them, then I can incorporate it here.
They are a fairly widespread (maybe the most used) standard for displaying lyrics in a timed fashion, so that they are displayed as they are sung. Here's an example:
And yes, here is how I modified OpenAI's whisper to output LRC. Mine is a very, very simple implementation, but if the SRT is well-formed, it should make a comparable LRC.
Requires modifying whisper\utils.py to add this class:
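# In whisper/utils.py, ResultWriter, format_timestamp and TextIO are already available.
class WriteLRC(ResultWriter):
    extension: str = "lrc"

    # options/**kwargs let this work with whisper versions that pass extra arguments.
    def write_result(self, result: dict, file: TextIO, options=None, **kwargs):
        for segment in result["segments"]:
            start = format_timestamp(segment["start"], always_include_hours=False, decimal_marker=".")
            end = format_timestamp(segment["end"], always_include_hours=False, decimal_marker=".")
            # Drop stray SRT arrows and music-note characters from the lyric text.
            text = segment["text"].strip().replace("-->", "").replace("🎵", "")
            # Timed lyric line, then a bare end timestamp that blanks the display.
            print(f"[{start}]{text}\n[{end}]\n", file=file, flush=True)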
As well as adding 'lrc': WriteLRC, to the writers dict found in get_writer near the end of the utils.py file:
writers = {
    'txt': WriteTXT,
    'vtt': WriteVTT,
    'srt': WriteSRT,
    'tsv': WriteTSV,
    'json': WriteJSON,
    'lrc': WriteLRC,  # add this line
}
P.S. The actual LRC file looks like this.
While SRT files have both a start and an end timestamp, LRC files do not. But LRC files support blank lines, so an end timestamp followed by a blank line essentially "stops" the words from displaying.
You can see a few of those "blank" lines that stop the words from displaying in the example below:
[00:07.31]Fences hold me back from mine, baby
[00:11.86]Hold my hands and hold them tight
[00:15.04]I like my pit
[00:18.67]I want to stay
[00:22.10]That way I can't fall back in again
[00:29.98]Ah!
[00:38.40]
[00:39.58]I wanna scream
[00:43.50]Cut you up like you did me
[00:47.40]Is this all I've ever known?
[00:51.82]Is this all I've ever known?
[00:55.80]You took the sunshine from the days
[00:59.92]Now I live in shadows
[01:03.82]I'm just a dog with no bite
[01:07.84]This is all I've ever known
[01:10.36]Can't relax
[01:14.62]Head stuck in the ground
[01:19.26]Heading where we'll never be found
[01:25.58]I hope you feel
[01:29.60]I hope you feel the guilt I do
[01:33.64]You've got no shame
[01:37.60]You did the damage and I feel the pain
[01:43.10]
[01:50.81]I wanna scream
[01:55.01]Cut you up like you did me
[01:58.62]Is this all I've ever known?
[02:03.26]Is this all I've ever known?
[02:07.06]You took the sunshine from the days
[02:11.04]Now I live in shadows
[02:15.04]I'm just a dog with no bite
[02:18.84]This is all I've ever known
[02:21.58]
[03:10.49]Oh!
[03:12.50]I'm gonna forgive you so I can breathe
[03:16.84]Is this all I've ever known?
[03:19.66]Is this all?
[03:22.52]Is this all?
[03:24.24]You took the sunshine from the days
[03:28.36]Now I live in shadows
[03:31.84]I'm just a dog with no bite
[03:36.30]This is all I've ever known
[03:49.66]
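A minimal sketch of producing this format from Whisper-style segments (dicts with start, end, and text, as in result["segments"] in the WriteLRC code above). Two assumptions: timestamps are written as mm:ss.xx with centiseconds, matching the example above (whisper's format_timestamp emits three decimal places instead), and the blank "stop" line is written only when the next line starts at least min_gap seconds later, so back-to-back lines don't flicker. lrc_timestamp, segments_to_lrc, and min_gap are hypothetical names.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format seconds as an LRC tag like [00:07.31] (minutes:seconds.centiseconds)."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def segments_to_lrc(segments: list[dict], min_gap: float = 0.5) -> str:
    """Turn {start, end, text} segments into LRC lines with optional blank stop lines."""
    out = []
    for i, seg in enumerate(segments):
        out.append(lrc_timestamp(seg["start"]) + seg["text"].strip())
        is_last = i + 1 == len(segments)
        # Blank the display only if there is a real pause before the next line.
        if is_last or segments[i + 1]["start"] - seg["end"] >= min_gap:
            out.append(lrc_timestamp(seg["end"]))
    return "\n".join(out) + "\n"
```

For example, a segment from 7.31 s to 11.0 s followed by one starting at 11.86 s yields [00:07.31]..., then a blank [00:11.00] line, then [00:11.86]....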
So yes, if you could output these, I would be SOOOO HAPPY.
I would be running this on about 30,000 songs.
And it would also be great if there were a way to keep the lines short.
The BBC specifies that .srt lines shouldn't be wider than 42 characters, for example, and for karaoke, shorter lines make more sense.
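Keeping lines short could be sketched with Python's standard textwrap module: wrap each segment's text to 42 characters (the figure quoted above) and split the segment's time span across the resulting lines in proportion to their length. The proportional split is an assumption; word-level timestamps would be more accurate. split_segment is a hypothetical name.

```python
import textwrap


def split_segment(seg: dict, width: int = 42) -> list[dict]:
    """Split one {start, end, text} segment into lines of at most `width` characters.

    Assumption: time is shared out in proportion to each line's length, because
    the segment alone says nothing about when each line is sung.
    """
    lines = textwrap.wrap(seg["text"].strip(), width=width)
    if not lines:
        return []
    total_chars = sum(len(line) for line in lines)
    duration = seg["end"] - seg["start"]
    out, t = [], seg["start"]
    for line in lines:
        share = duration * len(line) / total_chars
        out.append({"start": t, "end": t + share, "text": line})
        t += share
    out[-1]["end"] = seg["end"]  # pin the last line to the real end time
    return out
```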
And yes, here is how I modified OpenAI's whisper to output LRC.
Added it to "r134+++". This "+" versioning doesn't look sane anymore. 😅
Oh thank you!!!
And hey, it's still saner than how C# is two C++'s put together 🤦🏼♀️
Oh cool, I was asking for wildcard + file list support in another post because I'm kind of in the same situation as they are. Their code checks whether the total length of the file list is greater than 8000, which I guess is CMD's command-line length limit, but TCC has no command-line length limit other than your RAM. (I mean, we have RAM. Why not use it to load a long line of text!) I've definitely used a lot of stuff where I had command lines running >32K long each. So yea, I'm going to let them know their solution would work a bit better under TCC.
...except that this invalid system call issue with the title prevents it from running correctly.
So I don't know; perhaps we could have an option to disable the window titling? Even though I agree it's a really cool feature. It would allow me to run it without invoking CMD.EXE, and it would allow them to use TCC to get around that silly 8K command-line length limit.
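The batching idea behind that length check can be sketched in Python: pack quoted file paths into as many command lines as needed so that each stays under a length limit. The 8000 default mirrors the check described above; chunk_command_lines is a hypothetical helper, and a single path longer than the limit still ends up on a line of its own.

```python
def chunk_command_lines(prefix: str, files: list[str], limit: int = 8000) -> list[str]:
    """Pack quoted file paths after `prefix` into command lines no longer than `limit`.

    A single path that exceeds the limit on its own still gets a line of its own
    (over the limit), since it can't be split.
    """
    lines: list[str] = []
    current = prefix
    for path in files:
        arg = f'"{path}"'
        # Start a new line if this argument would overflow the limit,
        # unless the current line has no file arguments yet.
        if len(current) + 1 + len(arg) > limit and current != prefix:
            lines.append(current)
            current = prefix
        current += " " + arg
    if current != prefix:
        lines.append(current)
    return lines
```

Under TCC, with no practical limit, passing a very large limit would keep everything on one line.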
Their code
That's my code. ;) Can TCC run bat files? I can remove the limit check then. Btw, variable length has the same limit. Probably the same limits in PowerShell too.
.....except that this invalid system calls issue with the title prevents it from running correctly.
Isn't it fixed? You didn't complain, so I thought it was fixed. [That's why I closed the issue...]
Their code
That's my code. ;)
oops sorry :)
Can TCC run bat files?
Yup. It's basically a replacement for COMMAND.COM that JP Software created in 1989 or so (Norton Utilities shipped a version of it as NDOS), and it has changed names a few times (4DOS, then 4NT in the 32-bit days, then TCC in the 64-bit days). Incredibly functional, and in theory 100% compatible with normal bat files. Just a lot of extras.
I can remove the limit check then. Btw, variable length has the same limit.
Not in TCC, actually. Just checked because I was very curious.
Isn't it fixed? You didn't complain, so I thought it was fixed. [That's why I closed the issue...]
Oh oops, I forgot to check the new version! I gotta go download that right away!
Aw shoot, I think my LRC code got put in before I made my edit a couple of minutes after posting. It needs the hours turned off and the decimal separator set to . instead of ,
So sorry! The current LRC output doesn't work.
[Is 0.6.0 the right version?]
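# In whisper/utils.py, ResultWriter, format_timestamp and TextIO are already available.
class WriteLRC(ResultWriter):
    extension: str = "lrc"

    # options/**kwargs let this work with whisper versions that pass extra arguments.
    def write_result(self, result: dict, file: TextIO, options=None, **kwargs):
        for segment in result["segments"]:
            start = format_timestamp(segment["start"], always_include_hours=False, decimal_marker=".")
            end = format_timestamp(segment["end"], always_include_hours=False, decimal_marker=".")
            # Drop stray SRT arrows and music-note characters from the lyric text.
            text = segment["text"].strip().replace("-->", "").replace("🎵", "")
            # Timed lyric line, then a bare end timestamp that blanks the display.
            print(f"[{start}]{text}\n[{end}]\n", file=file, flush=True)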
I changed it to True in the r134.5 version.
is 0.6.0 the right version?
This version is for code in another repo.
I use a lot of different command lines, and yet I don't think I've seen this happen before.
Whisper-faster.exe ends up sending commands to the command line under TCC, but not under CMD.EXE.
But I abandoned CMD.EXE back when it was COMMAND.COM in 1988. TCC has been in constant development, so it's not some janky command line, even though most people haven't heard of it. It's really solid. So I'm wondering how this is happening.
There's some very niche incompatibility here, because this is not something I've seen in decades of use.
Any idea if we can address it?
Visually, here's what it looks like under CMD.EXE; it works just fine:
Yet under TCC.EXE, I get this:
It's sending the timestamps straight to the command line?!?!
Is this something I could possibly be helped with?