Mossy1022 / Smart-Memos

Super-human brainstorming and note-taking by smart-transcribing your voice! You get a complete transcript, a summary, an expansion on the concepts presented, and a fully customizable analysis for any use case you can think of!
MIT License

option to use whisper.cpp #3

Open rawwerks opened 4 months ago

rawwerks commented 4 months ago

https://github.com/ggerganov/whisper.cpp/ unlocks local transcription for effectively any computer.

please consider adding an option to use this instead of the OpenAI Whisper API.
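For reference, here is a minimal sketch of driving whisper.cpp from the command line. The `./main` binary, model path, and file names are assumptions based on a stock whisper.cpp checkout (built with `make`, model fetched with `models/download-ggml-model.sh`); adjust them to your setup:

```shell
# Transcribe a 16 kHz mono WAV with the stock whisper.cpp CLI (sketch).
file="memo.wav"
out="${file%.*}"   # whisper.cpp appends .txt to the -of base name

if [ -x ./main ]; then
  # -m: model file, -f: input audio, -otxt: plain-text output,
  # -of: output base name ("memo" -> memo.txt)
  ./main -m models/ggml-base.en.bin -f "$file" -otxt -of "$out"
fi
```

whisper.cpp expects 16 kHz mono WAV input, which is why the ffmpeg preprocessing discussed below converts to exactly that format.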

Mossy1022 commented 4 months ago

That's definitely in the works!! I understand a local model is very important for many users, and it's the next item I'm working on after I fix this iOS compatibility issue.

rawwerks commented 4 months ago

here's another trick that helps to speed things up when running whisper locally. i preprocess the audio w/ ffmpeg. you might not want all of these options, but here's what i do to my voice memos and it works pretty well. the key thing for speed is the silence removal, but i find the compression and filtering help with accuracy.

```shell
# Remove silence, denoise, compress, boost volume, and convert to 16 kHz mono WAV
ffmpeg -i "$file" \
  -af "highpass=f=80, lowpass=f=6000, afftdn, silenceremove=stop_periods=-1:stop_duration=0.5:stop_threshold=-60dB, acompressor=threshold=0.089:ratio=9:attack=10:release=250, volume=2.0" \
  -ar 16000 -ac 1 -acodec pcm_s16le -f wav -threads "$NUM_CORES" "$new_file"
```

(where NUM_CORES=$(sysctl -n hw.ncpu) )

(note that you probably wouldn't want to do this for something like video captions where the timestamp matters, but for voice memos the silence isn't really valuable information.)
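The `sysctl -n hw.ncpu` call is macOS/BSD-specific. A portable fallback (my sketch, not from the original comment) could try `nproc` first and degrade gracefully:

```shell
# Detect CPU core count: nproc on Linux, sysctl on macOS/BSD, else 1.
if command -v nproc >/dev/null 2>&1; then
  NUM_CORES=$(nproc)
elif command -v sysctl >/dev/null 2>&1; then
  NUM_CORES=$(sysctl -n hw.ncpu)
else
  NUM_CORES=1
fi
```

`nproc` is checked first because Linux also ships a `sysctl` binary, but it has no `hw.ncpu` key there.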

Mossy1022 commented 4 months ago

Perfect timing: the release I pushed between my last message and now literally just fixed the iOS compatibility issue, lol. I'll be on vacation for the 4th through the end of this weekend, but this is the item I'll be working on next. Thanks for the tips!

Mossy1022 commented 4 months ago

@rawwerks Do you have any thoughts on using transformers.js for transcription? One of the most important aspects I want for this plugin is to enable it for on-the-go mobile use, and from what I've seen that may not work with a library that requires a server.

rawwerks commented 4 months ago

I've never used transformers.js, but whisper.cpp is compatible with all modern mobile devices, and powers tons of iOS apps (whisperboard being an example that comes to mind => https://github.com/Saik0s/Whisperboard)

you shouldn't need the user to run any scripts; you should be able to call a subprocess for them. take a look at the whisperboard source code.

Mossy1022 commented 3 months ago

Appreciate the help! The main thing is that it has to run in the Obsidian environment on mobile, which has many more restrictions than simply running on mobile, so I can't bundle things the way that example does. From what I've seen in the Obsidian forums and Discord, I have to package everything into the single main.js file, since plugin releases can't include any other files (outside of the styles and manifest). I'm sure there's a way to do it; I'm looking for other Obsidian plugins that may already be using it.