danielmiessler / fabric

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
https://danielmiessler.com/p/fabric-origin-story
MIT License

Transcript parsing? Pattern [Question]: #601

Open xKaliburOS opened 2 weeks ago

xKaliburOS commented 2 weeks ago

What is your question?

I've been experimenting with the YouTube and audio transcription features for a project I'm actively involved in, where I'm using them for data gathering, specifically to collect interview data.

However, while it is good at producing complete transcripts, it provides neither punctuation nor attribution of who is speaking.

I have attempted to create a pattern that identifies when each person is speaking and makes sure everything is properly punctuated (as well as undoing some of the filters, basically filling in swear words that have been censored). However, for some reason I cannot get fabric to apply the pattern and return the FULL transcript. I am using only local models. I have tried models with significantly larger context windows, and I have also tried customized, modified patterns, but to no avail.

Does anyone have any ideas on how I might remedy this? Solving it would significantly speed up my data gathering. I've created TWO patterns, one involving agents and one not. The agent-based one seems to work far better, judging from the limited outputs I have seen. Is there any way to expand the token output of a pattern, or to set a universal max token limit for responses?

Gerkinfeltser commented 2 weeks ago

Hey, this bugged me as well. My fix is a single line, and it is the only change in this pull request: https://github.com/danielmiessler/fabric/pull/600

edit: I don't have it identify the speaker though...
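
Separately from the pull request above, one possible workaround for truncated output is to split the transcript into smaller pieces and run the pattern on each piece, so that no single response has to reproduce the whole transcript. The sketch below only does the splitting. The function name `split_transcript` and the `max_words` default are hypothetical choices for illustration, not part of fabric, and the right chunk size depends on the model's output limit.

```python
def split_transcript(text: str, max_words: int = 400) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: the 400-word default is an arbitrary
    assumption, not a value taken from fabric or any model.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # splits on any whitespace, drops empty strings
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Example: a short transcript split into chunks of three words each.
chunks = split_transcript("so tell me about your first job out of school", 3)
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: {chunk}")
```

Each chunk could then be fed to the pattern on its own and the results joined in order. Splitting on a fixed word count can cut a sentence or a speaker turn in half, so speaker attribution near chunk boundaries may need a manual check.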