Open UltraHDR opened 1 year ago
Using yt-dlp to download the subtitles and the gpt3.5 API, this is pretty trivial. In fact, I already have a project that can do this, which I host as a private Discord bot for my friends.
The problem is the cost. Using the gpt3.5-16k API, a 30-minute video costs about 2 cents. That is effectively nothing for on-demand private usage, but when scaling it up to a public service, or running it on every video in your homepage suggestions, it would get very expensive very quickly, even when only running it on popular channels.
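For reference, here is the rough arithmetic behind that estimate as a sketch. All the numbers are assumptions on my part (typical speaking rate, tokens per word, and gpt-3.5-turbo-16k's mid-2023 prices), not measurements from the bot:

```python
# Back-of-the-envelope cost of titling one video with gpt-3.5-turbo-16k.
# Assumptions: ~150 spoken words/minute, ~1.3 tokens/word, and the
# mid-2023 prices of $0.003 per 1K input tokens / $0.004 per 1K output.

WORDS_PER_MINUTE = 150
TOKENS_PER_WORD = 1.3
PRICE_IN_PER_1K = 0.003   # USD per 1K input tokens (assumed)
PRICE_OUT_PER_1K = 0.004  # USD per 1K output tokens (assumed)

def title_cost_usd(minutes: float, output_tokens: int = 20) -> float:
    """Estimated API cost of generating one short title for a video."""
    input_tokens = minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD
    return (input_tokens / 1000) * PRICE_IN_PER_1K \
         + (output_tokens / 1000) * PRICE_OUT_PER_1K

print(f"30-minute video: ~${title_cost_usd(30):.3f}")  # lands around 2 cents
```

At a few million popular videos per day, multiplying this out makes the scaling concern obvious.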
Combine that with the fact that this is a free community project, and I don't see this happening without a free way to invoke an LLM or many users donating (which I doubt will happen).
Not to mention the relatively large delay of >~15s and the possibly mediocre quality of the generated titles.
While it's an interesting concept, these LLMs do not currently seem to like being concise, which makes the titles pretty bad.
I think the conciseness problem can be easily solved by fine-tuning the right prompt format.
Here's a prompt that I used:
Output: "Chess Grandmaster Tournament Recap: Wins, Losses, and Heartbreak"
The results were quite good for this specific video, considering I found thinking of a title for this video quite difficult. :sweat_smile:
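For what it's worth, the conciseness constraint can be baked directly into the prompt. A minimal sketch of what I mean; the instruction wording and the 10-word limit are my own illustrative choices, not the exact prompt used above:

```python
def build_title_prompt(transcript: str, max_words: int = 10) -> str:
    """Assemble a title-generation prompt with an explicit length constraint.

    The instruction wording is illustrative and should be tuned per model.
    """
    return (
        f"Below is the transcript of a YouTube video. "
        f"Write a single descriptive title of at most {max_words} words. "
        f"Reply with the title only, no quotes and no explanation.\n\n"
        f"Transcript:\n{transcript}"
    )

# The subtitles downloaded via yt-dlp would be passed in here.
prompt = build_title_prompt("...subtitle text goes here...")
```

Telling the model to reply with the title only also avoids having to strip boilerplate like "Here's a title:" from the response.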
I also think the API charges can be avoided entirely if you spoof requests to ChatGPT's endpoint from the extension when the user is logged in to OpenAI. (Although I'm not sure what context length the free version has.) Other LLMs, such as Bard, Claude, and Huggingface Chat, could be considered too.
This could be helpful when suggesting a title to submit, I suppose.
> I also think the API charges can be entirely avoided if you spoof requests to ChatGPT's endpoint from the extension if the user is logged in to OpenAI.
The ChatGPT UI is limited to a much smaller per-message size than the API: just try pasting your prompt in there. I expect this will work for none but the shortest of videos. It also requires more extension permissions, adds rather significant complexity (I'd expect), and picks a fight with OpenAI, who will most likely not take kindly to extensions doing this.
I was just about to suggest this
> I also think the API charges can be entirely avoided if you spoof requests to ChatGPT's endpoint from the extension if the user is logged in to OpenAI.
>
> The ChatGPT UI is limited to a much smaller per-message size than the API: just try pasting your prompt in there. I expect this will work for none but the shortest of videos. It also requires more extension permissions, adds rather significant complexity (I'd expect), and picks a fight with OpenAI, who will most likely not take kindly to extensions doing this.
There are alternative LLMs with more tokens, it is also possible to split the task into small chunks, and of course you can target just the popular videos to begin with. There is a great margin for optimization here, and using the right APIs and hacks can get things done properly.
> There are alternative LLMs with more tokens
I have tested claude, which prominently advertises its 100k token limit.
It fails to answer rather basic questions about an uploaded ~35k-token script when the answer to those questions is near the middle of the script, strongly displaying the Lost in the Middle effect that is already seen in gpt3.5.
Of course you can split up the script to create independent summaries and then generate a title based on those, but we can expect quality to suffer further from this. I would say it remains to be tested, but even if this worked well, it would not really be feasible considering the monetary or resource cost attached to invoking an LLM at this scale, even when only processing the most popular videos.
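The split-then-combine approach described above is essentially map-reduce summarization. A sketch of the shape of it, with `summarize` as a stand-in for an LLM call (it is a placeholder, not a real API; the two-stage quality loss is exactly the concern raised here):

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split a transcript into roughly max_chars-sized pieces on word boundaries."""
    words, chunks, current, size = text.split(), [], [], 0
    for w in words:
        if size + len(w) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(w)
        size += len(w) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

def title_via_map_reduce(transcript: str, summarize) -> str:
    """Summarize each chunk independently, then derive a title from the summaries.

    `summarize` stands in for whatever LLM call is available; note this
    multiplies the number of invocations per video by the chunk count.
    """
    summaries = [summarize(c) for c in chunk_text(transcript)]
    return summarize("\n".join(summaries))
```

Besides the quality question, note the cost implication: one video now costs one LLM call per chunk plus one combining call, so chunking makes the economics worse, not better.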
Unless an organization appears that would like to support this project by providing GPU servers or free access to an LLM API, I doubt looking further into this makes any sense. I strongly doubt that a corporation like OpenAI would stand idly by if you were to abuse the free version of ChatGPT through extension users at this scale.
And even with all this, alternate video titles are only half of what the extension provides.
> Combine that with the fact that this is a free community project, I don't see this happening without a free way to invoke a LLM
See: https://github.com/xtekky/gpt4free/issues/40 https://github.com/xtekky/gpt4free/issues/802
This prompt could maybe make this possible:
Auto Split Prompt: the splitter prompt is the text that will be used when the user prompt is divided into chunks due to the character limit.
Act like a document/text loader until you load and remember the content of the next text/s or document/s.
There might be multiple files, each file is marked by name in the format ### DOCUMENT NAME.
I will send them to you in chunks. The start of each chunk will be noted as [START CHUNK x/TOTAL], and the end of that chunk will be noted as [END CHUNK x/TOTAL], where x is the number of the current chunk and TOTAL is the total number of chunks I will send you.
I will split the message in chunks, and send them to you one by one. For each message follow the instructions at the end of the message.
Let's begin:
Auto Split Chunk Prompt: the chunk prompt is a text that will be added to the end of each chunk. It can be used to summarize the previous chunk or do other things.
Reply with OK: [CHUNK x/TOTAL]
Don't reply with anything else!
Borrowed from Superpower ChatGPT
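The splitting side of that prompt protocol is purely mechanical. A sketch that wraps each piece in the [START CHUNK x/TOTAL] / [END CHUNK x/TOTAL] markers the loader prompt expects (the chunk size here is arbitrary and would need tuning to ChatGPT's actual per-message limit):

```python
def make_marked_chunks(text: str, max_chars: int = 6000) -> list[str]:
    """Split text and wrap every piece in the markers the loader prompt expects."""
    pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    total = len(pieces)
    return [
        f"[START CHUNK {i}/{total}]\n{piece}\n[END CHUNK {i}/{total}]\n"
        f"Reply with OK: [CHUNK {i}/{total}]\nDon't reply with anything else!"
        for i, piece in enumerate(pieces, start=1)
    ]
```

Each returned string would be sent as its own ChatGPT message, with the "Act like a document/text loader" instruction sent first.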
Can subtitle download and LLM processing be done locally? For the document length issue, are there RAG-like alternatives?
Hi, thanks for this addon.
It'll be cool to automate the following
Thanks