Sweetdevil144 / Youtube-Shorts-Creator

A Website that helps extract shorts segments from a youtube video link
https://youtube-shorts-creator.onrender.com

Refactoring and Improving `extractShorts` and `divideCaptionsIntoChunks` in `fetchresults.js` #2

Closed Sweetdevil144 closed 1 year ago

Sweetdevil144 commented 1 year ago

1. Update the Data Structure:

Our fetched transcripts object has changed to a new format:

{
  transcript: [
    {
      snippet: "okay in terms of solutions yeah I'm",
      start_time: '0:02'
    }, 
    ...more items
  ]
}

The `transcript` property is an array of objects, each with a `snippet` and a `start_time`, so any function that consumed the old captions format must be updated to read this new structure.
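For reference, consuming the new structure looks like this (the second transcript item is made up purely for illustration):

```javascript
// The shape below mirrors the example above; field names come from the issue.
const transcripts = {
  transcript: [
    { snippet: "okay in terms of solutions yeah I'm", start_time: "0:02" },
    { snippet: "going to start with the first one", start_time: "0:05" },
  ],
};

// Any consumer of the old captions format now reads transcripts.transcript:
for (const { snippet, start_time } of transcripts.transcript) {
  console.log(`[${start_time}] ${snippet}`);
}
```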

2. Dividing Captions into Chunks (Token Limit Consideration):

The primary goal of the divideCaptionsIntoChunks function is to divide the video transcript into sections or "chunks" so that they can be analyzed by the OpenAI API without exceeding the token limit.

Steps for the updated divideCaptionsIntoChunks are covered under the detailed code changes below.

3. Analyzing the Chunks:

The purpose of analyzeCaptions is to assess how suitable each chunk is for a short video. The more engaging or relevant a chunk, the higher its score or "rating" should be.

For the updated analyzeCaptions, see the detailed code changes below.
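One possible shape for the updated analyzeCaptions is sketched here, with the OpenAI call stubbed out as a `rateChunk` parameter so only the data flow is shown — `rateChunk` and the `{ start_time, rating }` output shape are assumptions, not existing code:

```javascript
// Hypothetical sketch: score each chunk for "short" suitability and return
// the chunks sorted highest-rated first. rateChunk stands in for the real
// OpenAI call (e.g. a 1-10 engagement score).
async function analyzeCaptions(chunks, rateChunk) {
  const rated = [];
  for (const chunk of chunks) {
    const text = chunk.map((item) => item.snippet).join(" ");
    const rating = await rateChunk(text);
    // Tag the rating with the chunk's first timestamp so the frontend
    // knows where the candidate short begins.
    rated.push({ start_time: chunk[0].start_time, rating });
  }
  return rated.sort((a, b) => b.rating - a.rating);
}
```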

4. Sending Data to the Frontend:

You'll pass the organized and analyzed data from fetchresults.js to app.js, which in turn sends it to the frontend script.js; the concrete steps are in the code changes below.
Detailed Steps and Code Changes:

  1. Divide the Captions into Chunks:

    • Modify the divideCaptionsIntoChunks function to loop over transcripts.transcript:

      function divideCaptionsIntoChunks(transcripts) {
        let chunks = [];
        let currentChunk = [];
        let currentTokens = 0;
        // Loop over the array inside the new transcripts object.
        for (let item of transcripts.transcript) {
          const tokens = item.snippet.split(" ").length; // naive token count based on words
          if (currentTokens + tokens > 15000) { // keeping a margin for safety
            chunks.push(currentChunk);
            currentChunk = [item];
            currentTokens = tokens;
          } else {
            currentChunk.push(item);
            currentTokens += tokens;
          }
        }
        if (currentChunk.length) {
          chunks.push(currentChunk);
        }
        return chunks;
      }
  2. Analyzing the Chunks:

    • Since we're aiming for a specific output format, our conversation with the API should ask for that format explicitly. Remember that GPT-3.5 might not always return the exact format you want, so post-processing might be needed.
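As one hedged example of that post-processing, a defensive parser could extract the first JSON array from the completion and fall back to `null` on malformed output; `parseModelOutput` and the expected `{ start_time, rating }` shape are illustrative assumptions:

```javascript
// GPT-3.5 may wrap its answer in prose or code fences, so parse defensively
// instead of trusting the raw completion text.
function parseModelOutput(raw) {
  // Pull out the first JSON array that appears anywhere in the text.
  const match = raw.match(/\[[\s\S]*\]/);
  if (!match) return null;
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null; // let the caller retry the request or skip this chunk
  }
}
```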
  3. Update app.js:

    • Once you get the shorts from fetchResults.extractShorts, send them as a JSON response:
      return res.json({ success: true, shorts: topShorts });
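The handler logic around that response might look like the following sketch, written as a plain async function rather than a full Express route so the JSON shape stays visible; `shortsHandler` and its parameters are illustrative names, with `extractShorts` standing in for `fetchResults.extractShorts`:

```javascript
// Hypothetical app.js handler: run extraction, then send the shorts as JSON,
// falling back to a 500 response with an error message on failure.
async function shortsHandler(req, res, extractShorts) {
  try {
    const topShorts = await extractShorts(req.body.videoUrl);
    return res.json({ success: true, shorts: topShorts });
  } catch (err) {
    return res.status(500).json({ success: false, error: err.message });
  }
}
```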
  4. Update script.js:

    • Use the returned shorts data to embed the videos:
      if (data.success) {
        embedVideos(data.shorts);
      }
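A minimal sketch of what embedVideos might do, assuming each short carries a `videoId` and a transcript-style `"m:ss"` `start_time` — these field names are assumptions, not the actual fetchresults.js output:

```javascript
// Convert the transcript-style "m:ss" start_time into whole seconds.
function toSeconds(startTime) {
  return startTime.split(":").map(Number).reduce((total, part) => total * 60 + part, 0);
}

// Build a YouTube embed URL that starts playback at the short's timestamp.
function buildEmbedUrl(videoId, startTime) {
  return `https://www.youtube.com/embed/${videoId}?start=${toSeconds(startTime)}`;
}

// Append one iframe per short to the results container.
function embedVideos(shorts, container = document.getElementById("results")) {
  for (const short of shorts) {
    const iframe = document.createElement("iframe");
    iframe.src = buildEmbedUrl(short.videoId, short.start_time);
    iframe.allowFullscreen = true;
    container.appendChild(iframe);
  }
}
```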