FujiwaraChoki / MoneyPrinter

Automate Creation of YouTube Shorts using MoviePy.
MIT License
10.07k stars, 1.35k forks

[BUG] get_search_terms() is returning the terms in a json script context #216

Closed giubaru closed 7 months ago

giubaru commented 7 months ago

Describe the bug

Sometimes the get_search_terms function returns something like this:

  ```json
  [
    "Supreme Court ruling",
    "Presidential immunity",
    "January 6 insurrection",
    "Judicial scrutiny",
    "High court decision",
    "Oval Office"
  ]```

To Reproduce

Just try using more complex subjects.

Expected behavior

It should return only this format: ["search term 1", "search term 2", "search term 3"]


Additional context

Changing the prompt is enough to fix it.

SomethingGeneric commented 7 months ago

Am I missing something or is the desired format and actual format identical?

giubaru commented 7 months ago

No, because it is literally returning the JSON wrapped in the backtick code fence (```json ... ```).

SomethingGeneric commented 7 months ago

Ah, so the ``` marks are in the response. Got it. That's probably an easy thing to check for.
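A minimal sketch of such a check, assuming the reply is either bare JSON or JSON wrapped in a single Markdown code fence with an optional language tag such as `json`. The helper `strip_code_fence` and the pattern `FENCE_RE` are hypothetical names, not part of MoneyPrinter's gpt.py:

```python
import json
import re

# Hypothetical helper (not in gpt.py): matches one surrounding code fence,
# an optional language tag (e.g. "json"), and captures the body in between.
FENCE_RE = re.compile(r"^\s*```[A-Za-z0-9_-]*\s*(.*?)\s*```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the fenced body if text is wrapped in a code fence, else text unchanged."""
    match = FENCE_RE.match(text)
    return match.group(1) if match else text


response = '```json\n  [\n    "Supreme Court ruling",\n    "Oval Office"\n  ]```'
terms = json.loads(strip_code_fence(response))
print(terms)  # ['Supreme Court ruling', 'Oval Office']
```

Running the reply through a helper like this before the first `json.loads` call would let the normal parsing path handle fenced replies, so the fallback regex would only be needed for genuinely messy output.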

radry commented 7 months ago

gpt.py already has a failsafe that takes care of incorrectly formatted replies from ChatGPT.


print(colored("[*] GPT returned an unformatted response. Attempting to clean...", "yellow"))

# Attempt to extract list-like string and convert to list
match = re.search(r'\["(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\]', response)
if match:
    try:
        search_terms = json.loads(match.group())
    except json.JSONDecodeError:
        print(colored("[-] Could not parse response.", "red"))
        return []

Unless this "bug" prevents the script from running successfully, it's not an issue.

giubaru commented 7 months ago

Yes, but in this case the regular expression does not match:

Take a look here:

import json, re
def get_search_terms() -> list[str]:
    response = '''\
```json
  [
    "Supreme Court ruling",
    "Presidential immunity",
    "January 6 insurrection",
    "Judicial scrutiny",
    "High court decision",
    "Oval Office"
  ]```'''
    # Parse response into a list of search terms
    search_terms = []

    try:
        search_terms = json.loads(response)
        if not isinstance(search_terms, list) or not all(isinstance(term, str) for term in search_terms):
            raise ValueError("Response is not a list of strings.")

    except (json.JSONDecodeError, ValueError):
        print("[*] GPT returned an unformatted response. Attempting to clean...")

        # Attempt to extract list-like string and convert to list
        match = re.search(r'\["(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\]', response)
        if match:
            try:
                search_terms = json.loads(match.group())
            except json.JSONDecodeError:
                print("[-] Could not parse response.")
                return []

    # Let user know
    print(f"\nGenerated {len(search_terms)} search terms: {', '.join(search_terms)}")

    # Return search terms
    return search_terms

print(get_search_terms())
radry commented 7 months ago

This regex should fix this particular case, but I don't know whether it breaks other formatting cases. It also keeps the whitespace; I'm not sure whether that causes trouble further down:

match = re.search(r'\[\s*"(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\s*\]', response)
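A standalone check of the proposed regex against the reported reply. It also speaks to the whitespace question: `json.loads` ignores whitespace between JSON tokens, so the newlines and indentation kept in the match should not end up in the resulting strings. `OLD_PATTERN` is the pattern from the gpt.py excerpt above and `NEW_PATTERN` the one just posted; both variable names are made up for this sketch.

```python
import json
import re

OLD_PATTERN = r'\["(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\]'       # pattern in the gpt.py excerpt
NEW_PATTERN = r'\[\s*"(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\s*\]'  # pattern proposed above

FENCE = "`" * 3  # built this way only to avoid literal fences inside this block

# The reported reply: a pretty-printed JSON array inside a json code fence.
response = (
    FENCE + "json\n"
    "  [\n"
    '    "Supreme Court ruling",\n'
    '    "Presidential immunity",\n'
    '    "January 6 insurrection",\n'
    '    "Judicial scrutiny",\n'
    '    "High court decision",\n'
    '    "Oval Office"\n'
    "  ]" + FENCE
)

# The old pattern needs '"' right after '[', so the newline defeats it.
print(re.search(OLD_PATTERN, response))  # None

# The new pattern allows whitespace after '[' and before ']'.
match = re.search(NEW_PATTERN, response)
terms = json.loads(match.group())  # whitespace between JSON tokens is ignored
print(terms[0], len(terms))  # Supreme Court ruling 6
```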
SomethingGeneric commented 7 months ago

Couldn't one just hardcode a .replace("```", "")?

Or would that also cause more problems?

radry commented 7 months ago

The problem is the additional whitespace. The regex I posted takes care of that. Please test it.
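A quick standalone check of the `.replace()` idea against a shortened version of the reported reply. Removing only the backticks leaves the `json` language tag in front of the array, so `json.loads` still fails, and the original fallback regex still misses it because of the whitespace after `[`. Also removing the tag (this assumes the model always writes exactly `json` after the fence) leaves an array that `json.loads` can parse directly:

```python
import json
import re

OLD_PATTERN = r'\["(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\]'  # pattern in the gpt.py excerpt
FENCE = "`" * 3

response = FENCE + 'json\n  [\n    "Supreme Court ruling",\n    "Oval Office"\n  ]' + FENCE

# 1) Removing only the backticks leaves 'json\n  [...]', which is not valid JSON.
backticks_removed = response.replace(FENCE, "")
try:
    json.loads(backticks_removed)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)  # False

# The old fallback regex also misses it, because '[' is followed by a newline.
print(re.search(OLD_PATTERN, backticks_removed))  # None

# 2) Removing the 'json' tag too leaves a whitespace-padded array, which
#    json.loads accepts because surrounding whitespace is allowed.
tag_removed = response.replace(FENCE + "json", "").replace(FENCE, "")
terms = json.loads(tag_removed)
print(terms)  # ['Supreme Court ruling', 'Oval Office']
```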

SomethingGeneric commented 7 months ago

Dude, what is with closing stuff and not commenting on it?

FujiwaraChoki commented 7 months ago

> Dude, what is with closing stuff and not commenting on it?

Most changes do not make sense, and in this case I thought the issue was resolved, since radry mentioned that his regex works.

radry commented 7 months ago

The issue is only resolved when it was tested and added to the code. I didn't open a pull request and you didn't add it either. I only tested the regex with an online regex tool, not the code itself.
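Testing in code rather than in an online regex tool could look like the following sketch, which runs the current gpt.py pattern and the proposed one over a few reply shapes. The sample replies are made up for illustration, not taken from real GPT output. Note that both patterns only accept escaped characters inside the first string, so a later element containing an escaped quote is matched by neither.

```python
import re

OLD_PATTERN = r'\["(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\]'       # current gpt.py failsafe
NEW_PATTERN = r'\[\s*"(?:[^"\\]|\\.)*"(?:,\s*"[^"\\]*")*\s*\]'  # proposed in this thread
FENCE = "`" * 3

# Hypothetical sample replies (illustrative only).
CASES = {
    "compact": '["a", "b", "c"]',
    "pretty": '[\n  "a",\n  "b"\n]',
    "fenced": FENCE + 'json\n  [\n    "a",\n    "b"\n  ]' + FENCE,
    "escaped_quote_later": '["a", "say \\"hi\\""]',
}

# For each case, record whether (old pattern, new pattern) finds a match.
results = {
    name: (re.search(OLD_PATTERN, text) is not None,
           re.search(NEW_PATTERN, text) is not None)
    for name, text in CASES.items()
}

for name, (old_ok, new_ok) in results.items():
    print(f"{name:20} old={old_ok!s:5} new={new_ok}")
```

Wrapping these cases in the project's test suite would catch regressions if the pattern is changed again.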

FujiwaraChoki commented 7 months ago

> The issue is only resolved when it was tested and added to the code. I didn't open a pull request and you didn't add it either. I only tested the regex with an online regex tool, not the code itself.

Fair enough, I forgot to add to the README that I will not be responding to any issues anymore (already added regarding PRs).

radry commented 7 months ago

> I will not be responding to any issues anymore (already added regarding PRs).

So this project is abandoned?

SomethingGeneric commented 7 months ago

> > I will not be responding to any issues anymore (already added regarding PRs).
>
> So this project is abandoned?

Probably. I already have a fork that's ready ;)

FujiwaraChoki commented 7 months ago

> > I will not be responding to any issues anymore (already added regarding PRs).
>
> So this project is abandoned?

Yes, at least for now.