THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0

Prompt conversion using Google Gemini #349

Open Enchante503 opened 1 month ago

Enchante503 commented 1 month ago

Feature request

This is an example of prompt conversion using Google Gemini. It would be nice if it could be enabled with a checkbox or a similar option. Note: Gemini may censor the prompt depending on its content.

import os

import google.generativeai as genai


def convert_prompt(prompt: str, retry_times: int = 3) -> str:
    # Get the API key and model ID from environment variables
    api_key = os.environ.get("GOOGLE_API_KEY")
    model_id = os.environ.get("GEMINI_MODEL_ID")

    if not api_key or not model_id:
        return prompt

    genai.configure(api_key=api_key)  # Set API key

    text = prompt.strip()
    model = genai.GenerativeModel(model_id)  # Specify model ID from environment variables

    for i in range(retry_times):
        try:
            # Send text generation request
            response = model.generate_content(
                f"""
                I'm going to provide you with some examples of video descriptive captions. 
                You will need to use these examples to write a new caption for the user input I give you. 
                I will also provide you with a user input for the new video descriptive caption you will need to write. 

                **Examples**
                User Input: "a girl is on the beach" 
                Response: "A radiant woman stands on a deserted beach, arms outstretched, wearing a beige trench coat, white blouse, light blue jeans, and chic boots, against a backdrop of soft sky and sea. Moments later, she is seen mid-twirl, arms exuberant, with the lighting suggesting dawn or dusk. Then, she runs along the beach, her attire complemented by an off-white scarf and black ankle boots, the tranquil sea behind her. Finally, she holds a paper airplane, her pose reflecting joy and freedom, with the ocean's gentle waves and the sky's soft pastel hues enhancing the serene ambiance."

                User Input: "A man jogging on a football field"
                Response: "A determined man in athletic attire, including a blue long-sleeve shirt, black shorts, and blue socks, jogs around a snow-covered soccer field, showcasing his solitary exercise in a quiet, overcast setting. His long dreadlocks, focused expression, and the serene winter backdrop highlight his dedication to fitness. As he moves, his attire, consisting of a blue sports sweatshirt, black athletic pants, gloves, and sneakers, grips the snowy ground. He is seen running past a chain-link fence enclosing the playground area, with a basketball hoop and children's slide, suggesting a moment of solitary exercise amidst the empty field."

                User Input: "A woman is dancing, HD footage, close-up"
                Response: "A young woman with her hair in an updo and wearing a teal hoodie stands against a light backdrop, initially looking over her shoulder with a contemplative expression. She then confidently makes a subtle dance move, suggesting rhythm and movement. Next, she appears poised and focused, looking directly at the camera. Her expression shifts to one of introspection as she gazes downward slightly. Finally, she dances with confidence, her left hand over her heart, symbolizing a poignant moment, all while dressed in the same teal hoodie against a plain, light-colored background."

                **User Input:** "{text}"
                **Response:**
                """.strip()
            )
            return response.text.strip()
        except Exception as e:
            print(f"Error during prediction: {e}")

    return prompt
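
The checkbox idea could be wired up roughly as follows. This is only a sketch: `apply_prompt_conversion` and `fake_converter` are hypothetical names that do not exist in the repository, and the boolean `enabled` stands in for whatever UI checkbox is used. In real use, `convert_prompt` from the snippet above would be passed as the converter.

```python
from typing import Callable


def apply_prompt_conversion(
    prompt: str,
    enabled: bool,
    converter: Callable[[str], str],
) -> str:
    """Return the converted prompt when the toggle is on, else the original."""
    if not enabled:
        return prompt
    converted = converter(prompt)
    # Fall back to the user's text if the converter returned nothing usable.
    return converted.strip() or prompt


def fake_converter(prompt: str) -> str:
    # Hypothetical stand-in so the sketch runs without an API key.
    return f"A detailed caption about: {prompt}"


if __name__ == "__main__":
    print(apply_prompt_conversion("a girl is on the beach", False, fake_converter))
    print(apply_prompt_conversion("a girl is on the beach", True, fake_converter))
```

Keeping the toggle outside the converter means the same checkbox could switch between no conversion and any backend without changing the conversion function.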

Motivation

-

Your contribution

-

zRzRzRzRzRzRzR commented 1 month ago

We cannot test Gemini in our environment (regional request restrictions and no API key), so we may not be able to complete this task ourselves. However, you can adapt convert_demo yourself; the main change is to replace OpenAI's API and SDK with Google's (Gemini is a Google product, not a Microsoft one).
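
One way to keep that swap small is to isolate the backend call behind a plain function, so the prompt building and retry loop are shared. The sketch below is an assumption about how this could be structured, not how convert_demo is actually written: `build_request`, `convert_with_backend`, `make_flaky_backend`, and the shortened `FEW_SHOT_HEADER` are all hypothetical. For Gemini, `generate` could wrap `model.generate_content(request).text` as in the snippet posted above.

```python
from typing import Callable

# Placeholder for the full few-shot examples used in the real prompt.
FEW_SHOT_HEADER = "Rewrite the user input as a detailed video caption."


def build_request(text: str) -> str:
    """Build the text sent to whichever backend is in use."""
    return f'{FEW_SHOT_HEADER}\n\nUser Input: "{text.strip()}"\nResponse:'


def convert_with_backend(
    prompt: str,
    generate: Callable[[str], str],
    retry_times: int = 3,
) -> str:
    """Call the backend up to retry_times; return the original prompt on failure."""
    request = build_request(prompt)
    for attempt in range(retry_times):
        try:
            result = generate(request).strip()
            if result:
                return result
        except Exception as exc:  # any backend error triggers a retry
            print(f"Attempt {attempt + 1} failed: {exc}")
    return prompt


def make_flaky_backend(failures: int) -> Callable[[str], str]:
    """Hypothetical backend that fails a fixed number of times, then succeeds."""
    state = {"remaining": failures}

    def generate(request: str) -> str:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated API error")
        return "An expanded caption."

    return generate


if __name__ == "__main__":
    print(convert_with_backend("a cat", make_flaky_backend(1)))
```

With this shape, switching providers only changes the function passed as `generate`; the fallback to the original prompt stays the same.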