google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

RECITATION error when transcribing audio to text #336

Open nzomkxia opened 5 months ago

nzomkxia commented 5 months ago

Description of the bug:

result=glm.GenerateContentResponse({'candidates': [{'finish_reason': 4, 'index': 0, 'safety_ratings': [], 'token_count': 0, 'grounding_attributions': []}]}),

Actual vs expected behavior:

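Expected: the audio is transcribed correctly.
Actual: the response contains no text, and the only candidate has finish_reason 4 (RECITATION).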

Any other information you'd like to share?

model: gemini-1.5-pro-latest
audio length: 53 min
audio format: mp3
audio file size: 13 MB
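For readers unfamiliar with the numeric code in the dump above, here is a minimal sketch of decoding it. The `FinishReason` enum below is a hand-written mirror of the API's finish-reason codes as I understand them (an assumption, not imported from the library), and `describe_candidate` is a hypothetical helper that works on a plain dict shaped like the candidate in the report.

```python
from enum import IntEnum


class FinishReason(IntEnum):
    """Hand-written mirror of the API's finish-reason codes (assumed values)."""
    FINISH_REASON_UNSPECIFIED = 0
    STOP = 1
    MAX_TOKENS = 2
    SAFETY = 3
    RECITATION = 4
    OTHER = 5


def describe_candidate(candidate: dict) -> str:
    """Summarize a candidate dict: its finish reason and whether it has output."""
    reason = FinishReason(candidate.get("finish_reason", 0))
    has_output = candidate.get("token_count", 0) > 0
    status = "with output" if has_output else "with no output"
    return f"{reason.name} ({status})"


# The candidate copied from the response in the bug description.
candidate = {
    "finish_reason": 4,
    "index": 0,
    "safety_ratings": [],
    "token_count": 0,
    "grounding_attributions": [],
}
print(describe_candidate(candidate))  # RECITATION (with no output)
```

Checking the finish reason before reading the response text makes it possible to tell a blocked or truncated generation apart from a normal one. Codes outside the six listed here would raise `ValueError` in this sketch.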

MarkDaoust commented 5 months ago

This is a known issue; the eng team is working on improving this.

yc1999 commented 5 months ago

Has this issue been fixed?

abeusher commented 1 month ago

I'm experiencing a similar problem here. Gemini 1.5 Pro fails to transcribe the entire file; it transcribes only the first 9 minutes of 73:

import os
import pathlib
import time

import google.generativeai as genai

# Read the API key from the environment.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-pro-002')

# Audio files live in an "audio_files" directory next to this script.
media = pathlib.Path(__file__).parents[0] / "audio_files"
print(f"{media=}")

print("uploading file")
start = time.time()  # start the upload timer before the upload begins
myfile = genai.upload_file(media / "simon_willison.mp3", mime_type="audio/mpeg")
print(f"{myfile=}")
stop = time.time()
elapsed = stop - start
print(f"Time to upload file: {elapsed:.2f} seconds")
#Time to upload file: 37.15 seconds

start = time.time()
fout = open("simon_willison_transcript.txt", "w")
model = genai.GenerativeModel("gemini-1.5-flash")
model.generation_config = {
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 500000,
    "response_mime_type": "text/plain",
    "audio_timestamp": True,
}