Closed LucBERTON closed 1 week ago
@LucBERTON It might be that the HTTP client used by `google.generativeai` is not supported by VCR. The list of supported clients is here: https://vcrpy.readthedocs.io/en/latest/installation.html#compatibility.

It's probably another problem, though, because I see that `googleapiclient.http` uses `httplib2`, which is normally supported (see https://github.com/googleapis/google-api-python-client/blob/73015a64302d4560fd4783b260526729f47c2d9c/googleapiclient/http.py#L38C8-L38C16).
Yes, it seems that `vcrpy` is not able to capture requests made by `google.generativeai`:

```python
import google.generativeai as genai
import vcr
import httplib2

with vcr.use_cassette('test.yaml'):
    http = httplib2.Http()
    content = http.request("http://www.something.com")  # Recorded in `test.yaml`
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content("Write a story about a magic backpack.")  # Not recorded in `test.yaml`
```
The `test.yaml` file:

```yaml
interactions:
- request:
    body: null
    headers:
      accept-encoding:
      - gzip, deflate
      user-agent:
      - Python-httplib2/0.22.0 (gzip)
    method: GET
    uri: http://www.something.com/
  response:
    body:
      string: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <html><head>
        <title>301 Moved Permanently</title>
        </head><body>
        <h1>Moved Permanently</h1>
        <p>The document has moved <a href="https://www.something.com/">here</a>.</p>
        </body></html>
        '
    headers:
      Age:
      - '994'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 895c4a224b2d0e80-AMS
      Cache-Control:
      - max-age=14400
      Connection:
      - keep-alive
      Content-Type:
      - text/html; charset=iso-8859-1
      Date:
      - Tue, 18 Jun 2024 15:25:01 GMT
      Location:
      - https://www.something.com/
      NEL:
      - '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}'
      Report-To:
      - '{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=gtSza1HvQTn%2FGYwoyP6CTfu7sdMIUEHsO7ga71jzl%2B%2FxsM%2F%2FZs6QiDa2qRViIt1lWVn0kv4RA1E4PBZIFUn1Y2seQJY3mPUY%2B4HfRWPy4IXjk9CExcLLVoswY2C4XxJphXiE1g%3D%3D"}],"group":"cf-nel","max_age":604800}'
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      Vary:
      - Accept-Encoding
      alt-svc:
      - h3=":443"; ma=86400
    status:
      code: 301
      message: Moved Permanently
- request:
    body: null
    headers:
      accept-encoding:
      - gzip, deflate
      user-agent:
      - Python-httplib2/0.22.0 (gzip)
    method: GET
    uri: https://www.something.com/
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA7PJKMnNsbPJSE1MsbMpySzJSbULzs9NLcnIzEvXs9GHiNjog+W5bJLyUypR5MEC
        XDb6YFO4AApUZMBNAAAA
    headers:
      Age:
      - '4584'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 895c4a22a8326716-AMS
      Cache-Control:
      - max-age=14400
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - text/html; charset=UTF-8
      Date:
      - Tue, 18 Jun 2024 15:25:01 GMT
      Last-Modified:
      - Mon, 07 Mar 2022 03:36:52 GMT
      NEL:
      - '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}'
      Report-To:
      - '{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=BjMiba83pwa5KgRcdFhG8914GfGmEJ6rPWNODrzbEyvnytqtwNDYSgll9d3WyBJjphZ9kUb8QZmOgfNc6lyfCdbbFdp%2FX96q9YNNQEUUGm%2B7acKxFCrCrtYrE7x%2BLiFQQFP58w%3D%3D"}],"group":"cf-nel","max_age":604800}'
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      Vary:
      - Accept-Encoding
      alt-svc:
      - h3=":443"; ma=86400
    status:
      code: 200
      message: OK
version: 1
```
Works if you change the test as follows:

```diff
 @pytest.mark.vcr
 def test_google_chat(tracer_init):
+    genai.configure(transport='rest')
     model = genai.GenerativeModel("gemini-1.5-flash")
     response = model.generate_content("Write a story about a magic backpack.")
     assert len(response.text) > 0
     assert response.impacts.energy.value > 0
```

And have the API key set in the `GOOGLE_API_KEY` env var.
You also need to add in `conftest.py`:

```diff
 @pytest.fixture(scope="session")
 def vcr_config():
     return {"filter_headers": [
         "authorization",
         "api-key",
         "x-api-key",
+        "x-goog-api-key"
     ]}
```
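For context, here is a minimal sketch of what `filter_headers` achieves (simplified: vcrpy matches header names case-insensitively and can also replace values instead of dropping them; the `scrub` helper below is illustrative, not part of vcrpy):

```python
# Headers that must never end up in a recorded cassette.
FILTERED = {"authorization", "api-key", "x-api-key", "x-goog-api-key"}

def scrub(headers: dict) -> dict:
    """Drop any header whose name is in the filtered set (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in FILTERED}

recorded = scrub({"x-goog-api-key": "secret", "accept": "application/json"})
print(recorded)  # {'accept': 'application/json'}
```

Without the added `x-goog-api-key` entry, the Google API key would be written in clear text into `test.yaml`.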
I am pushing the updates in a minute.
Many thanks @samuelrince! Let's implement async and stream before merging it.
The stream implementation I just pushed is not optimal. Consider the following piece of code:

```python
import google.generativeai as genai

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Tell me a very short joke', stream=True)
print(response)  # Print 1
for chunk in response:
    print(chunk)  # Print 2.i
print(response)  # Print 3
```

All prints will show different `GenerateContentResponse` objects. The first one basically only serves for storing the iterator. All the 2.i are truncated `GenerateContentResponse` objects corresponding only to the chunk, without any iterator, and the last one is a `GenerateContentResponse` that contains both the iterator and the joined chunks.
The implementation as it is now is such that the following piece of code:

```python
import google.generativeai as genai
from ecologits import EcoLogits

EcoLogits.init()
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Tell me a very short joke', stream=True)
print(response)  # Print 1 -> Not a GenerateContentResponse, so no impact
for chunk in response:
    print(chunk.impacts)  # Print 2.i
print(response)  # Print 3 -> Not a GenerateContentResponse, so no impact
```

will only work for the parts 2.i. Although to me the first `GenerateContentResponse` is useless in most applications, it would be nice to have the last one summarize the impacts of the chunks.
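One way to get that summary would be to wrap the chunk iterator and keep a running total. A minimal, self-contained sketch of the idea, where `Impacts` and `FakeChunk` are illustrative stand-ins for the real EcoLogits and `GenerateContentResponse` objects:

```python
from dataclasses import dataclass

@dataclass
class Impacts:  # stand-in for the EcoLogits impacts object
    energy: float = 0.0

@dataclass
class FakeChunk:  # stand-in for a streamed GenerateContentResponse chunk
    impacts: Impacts

class StreamWrapper:
    """Wraps a chunk iterator and accumulates the impacts of the chunks,
    so the wrapper itself ends up summarizing the whole stream."""

    def __init__(self, stream):
        self._stream = stream
        self.impacts = Impacts()

    def __iter__(self):
        for chunk in self._stream:
            # Running total over all chunks seen so far.
            self.impacts = Impacts(energy=self.impacts.energy + chunk.impacts.energy)
            yield chunk

chunks = [FakeChunk(Impacts(1.0)), FakeChunk(Impacts(2.5))]
wrapped = StreamWrapper(iter(chunks))
for _ in wrapped:
    pass
print(wrapped.impacts.energy)  # 3.5
```

After the loop, `wrapped.impacts` holds the total for the stream, which is the behavior we would like the final `GenerateContentResponse` (Print 3) to have.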
The combination of `transport='rest'` and async is broken in `google-generativeai`, see https://github.com/google-gemini/generative-ai-python/issues/203.

This prevents us from testing async with cassettes, so I removed the async tests for now; let's wait for this to be fixed.
Possible to also include the documentation in this PR? 😇
You can use the other providers as an example. To run the website locally, you can do:

```shell
mkdocs serve
```
Implemented basic content generation for the google genai provider.

Several points of attention:

- **VCR cassette**: I could not record the VCR cassette. I think I execute the correct command, but it does not seem to generate the cassette with the Google provider:

  ```shell
  pytest --record-mode=once tests/test_google.py
  ```

- **Data models**: I only added a record related to the gemini-1.5-flash model. The token counts are purely fictional.

- **Impacts data in the google tracer**: With other providers, we use the `model_dump()` method to get everything from the standard response and then add the computed impacts, for instance here with the openai provider. In our case with google genai, the `model_dump()` method is not available on the `GenerateContentResponse` object. Another solution that I found was to use the `__dict__` attribute. It seems to work properly, but the code is less clean compared to other providers.
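A hedged sketch of that fallback, assuming all we need is a plain dict out of the response; `FakeResponse` is an illustrative stand-in for `GenerateContentResponse`, not the real class:

```python
def response_to_dict(response):
    """Prefer a pydantic-style model_dump() when the response provides one
    (as with the other providers); otherwise fall back to __dict__."""
    if hasattr(response, "model_dump"):
        return response.model_dump()
    return dict(vars(response))

class FakeResponse:  # stand-in for GenerateContentResponse
    def __init__(self):
        self.text = "a short joke"

print(response_to_dict(FakeResponse()))  # {'text': 'a short joke'}
```

This keeps the tracer code path uniform across providers, at the cost of `__dict__` exposing private attributes on objects that use them.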