Closed LucBERTON closed 1 week ago
@LucBERTON It might be that the HTTP client used by `google.generativeai` is not supported by VCR. The list of supported clients is here: https://vcrpy.readthedocs.io/en/latest/installation.html#compatibility.

It's probably another problem, though, because I see that `googleapiclient.http` uses `httplib2`, which is normally supported (see https://github.com/googleapis/google-api-python-client/blob/73015a64302d4560fd4783b260526729f47c2d9c/googleapiclient/http.py#L38C8-L38C16).
Yes, it seems that `vcrpy` is not able to capture requests made by `google.generativeai`:

```python
import google.generativeai as genai
import vcr
import httplib2

with vcr.use_cassette('test.yaml'):
    http = httplib2.Http()
    content = http.request("http://www.something.com")  # Recorded in `test.yaml`
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content("Write a story about a magic backpack.")  # Not recorded in `test.yaml`
```
The `test.yaml` file:

```yaml
interactions:
- request:
    body: null
    headers:
      accept-encoding:
      - gzip, deflate
      user-agent:
      - Python-httplib2/0.22.0 (gzip)
    method: GET
    uri: http://www.something.com/
  response:
    body:
      string: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <html><head>
        <title>301 Moved Permanently</title>
        </head><body>
        <h1>Moved Permanently</h1>
        <p>The document has moved <a href="https://www.something.com/">here</a>.</p>
        </body></html>
        '
    headers:
      Age:
      - '994'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 895c4a224b2d0e80-AMS
      Cache-Control:
      - max-age=14400
      Connection:
      - keep-alive
      Content-Type:
      - text/html; charset=iso-8859-1
      Date:
      - Tue, 18 Jun 2024 15:25:01 GMT
      Location:
      - https://www.something.com/
      NEL:
      - '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}'
      Report-To:
      - '{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=gtSza1HvQTn%2FGYwoyP6CTfu7sdMIUEHsO7ga71jzl%2B%2FxsM%2F%2FZs6QiDa2qRViIt1lWVn0kv4RA1E4PBZIFUn1Y2seQJY3mPUY%2B4HfRWPy4IXjk9CExcLLVoswY2C4XxJphXiE1g%3D%3D"}],"group":"cf-nel","max_age":604800}'
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      Vary:
      - Accept-Encoding
      alt-svc:
      - h3=":443"; ma=86400
    status:
      code: 301
      message: Moved Permanently
- request:
    body: null
    headers:
      accept-encoding:
      - gzip, deflate
      user-agent:
      - Python-httplib2/0.22.0 (gzip)
    method: GET
    uri: https://www.something.com/
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA7PJKMnNsbPJSE1MsbMpySzJSbULzs9NLcnIzEvXs9GHiNjog+W5bJLyUypR5MEC
        XDb6YFO4AApUZMBNAAAA
    headers:
      Age:
      - '4584'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 895c4a22a8326716-AMS
      Cache-Control:
      - max-age=14400
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - text/html; charset=UTF-8
      Date:
      - Tue, 18 Jun 2024 15:25:01 GMT
      Last-Modified:
      - Mon, 07 Mar 2022 03:36:52 GMT
      NEL:
      - '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}'
      Report-To:
      - '{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=BjMiba83pwa5KgRcdFhG8914GfGmEJ6rPWNODrzbEyvnytqtwNDYSgll9d3WyBJjphZ9kUb8QZmOgfNc6lyfCdbbFdp%2FX96q9YNNQEUUGm%2B7acKxFCrCrtYrE7x%2BLiFQQFP58w%3D%3D"}],"group":"cf-nel","max_age":604800}'
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      Vary:
      - Accept-Encoding
      alt-svc:
      - h3=":443"; ma=86400
    status:
      code: 200
      message: OK
version: 1
```
Works if you change the test as follows:

```diff
 @pytest.mark.vcr
 def test_google_chat(tracer_init):
+    genai.configure(transport='rest')
     model = genai.GenerativeModel("gemini-1.5-flash")
     response = model.generate_content("Write a story about a magic backpack.")
     assert len(response.text) > 0
     assert response.impacts.energy.value > 0
```

And have the API key set in the `GOOGLE_API_KEY` env var.
You also need to add in `conftest.py`:

```diff
 @pytest.fixture(scope="session")
 def vcr_config():
     return {"filter_headers": [
         "authorization",
         "api-key",
         "x-api-key",
+        "x-goog-api-key"
     ]}
```
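For context, here is a minimal sketch of what `filter_headers` achieves (simplified: vcrpy matches header names case-insensitively and can also replace values instead of dropping them; the `scrub` helper below is illustrative, not part of vcrpy):

```python
# Headers that must never end up in a recorded cassette.
FILTERED = {"authorization", "api-key", "x-api-key", "x-goog-api-key"}

def scrub(headers: dict) -> dict:
    """Drop any header whose name is in the filtered set (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in FILTERED}

recorded = scrub({"x-goog-api-key": "secret", "accept": "application/json"})
print(recorded)  # {'accept': 'application/json'}
```

Without the added `x-goog-api-key` entry, the Google API key would be written in clear text into `test.yaml`.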
I am pushing the updates in a minute.
Many thanks @samuelrince! Let's implement async and stream before merging it.
The stream implementation I just pushed is not optimal. Consider the following piece of code:

```python
import google.generativeai as genai

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Tell me a very short joke', stream=True)
print(response)  # Print 1
for chunk in response:
    print(chunk)  # Print 2.i
print(response)  # Print 3
```

All prints will show different `GenerateContentResponse` objects. The first one basically only serves for storing the iterator. All the 2.i are truncated `GenerateContentResponse` objects corresponding only to the chunk, without any iterator, and the last one is a `GenerateContentResponse` that contains both the iterator and the joined chunks.
The implementation as it is now is such that the following piece of code:

```python
import google.generativeai as genai
from ecologits import EcoLogits

EcoLogits.init()
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Tell me a very short joke', stream=True)
print(response)  # Print 1 -> Not a GenerateContentResponse, so no impact
for chunk in response:
    print(chunk.impacts)  # Print 2.i
print(response)  # Print 3 -> Not a GenerateContentResponse, so no impact
```

will only work for the parts 2.i. Although to me the first `GenerateContentResponse` is useless in most applications, it would be nice to have the last one summarize the impacts of the chunks.
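One way to get that summary would be to wrap the chunk iterator and keep a running total. A minimal, self-contained sketch of the idea, where `Impacts` and `FakeChunk` are illustrative stand-ins for the real EcoLogits and `GenerateContentResponse` objects:

```python
from dataclasses import dataclass

@dataclass
class Impacts:  # stand-in for the EcoLogits impacts object
    energy: float = 0.0

@dataclass
class FakeChunk:  # stand-in for a streamed GenerateContentResponse chunk
    impacts: Impacts

class StreamWrapper:
    """Wraps a chunk iterator and accumulates the impacts of the chunks,
    so the wrapper itself ends up summarizing the whole stream."""

    def __init__(self, stream):
        self._stream = stream
        self.impacts = Impacts()

    def __iter__(self):
        for chunk in self._stream:
            # Running total over all chunks seen so far.
            self.impacts = Impacts(energy=self.impacts.energy + chunk.impacts.energy)
            yield chunk

chunks = [FakeChunk(Impacts(1.0)), FakeChunk(Impacts(2.5))]
wrapped = StreamWrapper(iter(chunks))
for _ in wrapped:
    pass
print(wrapped.impacts.energy)  # 3.5
```

After the loop, `wrapped.impacts` holds the total for the stream, which is the behavior we would like the final `GenerateContentResponse` (Print 3) to have.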
The combination of `transport='rest'` and async is broken in `google-generativeai`, see https://github.com/google-gemini/generative-ai-python/issues/203.

This prevents us from testing async with cassettes, so I removed the async tests for now; let's wait for this to be fixed.
Possible to also include the documentation in this PR? 😇
You can use the other providers as an example. To run the website locally, you can do:

```shell
mkdocs serve
```
Implemented basic content generation for the google genai provider.

Several points of attention:

- **VCR cassette**: I could not record the VCR cassette. I think I execute the correct command, but it does not seem to generate the cassette with the Google provider:

  ```shell
  pytest --record-mode=once tests/test_google.py
  ```

- **Data models**: I only added a record related to the gemini-1.5-flash model. The token counts are purely fictional.

- **Impacts data in the google tracer**: With other providers, we use the `model_dump()` method to get everything from the standard response and then add the computed impacts, for instance here with the openai provider. In our case with google genai, the `model_dump()` method is not available on the `GenerateContentResponse` object. Another solution that I found was to use the `__dict__` attribute. It seems to work properly, but the code is less clean compared to other providers.
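A hedged sketch of that fallback, assuming all we need is a plain dict out of the response; `FakeResponse` is an illustrative stand-in for `GenerateContentResponse`, not the real class:

```python
def response_to_dict(response):
    """Prefer a pydantic-style model_dump() when the response provides one
    (as with the other providers); otherwise fall back to __dict__."""
    if hasattr(response, "model_dump"):
        return response.model_dump()
    return dict(vars(response))

class FakeResponse:  # stand-in for GenerateContentResponse
    def __init__(self):
        self.text = "a short joke"

print(response_to_dict(FakeResponse()))  # {'text': 'a short joke'}
```

This keeps the tracer code path uniform across providers, at the cost of `__dict__` exposing private attributes on objects that use them.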