GoogleCloudPlatform / generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
Apache License 2.0

[Bug]: Grounding with Gemini doesn't work, no metadata is returned #688

Closed: sanjanalreddy closed this issue 2 months ago

sanjanalreddy commented 6 months ago

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb

What happened?

Grounding with Gemini doesn't work; the Gemini model doesn't seem to be using the grounding source to answer the prompt. The same data source with PaLM, however, returns grounded responses and grounding metadata.
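
For reference, the grounded call boils down to roughly this (a minimal sketch with the Python SDK; the project ID, location, and model name are placeholders, not the exact notebook code):

import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

# Placeholder project and location.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-001")

# Ground the model's answers with Google Search results.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

response = model.generate_content(
    "When should I use an object table in BigQuery? And how does it store data?",
    tools=[search_tool],
)

# Expected: populated grounding metadata (queries plus attributions/supports).
# Observed: the metadata comes back effectively empty.
print(response.candidates[0].grounding_metadata)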

[Screenshot attached: 2024-05-09 at 4:27:36 PM]

Relevant log output

No response


holtskinner commented 6 months ago

Can you provide details about the specific queries and data source you're using for grounding?

sanjanalreddy commented 6 months ago

I'm using the same data source mentioned in the notebook, which points to cloud.google.com/*, and the prompt is "When should I use an object table in BigQuery? And how does it store data?"
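
To be concrete, the data-store variant is essentially the following (again a sketch with the Python SDK; the data store resource name and model version are placeholders for the ones the notebook builds over cloud.google.com/*):

from vertexai.generative_models import GenerativeModel, Tool, grounding

# Placeholder resource name for the website data store over cloud.google.com/*.
DATA_STORE = (
    "projects/your-project-id/locations/global/"
    "collections/default_collection/dataStores/your-datastore-id"
)

# Retrieval tool that grounds responses in the Vertex AI Search data store.
retrieval_tool = Tool.from_retrieval(
    grounding.Retrieval(grounding.VertexAISearch(datastore=DATA_STORE))
)

model = GenerativeModel("gemini-1.5-pro-001")
response = model.generate_content(
    "When should I use an object table in BigQuery? And how does it store data?",
    tools=[retrieval_tool],
)

# With PaLM the analogous call returns grounding metadata; here it comes back empty.
print(response.candidates[0].grounding_metadata)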

maralm commented 5 months ago

Similar experience here. The grounding_metadata contains the retrieval_queries property only, but nothing that looks like citations or attributions. For example, the method print_grounding_response in the grounding example indicates that there should be attributions available, but I'm consistently not getting any.

I'm a bit perplexed because I can set up a prompt and grounding source in the Vertex AI UI, try it out and it works fine, then click "Get Code", execute that code, and get no results. Overall, results seem to be much worse with the latest SDK and Gemini models than with the previous approach using text-bison.
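
In case it helps, here's a quick way to dump whichever grounding fields the SDK actually returns. The field names below are my assumption based on what recent SDK versions expose, and the getattr guard is there because the shape has shifted between attributions and chunks/supports across versions:

def dump_grounding_metadata(response):
    """Print whichever grounding fields this SDK version returns."""
    metadata = response.candidates[0].grounding_metadata
    for field in (
        "web_search_queries",
        "retrieval_queries",
        "grounding_attributions",
        "grounding_chunks",
        "grounding_supports",
    ):
        # getattr guard: older responses expose grounding_attributions,
        # newer ones expose grounding_chunks / grounding_supports.
        print(f"{field}: {getattr(metadata, field, '<not present>')}")

Only the query fields ever come back with anything in them; the attribution, chunk, and support lists stay empty.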

paulcalcraft commented 3 months ago

I'm having the same issue. I used "Get Code" on my Vertex AI example and ran it in Node.js:

// const {VertexAI} = require('@google-cloud/vertexai');
import {VertexAI} from '@google-cloud/vertexai';

// Initialize Vertex with your Cloud project and location
const vertex_ai = new VertexAI({project: 'XXXXXX', location: 'us-central1'});
const model = 'gemini-1.5-pro-001';

// Instantiate the models
const generativeModel = vertex_ai.preview.getGenerativeModel({
  model: model,
  generationConfig: {
    'maxOutputTokens': 8192,
    'temperature': 0,
    // 'topP': 0.95,
  },
  safetySettings: [
    {
        'category': 'HARM_CATEGORY_HATE_SPEECH',
        'threshold': 'BLOCK_MEDIUM_AND_ABOVE'
    },
    {
        'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
        'threshold': 'BLOCK_MEDIUM_AND_ABOVE'
    },
    {
        'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT',
        'threshold': 'BLOCK_MEDIUM_AND_ABOVE'
    },
    {
        'category': 'HARM_CATEGORY_HARASSMENT',
        'threshold': 'BLOCK_MEDIUM_AND_ABOVE'
    }
  ],
  tools: [
    {
      googleSearchRetrieval: {
        // disableAttribution: false,
      },
    },
  ],
  systemInstruction: {
    parts: [{"text": `use full citations from google search for every answer`}]
  },
});

const chat = generativeModel.startChat({});

async function sendMessage(message) {
  const streamResult = await chat.sendMessageStream(message);
  const response = await streamResult.response;
  process.stdout.write('stream result: ' + JSON.stringify(response.candidates[0].content) + '\n');
  // full response including grounding
  console.log(JSON.stringify(response, null, 2));
}

async function generateContent() {
  await sendMessage([
    {text: `summary of ARMADA: Attribute-Based Multimodal Data Augmentation`}
  ]);
}

generateContent();

The JSON output is:

{
  "usageMetadata": {
    "promptTokenCount": 22,
    "candidatesTokenCount": 210,
    "totalTokenCount": 232
  },
  "candidates": [
    {
      "index": 0,
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "ARMADA (Attribute-Based Multimodal Data Augmentation) is a n
ew method for augmenting image-text pair data for Multimodal Language Models (MLM
s). It addresses the limitations of existing methods that result in semantic inco
nsistencies or unrealistic images. ARMADA extracts entities and their visual attr
ibutes from the text and uses knowledge bases (KBs) and large language models (LL
Ms) to find alternative attribute values. An image-editing model then modifies th
e images based on these new attributes. The benefits of ARMADA include: generatin
g semantically consistent and diverse image-text pairs, creating visually similar
 images from different categories using KB hierarchies, and leveraging common sen
se knowledge from LLMs to adjust auxiliary attributes like backgrounds.  \"ARMADA
: Attribute-Based Multimodal Data Augmentation - Paper Reading. \" *Paperreading.
club*, Accessed August 20, 2024, https://paperreading.club/paper/2023/emnlp-armad
a-attribute-based-multimodal-data-augmentation. \n"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.15429688,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.16113281
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.18945313,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.17285156
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10986328,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.09423828
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.05834961,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08496094
        }
      ],
      "groundingMetadata": {
        "webSearchQueries": [
          "ARMADA: Attribute-Based Multimodal Data Augmentation"
        ],
        "groundingAttributions": [],
        "retrievalQueries": [],
        "groundingChunks": [],
        "groundingSupports": [],
        "searchEntryPoint": {
          "renderedContent": "<style>\n.container {\n  align-items: center;\n  bo
rder-radius: 8px;\n  display: flex;\n  font-family: Google Sans, Roboto, sans-ser
if;\n  font-size: 14px;\n  line-height: 20px;\n  padding: 8px 12px;\n}\n.chip {\n
  display: inline-block;\n  border: solid 1px;\n  border-radius: 16px;\n  min-wid
th: 14px;\n  padding: 5px 16px;\n  text-align: center;\n  user-select: none;\n  m
argin: 0 8px;\n  -webkit-tap-highlight-color: transparent;\n}\n.carousel {\n  ove
rflow: auto;\n  scrollbar-width: none;\n  white-space: nowrap;\n  margin-right: -
12px;\n}\n.headline {\n  display: flex;\n  margin-right: 4px;\n}\n.gradient-conta
iner {\n  position: relative;\n}\n.gradient {\n  position: absolute;\n  transform
: translate(3px, -9px);\n  height: 36px;\n  width: 9px;\n}\n@media (prefers-color
-scheme: light) {\n  .container {\n    background-color: #fafafa;\n    box-shadow
: 0 0 0 1px #0000000f;\n  }\n  .headline-label {\n    color: #1f1f1f;\n  }\n  .ch
ip {\n    background-color: #ffffff;\n    border-color: #d2d2d2;\n    color: #5e5
e5e;\n    text-decoration: none;\n  }\n  .chip:hover {\n    background-color: #f2
f2f2;\n  }\n  .chip:focus {\n    background-color: #f2f2f2;\n  }\n  .chip:active 
{\n    background-color: #d8d8d8;\n    border-color: #b6b6b6;\n  }\n  .logo-dark 
{\n    display: none;\n  }\n  .gradient {\n    background: linear-gradient(90deg,
 #fafafa 15%, #fafafa00 100%);\n  }\n}\n@media (prefers-color-scheme: dark) {\n  
.container {\n    background-color: #1f1f1f;\n    box-shadow: 0 0 0 1px #ffffff26
;\n  }\n  .headline-label {\n    color: #fff;\n  }\n  .chip {\n    background-col
or: #2c2c2c;\n    border-color: #3c4043;\n    color: #fff;\n    text-decoration: 
none;\n  }\n  .chip:hover {\n    background-color: #353536;\n  }\n  .chip:focus {
\n    background-color: #353536;\n  }\n  .chip:active {\n    background-color: #4
64849;\n    border-color: #53575b;\n  }\n  .logo-light {\n    display: none;\n  }
\n  .gradient {\n    background: linear-gradient(90deg, #1f1f1f 15%, #1f1f1f00 10
0%);\n  }\n}\n</style>\n<div class=\"container\">\n  <div class=\"headline\">\n  
  <svg class=\"logo-light\" width=\"18\" height=\"18\" viewBox=\"9 9 35 35\" fill
=\"none\" xmlns=\"http://www.w3.org/2000/svg\">\n      <path fill-rule=\"evenodd\
" clip-rule=\"evenodd\" d=\"M42.8622 27.0064C42.8622 25.7839 42.7525 24.6084 42.5
487 23.4799H26.3109V30.1568H35.5897C35.1821 32.3041 33.9596 34.1222 32.1258 35.34
48V39.6864H37.7213C40.9814 36.677 42.8622 32.2571 42.8622 27.0064V27.0064Z\" fill
=\"#4285F4\"/>\n      <path fill-rule=\"evenodd\" clip-rule=\"evenodd\" d=\"M26.3
109 43.8555C30.9659 43.8555 34.8687 42.3195 37.7213 39.6863L32.1258 35.3447C30.58
98 36.3792 28.6306 37.0061 26.3109 37.0061C21.8282 37.0061 18.0195 33.9811 16.655
9 29.906H10.9194V34.3573C13.7563 39.9841 19.5712 43.8555 26.3109 43.8555V43.8555Z
\" fill=\"#34A853\"/>\n      <path fill-rule=\"evenodd\" clip-rule=\"evenodd\" d=
\"M16.6559 29.8904C16.3111 28.8559 16.1074 27.7588 16.1074 26.6146C16.1074 25.470
4 16.3111 24.3733 16.6559 23.3388V18.8875H10.9194C9.74388 21.2072 9.06992 23.8247
 9.06992 26.6146C9.06992 29.4045 9.74388 32.022 10.9194 34.3417L15.3864 30.8621L1
6.6559 29.8904V29.8904Z\" fill=\"#FBBC05\"/>\n      <path fill-rule=\"evenodd\" c
lip-rule=\"evenodd\" d=\"M26.3109 16.2386C28.85 16.2386 31.107 17.1164 32.9095 18
.8091L37.8466 13.8719C34.853 11.082 30.9659 9.3736 26.3109 9.3736C19.5712 9.3736 
13.7563 13.245 10.9194 18.8875L16.6559 23.3388C18.0195 19.2636 21.8282 16.2386 26
.3109 16.2386V16.2386Z\" fill=\"#EA4335\"/>\n    </svg>\n    <svg class=\"logo-da
rk\" width=\"18\" height=\"18\" viewBox=\"0 0 48 48\" xmlns=\"http://www.w3.org/2
000/svg\">\n      <circle cx=\"24\" cy=\"23\" fill=\"#FFF\" r=\"22\"/>\n      <pa
th d=\"M33.76 34.26c2.75-2.56 4.49-6.37 4.49-11.26 0-.89-.08-1.84-.29-3H24.01v5.9
9h8.03c-.4 2.02-1.5 3.56-3.07 4.56v.75l3.91 2.97h.88z\" fill=\"#4285F4\"/>\n     
 <path d=\"M15.58 25.77A8.845 8.845 0 0 0 24 31.86c1.92 0 3.62-.46 4.97-1.31l4.79
 3.71C31.14 36.7 27.65 38 24 38c-5.93 0-11.01-3.4-13.45-8.36l.17-1.01 4.06-2.85h.
8z\" fill=\"#34A853\"/>\n      <path d=\"M15.59 20.21a8.864 8.864 0 0 0 0 5.58l-5
.03 3.86c-.98-2-1.53-4.25-1.53-6.64 0-2.39.55-4.64 1.53-6.64l1-.22 3.81 2.98.22 1
.08z\" fill=\"#FBBC05\"/>\n      <path d=\"M24 14.14c2.11 0 4.02.75 5.52 1.98l4.3
6-4.36C31.22 9.43 27.81 8 24 8c-5.93 0-11.01 3.4-13.45 8.36l5.03 3.85A8.86 8.86 0
 0 1 24 14.14z\" fill=\"#EA4335\"/>\n    </svg>\n    <div class=\"gradient-contai
ner\"><div class=\"gradient\"></div></div>\n  </div>\n  <div class=\"carousel\">\
n    <a class=\"chip\" href=\"https://www.google.com/search?q=ARMADA:+Attribute-B
ased+Multimodal+Data+Augmentation&client=app-vertex-grounding&safesearch=active\"
>ARMADA: Attribute-Based Multimodal Data Augmentation</a>\n  </div>\n</div>\n"   
        }
      }
    }
  ]
}

The grounding metadata has empty arrays for all the useful elements. I've tried it with and without the disableAttribution: false line, which is supposedly deprecated anyway.

You can see it's trying to give citations in an odd format in the text itself, because of my system prompt. Without the system prompt it produces [1], [2], etc. citations, but again there's nothing in the grounding data for them to refer to.

FWIW, when I try this via Vertex AI in the web console, I never see the actual citation info either, even if I select Google Search as the grounding source.