Open gbaptista opened 1 week ago
How to easily simulate it:
Give the first page of the first chapter of Harry Potter.
{
  "candidates": [
    {
      "finishReason": "RECITATION",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.31806138,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.13039611
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.13764834,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.0248928
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.44049937,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.17050801
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.24653332,
          "severity": "HARM_SEVERITY_LOW",
          "severityScore": 0.20914645
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": 268,
            "endIndex": 417,
            "uri": "https://www.lisarivero.com/2011/06/24/plain-and-fancy-words/"
          },
          {
            "startIndex": 302,
            "endIndex": 581,
            "uri": "https://thefriendlyeditor.com/2012/03/09/rowling-hook-page-one/"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 12,
    "candidatesTokenCount": 97,
    "totalTokenCount": 109
  }
}
Of course, this is probably the expected result, with Google trying to avoid generating copyrighted content. The issue is that there are too many false positives, which halt generation for many legitimate prompts.
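For anyone who wants to at least detect this case on the client side, here is a minimal sketch. It assumes the raw response has already been parsed into a dict with the same shape as the JSON above; the helper names `recitation_candidates` and `cited_uris` are hypothetical and not part of any Google SDK, and the sample payload is just a trimmed copy of the response in this issue.

```python
import json


def recitation_candidates(response: dict) -> list:
    """Return the candidates whose finishReason is RECITATION."""
    return [
        candidate
        for candidate in response.get("candidates", [])
        if candidate.get("finishReason") == "RECITATION"
    ]


def cited_uris(candidate: dict) -> list:
    """Collect the URIs listed under citationMetadata.citations, if any."""
    citations = candidate.get("citationMetadata", {}).get("citations", [])
    return [citation["uri"] for citation in citations if "uri" in citation]


# Trimmed copy of the response shown above (safetyRatings omitted).
raw = """
{
  "candidates": [
    {
      "finishReason": "RECITATION",
      "citationMetadata": {
        "citations": [
          {"startIndex": 268, "endIndex": 417,
           "uri": "https://www.lisarivero.com/2011/06/24/plain-and-fancy-words/"},
          {"startIndex": 302, "endIndex": 581,
           "uri": "https://thefriendlyeditor.com/2012/03/09/rowling-hook-page-one/"}
        ]
      }
    }
  ],
  "usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 97, "totalTokenCount": 109}
}
"""

response = json.loads(raw)
blocked = recitation_candidates(response)
for candidate in blocked:
    print("Generation stopped by RECITATION; cited sources:")
    for uri in cited_uris(candidate):
        print("  " + uri)
```

This only reports the stop reason and the cited sources; it does not work around the filter. What to do next (rephrase the prompt, retry, surface an error) is up to the caller.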
I have the same issue: I am trying to use Gemini for summarization. Naturally, summarization of copyrighted content would be flagged as "copyrighted content"; however, we have explicit permission to use it.
Some Google models stop generating content due to finishReason=RECITATION. According to the docs: