Open 0wwafa opened 3 weeks ago
Hey @0wwafa, I'll need to check whether the API now accepts files that aren't part of the Google Account files. The last time I checked, Gemini Vision Pro required being signed in to upload files, and uploads were not accepted with just the API key this basic app requires.
If files can be passed without OAuth 2.0 I can add that feature.
I'll make sure to keep you posted.
https://ai.google.dev/api/files
myfile = genai.upload_file(media / "poem.txt")
file_name = myfile.name
print(file_name) # "files/*"
myfile = genai.get_file(file_name)
print(myfile)
@fjosue4
document = genai.upload_file(path=media / "a11.txt")
model_name = "gemini-1.5-flash-001"
cache = genai.caching.CachedContent.create(
model=model_name,
system_instruction="You are an expert analyzing transcripts.",
contents=[document],
)
print(cache)
model = genai.GenerativeModel.from_cached_content(cache)
response = model.generate_content("Please summarize this transcript")
print(response.text)
@0wwafa I just tested it, but no luck: TypeError: (0 , import_fs.readFileSync) is not a function
Seems like it's not browser-based.
I'll be testing over the coming days with this other documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
Update 2:
This endpoint seems to exist
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${apiKey}
But I haven't yet found a compatible way to send the files.
// Make sure to include these imports:
// import { GoogleAIFileManager } from "@google/generative-ai/server";
// import { GoogleGenerativeAI } from "@google/generative-ai";
const fileManager = new GoogleAIFileManager(process.env.API_KEY);
const uploadResult = await fileManager.uploadFile(
`${mediaPath}/jetpack.jpg`,
{
mimeType: "image/jpeg",
displayName: "Jetpack drawing",
},
);
// View the response.
console.log(
`Uploaded file ${uploadResult.file.displayName} as: ${uploadResult.file.uri}`,
);
const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
const result = await model.generateContent([
"Tell me about this image.",
{
fileData: {
fileUri: uploadResult.file.uri,
mimeType: uploadResult.file.mimeType,
},
},
]);
console.log(result.response.text());
Nope @0wwafa we need to use the endpoint as the GoogleAIFileManager is not compatible with the browser.
Here's what ChatGPT still says about this error:
check the rest api. https://ai.google.dev/api/files#files_get-SHELL
I tested it in Python, in Node.js, and in the shell with curl, and they all work.
this also works:
const url = `https://generativelanguage.googleapis.com/v1beta/models?key=${apiKey}`;
fetch(url, {
method: 'GET',
})
.then(response => response.json())
.then(data => {
console.log(data);
})
.catch(error => {
console.error('Error:', error);
});
and also this:
const url = `https://generativelanguage.googleapis.com/v1beta/files?key=${apiKey}`;
hmm I see the problem.. when doing a POST to upload the file it seems there is a problem:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
but that can be managed from the back-end with a small nodejs or python program...
Yep.. it must be done in the back-end.. in nodejs:
https://ai.google.dev/api/files#files_create_text-JAVASCRIPT
// Make sure to include these imports:
// import { GoogleAIFileManager } from "@google/generative-ai/server";
// import { GoogleGenerativeAI } from "@google/generative-ai";
const fileManager = new GoogleAIFileManager(process.env.API_KEY);
const uploadResult = await fileManager.uploadFile(`${mediaPath}/a11.txt`, {
mimeType: "text/plain",
displayName: "Apollo 11",
});
// View the response.
console.log(
`Uploaded file ${uploadResult.file.displayName} as: ${uploadResult.file.uri}`,
);
const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
const result = await model.generateContent([
"Transcribe the first few sentences of this document.",
{
fileData: {
fileUri: uploadResult.file.uri,
mimeType: uploadResult.file.mimeType,
},
},
]);
console.log(result.response.text());
Yes, it runs correctly on NodeJS, but this UI is browser-based with a pure frontend, which is why it doesn't work directly. It would require an endpoint similar to the one I passed you, but the files won't be stored because Google doesn't like it that way.
an alternative is the inlining:
"parts":[
{
"inline_data": {
"mime_type":"text/plain",
"data": "'$(base64 $B64FLAGS a11.txt)'"
}
}
],
There are many MIME types accepted, including: pdf, png, text, mp3, wmv, mp4, etc.
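A minimal sketch of the inlining approach, assuming the generateContent endpoint quoted earlier in this thread. The helper names buildInlineRequest and generateFromInline are hypothetical, the model name may need adjusting, and error handling is kept to a bare minimum.

```javascript
// Build a generateContent request body that carries a file inline as base64.
// buildInlineRequest is a hypothetical helper name, not part of any SDK.
function buildInlineRequest(base64Data, mimeType, prompt) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: mimeType, data: base64Data } },
          { text: prompt },
        ],
      },
    ],
  };
}

// Send the request with fetch. The endpoint is the one quoted earlier in this
// thread; the model name is an assumption.
async function generateFromInline(apiKey, base64Data, mimeType, prompt) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-1.5-flash:generateContent?key=${apiKey}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildInlineRequest(base64Data, mimeType, prompt)),
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // Response shape follows the sample output shown in this thread.
  return data.candidates[0].content.parts[0].text;
}
```

Keeping the body construction separate from the network call makes it easy to inspect or size-check the payload before sending it.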
The real problem with the web api is that every time you prompt the model you are forced to send everything (all the history etc). I opened a bug report on geminiai about this. Even cached content does not work because they use the "fs" api, but perhaps there could be a workaround.
Yes, that's a problem, because all chats, at least on this app, are stored in LocalStorage to provide context to Gemini. Even so, it sometimes reads the message and responds incorrectly because it got lost reading all the historical messages. Storing files as well would be a huge memory problem, since we would be sending all the files every time. With text alone it's hard to fill it up, but files encoded as base64 will crash the app shortly.
memory? gemini flash has 1M token context! and the base64 inlining works. subsequently (chatting) the image can be removed leaving its answers on the image.. this works perfectly also on aistudio. please consider the inlining I posted above..
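A rough sketch of the idea of dropping the image from later turns while keeping the model's answers about it. It assumes the history is stored as an array of { role, parts } objects like the REST "contents" array. The function name stripOldAttachments and the placeholder text are hypothetical.

```javascript
// Replace inline file parts in earlier turns with a short text placeholder,
// so that only the newest turn can still carry base64 data.
// A placeholder is used instead of deleting the part so that no turn ends up
// with an empty parts array.
function stripOldAttachments(history, placeholder = "[attachment removed]") {
  return history.map((turn, index) => {
    const isLastTurn = index === history.length - 1;
    if (isLastTurn) return turn;
    const parts = turn.parts.map((part) =>
      part.inlineData || part.inline_data ? { text: placeholder } : part
    );
    return { ...turn, parts };
  });
}

// Example history: an image question, the model's answer, and a follow-up.
const history = [
  {
    role: "user",
    parts: [
      { inlineData: { mimeType: "image/jpeg", data: "AAAA" } },
      { text: "Analyze this." },
    ],
  },
  { role: "model", parts: [{ text: "The painting depicts a woman at a cafe table." }] },
  { role: "user", parts: [{ text: "What colour is her dress?" }] },
];

const trimmed = stripOldAttachments(history);
console.log(JSON.stringify(trimmed, null, 2));
```

The original history array is left untouched, so the app could still keep the full version elsewhere if it ever needs to resend the file.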
I just found out that it's even simpler!!!
"parts":[{"text": "BASE64DATA"}]
it automatically analyzes them!!
> "contents": [{
> "parts":[{"text": "ewogICJkZXBlbmRlbmNpZXMiOiB7CiAgICAiQGdvb2dsZS1haS9nZW5lcmF0aXZlbGFuZ3VhZ2UiOiAiXjIuNS4wIiwKICAgICJAZ29vZ2xlL2dlbmVyYXRpdmUtYWkiOiAiXjAuMTEuMyIsCiAgICAiY3J5cHRvLWpzIjogIl40LjIuMCIsCiAgICAid3MiOiAiXjguMTcuMCIKICB9Cn0K"},{"text": "Analyze this."}]
> }]
> }' 2> /dev/null
{
"candidates": [
{
"content": {
"parts": [
{
"text": "This is a JSON object representing dependencies for a project. Here's a breakdown:\n\n**Structure**\n\n* **\"dependencies\"**: This is the main key that holds all the dependencies.\n* **\"@[dependency name]\"**: Each key within \"dependencies\" represents a specific dependency with its name and version number.\n\n**Dependencies**\n\n* **\"@google-ai/generative-language\"**: This dependency is for a generative language library from Google AI. Its version is \"v2.5.0\".\n* **\"@google/generative-ai\"**: Another library from Google, likely for generative AI tasks. Its version is \"v0.11.3\".\n* **\"crypto-js\"**: A library for working with cryptographic functions. Its version is \"v4.2.0\".\n* **\"ws\"**: This is a library for working with websockets. Its version is \"v8.17.0\".\n\n**Meaning**\n\nThis JSON snippet likely comes from a project's `package.json` file. It defines the software libraries that the project relies on. When installing this project, a package manager (like npm or yarn) will automatically fetch and install these dependencies and their specified versions, ensuring that the project has all the necessary components to run correctly.\n\n**Key points to remember:**\n\n* Dependency management is crucial in software development to ensure consistency and avoid conflicts.\n* Using specific versions (like \"v2.5.0\") is important for maintaining compatibility and preventing unexpected behavior.\n* `package.json` is a standard file used to define project metadata, including dependencies, for Node.js and JavaScript projects. \n"
Awesome @0wwafa I'll test passing the base64 inside the content this weekend, I'll let you know how it goes!
About memory, I was talking on the user side (browser) keeping the historical there.
After a few more tests (I passed an image), it didn't work well. I don't know how it really works on the back-end. Anyway, there are multiple ways to upload files. Another simple one is: upload a file to Google Drive, enable sharing to "whoever has the link", and then add the ID of the file in the conversation.
that is the method aistudio uses. but in aistudio you can also paste an image in the chatbox... the image is passed to the model in base64. I'll get back to you when I find a solid way to do it from a webapp.
here is how to do it:
async function fileToGenerativePart(file) {
const base64EncodedDataPromise = new Promise((resolve) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result.split(',')[1]);
reader.readAsDataURL(file);
});
return {
inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
};
}
tested and working:
const data = JSON.stringify({
contents: [{
parts: [{
inlineData: {
mimeType: mimeType,
data: fileContent.toString('base64')
}
},
{
text: 'Analyze this.'
},
],
}, ],
});
$ node anal2.js woman_art1.jpg
The painting depicts a woman sitting at a cafe table, her gaze directed downwards, creating a sense of introspection. She is dressed in a vibrant red dress, accentuated by a white top, suggesting a sense of sophistication and femininity. The dress, with its flowing lines, adds a graceful touch to the composition. Her long, flowing hair cascades down her back, framing her face and drawing attention to her features.
The setting is a Parisian cafe, a quintessential location synonymous with romance and art. The background, although blurred, provides a glimpse into the bustling cafe scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene.
The use of light and shadow adds depth and dimension to the painting. The sun casts a warm glow on the woman, highlighting her features and creating a sense of warmth. The shadows, meanwhile, accentuate the lines of the cafe and the figures of the patrons, creating a sense of depth and realism. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene.
The overall mood of the painting is one of contemplation and tranquility. The woman's pensive expression, combined with the relaxed setting of the cafe, creates a sense of peacefulness. The warm colors and the use of light and shadow further enhance this sense of tranquility, inviting the viewer to step into the painting and experience the moment.
The painting is a beautiful representation of a timeless scene, capturing the essence of Parisian cafe culture. It is a testament to the artist's skill in depicting the human figure and creating a sense of realism and beauty. The painting is sure to captivate viewers with its evocative imagery and its ability to transport them to a different time and place. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene.
the only restriction is that the payload can be 20971520 bytes maximum.
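A small sketch of a pre-flight size check for that limit. It relies on the fact that base64 turns every 3 bytes into 4 characters. The overheadBytes allowance for the prompt, JSON punctuation and any chat history is an assumption the caller has to supply, and the function names are hypothetical.

```javascript
// Upper bound on the request payload reported in this thread (20 MiB).
const MAX_PAYLOAD_BYTES = 20971520;

// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Estimate whether a file of `fileBytes` bytes still fits once it has been
// base64-encoded and wrapped in the JSON request.
function fitsInPayload(fileBytes, overheadBytes = 1024) {
  return base64Length(fileBytes) + overheadBytes <= MAX_PAYLOAD_BYTES;
}

console.log(fitsInPayload(10 * 1024 * 1024)); // a 10 MiB file
console.log(fitsInPayload(16 * 1024 * 1024)); // a 16 MiB file
```

Because of the 4/3 expansion, the largest raw file that fits is noticeably smaller than 20 MiB, so checking before encoding avoids building a request that would be rejected.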
it seems to work with many more file types than the ones publicized :D
$ node anal2.js ../spectrogram.html
This HTML code creates a web page that visualizes audio input from the user's microphone in the form of a spectrogram. Here's a breakdown of the code and its functionality:
HTML Structure
<!DOCTYPE html>, <html>, <head>, <body>).
style.css: An external CSS file is linked to the HTML, likely containing additional styles that aren't included in the inline <style> tag.
<style> tag provides basic styling:
body: Sets the background color to dark gray, fills the entire viewport, removes margins and padding, and prevents horizontal and vertical scrollbars.
canvas: Sets the background color of the canvas element to black.
<div class="overlay"> element serves as an initial overlay, covering the entire screen.
<button> within the overlay triggers the start of the spectrogram visualization when clicked.
<canvas> element is where the spectrogram will be drawn.
<script> tag contains JavaScript code to handle user interactions, audio processing, and the visualization.
JavaScript Functionality
1. Event Listener and Overlay Removal:
document.querySelector(".overlay > button").addEventListener("click", (e) => { ... })
overlay.parentNode.removeChild(overlay);
startSpectrogram function to initiate the spectrogram visualization.
2. startSpectrogram Function:
let canvas = document.querySelector('canvas');
let ctx = canvas.getContext('2d'); (Gets the 2D drawing context for the canvas)
let WIDTH = +canvas.width; and let HEIGHT = +canvas.height; (Retrieve the canvas dimensions)
let constraints = { audio: { ... } }; (Defines audio constraints for getUserMedia)
resizeCanvas to make the canvas responsive to window resizing. It updates the canvas width and height to match its container's size.
resizeTimeout variable is used to throttle resize events to avoid frequent re-rendering.
shiftLeft function shifts the canvas image data one pixel to the left, effectively moving the spectrogram visualization to the left to create a "scrolling" effect.
let audioContext = new AudioContext(); (Creates an audio context, which handles audio processing)
let analyser = audioContext.createAnalyser(); (Creates an audio analyser node to analyze frequency data)
analyser.fftSize = 2048; (Sets the size of the Fast Fourier Transform used for analysis)
analyser.smoothingTimeConstant = 0.0; (Sets the smoothing factor to 0, resulting in no smoothing for raw data)
let buffer = new Uint8Array(analyser.frequencyBinCount); (Creates a buffer to store frequency data)
navigator.mediaDevices.getUserMedia(constraints) (Requests microphone access from the user)
stream object, which represents the audio input.
var microphone = audioContext.createMediaStreamSource(stream); (Creates a media stream source from the stream)
microphone.connect(analyser); (Connects the microphone source to the analyser node)
draw function is responsible for visualizing the spectrogram:
shiftLeft to shift the image data to the left.
analyser.getByteFrequencyData(buffer) retrieves the frequency data into the buffer.
dy) based on the number of frequency bins.
buffer, using the frequency data to set the fill color of rectangular bars.
requestAnimationFrame(draw) schedules the draw function to be called repeatedly for smooth animation.
Overall, this code effectively creates a basic real-time audio spectrogram visualization using the Web Audio API and canvas drawing.
Improvements and Potential Features:
Great, thanks for sharing your finds! I'll let you know how my testing goes.
it works! you can add it...
only caveat: not all mime types are supported. and for code or text files it's better to put the code or text file in the message as it is than passing it as an inlined file.
This one he messed up.. but the answer is almost right:
Note: for files that don't have a known MIME type or that are not accepted, just use their ASCII representation. If you pass them as application/* they will probably be refused. If you need it, I can send you the code of the "file analyzer". It's a static page.
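One way this rule of thumb could be sketched: send known media types as inline base64, and pass everything else (text, code, unknown types) as plain text. The INLINE_MIME_PREFIXES list is an illustrative assumption, not the API's official list, and fileToPart is a hypothetical helper name.

```javascript
// MIME types (or prefixes) that are sent as inline data. This list is an
// assumption for illustration; adjust it to what the API actually accepts.
const INLINE_MIME_PREFIXES = ["image/", "audio/", "video/", "application/pdf"];

// bytes: Uint8Array with the file contents; mimeType: string (may be empty).
function fileToPart(bytes, mimeType, fileName) {
  const sendInline = INLINE_MIME_PREFIXES.some((prefix) =>
    mimeType.startsWith(prefix)
  );
  if (sendInline) {
    // Convert the bytes to a binary string in chunks, then base64-encode it.
    // Chunking avoids passing too many arguments to String.fromCharCode.
    let binary = "";
    const chunkSize = 0x8000;
    for (let i = 0; i < bytes.length; i += chunkSize) {
      binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
    }
    return { inlineData: { mimeType, data: btoa(binary) } };
  }
  // Text, code and unknown types: put the decoded content in a text part.
  const text = new TextDecoder().decode(bytes);
  return { text: `File ${fileName}:\n${text}` };
}
```

The returned object can be placed directly in the parts array of a request, next to the prompt text.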
I tested it last night, and it mostly gets errors and sometimes a file reading. I'll send an update with a selector to choose between gemini-1.5-flash and gemini-pro for you to test it, and check if there's a problem on the API call according to what you tested.
@0wwafa I've merged the changes, if the model is Flash you can select files.
There are a few bugs I can fix later this weekend like not clearing the files after sending the prompt but you should be good to play with it and give feedback or suggest fixes for processing files.
The REST API is tricky, but I have now finished the file analyzer, which uses the REST API, and it works beautifully. All supported file types (all audio types, all video types and all document types) work! The only limit is that the payload can't exceed 20971520 bytes.
It's great to know that you got the analyzer working!
Let me know if you test the update I sent, if you want to include part of your analyzer to improve passing the base64 feel free to share the code or open a Pull Request
I will publish my code when it is "decent" :D As of now it's working beautifully using the streaming API (which is a mess). It started as a proof of concept to show you how it could be done; now it's a standalone program of 600 handwritten lines with only one library imported (markdown.js).
Please add file upload (text, images, pdf, etc)