Add file upload to gemini. #14

Open 0wwafa opened 3 weeks ago

0wwafa commented 3 weeks ago

Please add file upload (text, images, pdf, etc)

fjosue4 commented 3 weeks ago

Hey @0wwafa, I'll need to check whether the API now accepts files that aren't part of the Google Account files. The last time I checked, Gemini Vision Pro required you to be signed in to upload files, and uploads were not accepted with just the API that this basic app uses.

If files can be passed without OAuth 2.0 I can add that feature.

I'll make sure to keep you posted.

0wwafa commented 3 weeks ago

import google.generativeai as genai

# `media` is assumed to be a pathlib.Path pointing at the sample files.
myfile = genai.upload_file(media / "poem.txt")
file_name = myfile.name
print(file_name)  # "files/*"

myfile = genai.get_file(file_name)


document = genai.upload_file(path=media / "a11.txt")
model_name = "gemini-1.5-flash-001"
cache = genai.caching.CachedContent.create(
    model=model_name,
    system_instruction="You are an expert analyzing transcripts.",
    contents=[document],
)

model = genai.GenerativeModel.from_cached_content(cache)
response = model.generate_content("Please summarize this transcript")

fjosue4 commented 3 weeks ago

@0wwafa I just tested it, but no luck: TypeError: (0 , import_fs.readFileSync) is not a function

Seems like it's not browser-based.

I'll be testing over the coming days with this other documentation:

Update 2: This endpoint seems to exist${apiKey}

But I haven't yet found a compatible way to send the files.


0wwafa commented 3 weeks ago

// Make sure to include these imports:
// import { GoogleAIFileManager } from "@google/generative-ai/server";
// import { GoogleGenerativeAI } from "@google/generative-ai";
const fileManager = new GoogleAIFileManager(process.env.API_KEY);

// imagePath: placeholder for the path to a local JPEG file.
const uploadResult = await fileManager.uploadFile(imagePath, {
    mimeType: "image/jpeg",
    displayName: "Jetpack drawing",
});
// View the response.
console.log(
  `Uploaded file ${uploadResult.file.displayName} as: ${uploadResult.file.uri}`,
);

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
const result = await model.generateContent([
  "Tell me about this image.",
  {
    fileData: {
      fileUri: uploadResult.file.uri,
      mimeType: uploadResult.file.mimeType,
    },
  },
]);

fjosue4 commented 3 weeks ago

Nope, @0wwafa, we need to use the endpoint directly, since GoogleAIFileManager is not compatible with the browser.


0wwafa commented 3 weeks ago

Check the REST API.

0wwafa commented 3 weeks ago

I tested it in Python, in Node.js, and in the shell with curl, and they all work.

0wwafa commented 3 weeks ago

this also works:

const url = `${apiKey}`;

fetch(url, {
    method: 'GET',
})
.then(response => response.json())
.then(data => {
    console.log(data);
})
.catch(error => {
    console.error('Error:', error);
});

0wwafa commented 3 weeks ago

and also this: const url = `${apiKey}`;

0wwafa commented 3 weeks ago

Hmm, I see the problem: a POST from the browser to upload the file fails with a CORS error:

No 'Access-Control-Allow-Origin' header is present on the requested resource.

But that can be handled on the back end with a small Node.js or Python program...

0wwafa commented 3 weeks ago

Yep, it must be done on the back end. In Node.js:

// Make sure to include these imports:
// import { GoogleAIFileManager } from "@google/generative-ai/server";
// import { GoogleGenerativeAI } from "@google/generative-ai";
const fileManager = new GoogleAIFileManager(process.env.API_KEY);

const uploadResult = await fileManager.uploadFile(`${mediaPath}/a11.txt`, {
  mimeType: "text/plain",
  displayName: "Apollo 11",
});
// View the response.
console.log(
  `Uploaded file ${uploadResult.file.displayName} as: ${uploadResult.file.uri}`,
);

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
const result = await model.generateContent([
  "Transcribe the first few sentences of this document.",
  {
    fileData: {
      fileUri: uploadResult.file.uri,
      mimeType: uploadResult.file.mimeType,
    },
  },
]);

fjosue4 commented 3 weeks ago

Yes, it runs correctly on Node.js, but this UI is a pure browser-based frontend. That's why it doesn't work directly and needs an endpoint similar to the one I sent you, but files won't be stored because Google doesn't like it that way. ${apiKey}

0wwafa commented 3 weeks ago

An alternative is inlining:

          "inline_data": {
            "mime_type": "text/plain",
            "data": "'$(base64 $B64FLAGS a11.txt)'"
          }

There are many MIME types accepted, including PDF, PNG, text, MP3, WMV, MP4, etc.
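
For illustration, the inlining approach could be sketched in JavaScript roughly as follows. The helper name `buildInlineRequest` and the sample prompt are hypothetical. The `inline_data`, `mime_type`, and `data` field names are the ones the REST examples use, and the `Buffer` call assumes Node.js.

```javascript
// Hypothetical helper: builds a generateContent request body that inlines
// a file as base64 instead of uploading it through the File API.
function buildInlineRequest(base64Data, mimeType, prompt) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: mimeType, data: base64Data } },
          { text: prompt },
        ],
      },
    ],
  };
}

// Example with a tiny text payload. Buffer is Node-only; a browser would
// read the file with FileReader instead.
const sample = Buffer.from("Hello, Apollo 11.", "utf8").toString("base64");
const body = buildInlineRequest(sample, "text/plain", "Summarize this document.");
console.log(JSON.stringify(body, null, 2));
```

The resulting object would be serialized with `JSON.stringify` and sent as the POST body, so nothing has to be stored on Google's side.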

0wwafa commented 3 weeks ago

The real problem with the web api is that every time you prompt the model you are forced to send everything (all the history etc). I opened a bug report on geminiai about this. Even cached content does not work because they use the "fs" api, but perhaps there could be a workaround.

fjosue4 commented 3 weeks ago

Yes, that's a problem, because all chats, at least in this app, are stored in LocalStorage to provide context to Gemini. Even so, it sometimes reads the message and responds incorrectly because it got lost in the historical messages. Storing files as well would be a huge memory problem, since we would be sending all files every time. With text alone it's hard to fill the storage, but base64-encoded files will crash the app quickly.

0wwafa commented 3 weeks ago

Memory? Gemini Flash has a 1M-token context! And the base64 inlining works. In subsequent turns of the chat, the image can be removed from the history, leaving only the model's answers about it. This also works perfectly in AI Studio. Please consider the inlining I posted above.
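
A rough sketch of that idea, assuming the history is kept as an array of `{ role, parts }` turns (the shape and the helper name `stripInlineData` are assumptions, not code from this app):

```javascript
// Hypothetical helper: returns a copy of the chat history with every
// inline file part removed, so only the text turns are re-sent.
function stripInlineData(history) {
  return history
    .map((turn) => ({
      ...turn,
      // Accept both the SDK spelling (inlineData) and the REST one (inline_data).
      parts: turn.parts.filter((part) => !part.inlineData && !part.inline_data),
    }))
    // Drop turns that contained nothing but a file. Note this can leave two
    // turns with the same role next to each other.
    .filter((turn) => turn.parts.length > 0);
}

const history = [
  {
    role: "user",
    parts: [
      { inlineData: { mimeType: "image/jpeg", data: "<base64 data>" } },
      { text: "Describe this image." },
    ],
  },
  { role: "model", parts: [{ text: "A woman sitting at a cafe table." }] },
];
console.log(JSON.stringify(stripInlineData(history), null, 2));
```

Running this on the stored history before saving it to LocalStorage would keep the model's answers about the file while avoiding storing and re-sending the base64 data.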

0wwafa commented 3 weeks ago

I just found out that it's even simpler!!!

"parts":[{"text": "BASE64DATA"}]

It automatically analyzes them!!

>       "contents": [{
>         "parts":[{"text": "ewogICJkZXBlbmRlbmNpZXMiOiB7CiAgICAiQGdvb2dsZS1haS9nZW5lcmF0aXZlbGFuZ3VhZ2UiOiAiXjIuNS4wIiwKICAgICJAZ29vZ2xlL2dlbmVyYXRpdmUtYWkiOiAiXjAuMTEuMyIsCiAgICAiY3J5cHRvLWpzIjogIl40LjIuMCIsCiAgICAid3MiOiAiXjguMTcuMCIKICB9Cn0K"},{"text": "Analyze this."}]
>         }]
>        }' 2> /dev/null
  "candidates": [
      "content": {
        "parts": [
            "text": "This is a JSON object representing dependencies for a project.  Here's a breakdown:\n\n**Structure**\n\n* **\"dependencies\"**: This is the main key that holds all the dependencies.\n* **\"@[dependency name]\"**: Each key within \"dependencies\" represents a specific dependency with its name and version number.\n\n**Dependencies**\n\n* **\"@google-ai/generative-language\"**: This dependency is for a generative language library from Google AI. Its version is \"v2.5.0\".\n* **\"@google/generative-ai\"**: Another library from Google, likely for generative AI tasks. Its version is \"v0.11.3\".\n* **\"crypto-js\"**: A library for working with cryptographic functions. Its version is \"v4.2.0\".\n* **\"ws\"**: This is a library for working with websockets. Its version is \"v8.17.0\".\n\n**Meaning**\n\nThis JSON snippet likely comes from a project's `package.json` file. It defines the software libraries that the project relies on. When installing this project, a package manager (like npm or yarn) will automatically fetch and install these dependencies and their specified versions, ensuring that the project has all the necessary components to run correctly.\n\n**Key points to remember:**\n\n* Dependency management is crucial in software development to ensure consistency and avoid conflicts.\n* Using specific versions (like \"v2.5.0\") is important for maintaining compatibility and preventing unexpected behavior.\n* `package.json` is a standard file used to define project metadata, including dependencies, for Node.js and JavaScript projects. \n"
fjosue4 commented 3 weeks ago

Awesome, @0wwafa! I'll test passing the base64 inside the content this weekend and let you know how it goes!

About memory: I was talking about the user side (the browser), which keeps the history there.

0wwafa commented 3 weeks ago

> Awesome, @0wwafa! I'll test passing the base64 inside the content this weekend and let you know how it goes!

After a few more tests (I passed an image), it didn't work well. I don't know how it really works on the back end. Anyway, there are multiple ways to upload files. Another simple one: upload a file to Google Drive, enable sharing with "anyone with the link", and then add the ID of the file to the conversation.

That is the method AI Studio uses. But in AI Studio you can also paste an image into the chat box, and the image is passed to the model in base64. I'll get back to you when I find a solid way to do it from a web app.

0wwafa commented 3 weeks ago

here is how to do it:

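    // Reads a browser File object and returns it as an inlineData part.
    async function fileToGenerativePart(file) {
        const base64EncodedDataPromise = new Promise((resolve) => {
            const reader = new FileReader();
            // reader.result is a data URL; keep only the base64 part after the comma.
            reader.onloadend = () => resolve(reader.result.split(',')[1]);
            reader.readAsDataURL(file);
        });
        return {
            inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
        };
    }

0wwafa commented 3 weeks ago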

tested and working:

        const data = JSON.stringify({
            contents: [{
                parts: [{
                        inlineData: {
                            mimeType: mimeType,
                            data: fileContent.toString('base64')
                        }
                    }, {
                        text: 'Analyze this.'
                    }
                ]
            }, ],
        });

0wwafa commented 3 weeks ago

$ node anal2.js woman_art1.jpg

The painting depicts a woman sitting at a cafe table, her gaze directed downwards, creating a sense of introspection. She is dressed in a vibrant red dress, accentuated by a white top, suggesting a sense of sophistication and femininity. The dress, with its flowing lines, adds a graceful touch to the composition. Her long, flowing hair cascades down her back, framing her face and drawing attention to her features.

The setting is a Parisian cafe, a quintessential location synonymous with romance and art. The background, although blurred, provides a glimpse into the bustling cafe scene. The cafe tables and chairs, along with the figures of other patrons in the distance, contribute to the lively atmosphere of the scene.

The use of light and shadow adds depth and dimension to the painting. The sun casts a warm glow on the woman, highlighting her features and creating a sense of warmth. The shadows, meanwhile, accentuate the lines of the cafe and the figures of the patrons, creating a sense of depth and realism.

The overall mood of the painting is one of contemplation and tranquility. The woman's pensive expression, combined with the relaxed setting of the cafe, creates a sense of peacefulness. The warm colors and the use of light and shadow further enhance this sense of tranquility, inviting the viewer to step into the painting and experience the moment.

The painting is a beautiful representation of a timeless scene, capturing the essence of Parisian cafe culture. It is a testament to the artist's skill in depicting the human figure and creating a sense of realism and beauty. The painting is sure to captivate viewers with its evocative imagery and its ability to transport them to a different time and place.
0wwafa commented 3 weeks ago

The only restriction is that the payload can be at most 20971520 bytes (20 MiB).
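
A minimal sketch of a pre-flight check for that limit. The helper names are hypothetical, and it assumes the limit applies to the whole serialized JSON request body:

```javascript
// Limit quoted in this thread: 20971520 bytes (20 * 1024 * 1024).
const MAX_PAYLOAD_BYTES = 20971520;

// Base64 turns every 3 input bytes into 4 output characters (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical helper: measures the serialized JSON body in UTF-8 bytes.
function payloadFits(requestBody) {
  const bytes = new TextEncoder().encode(JSON.stringify(requestBody)).length;
  return bytes <= MAX_PAYLOAD_BYTES;
}

console.log(base64Length(15 * 1024 * 1024)); // 20971520
```

Because base64 inflates data by a third, a 15 MiB file already fills the whole budget before the prompt and JSON wrapper are counted, so the practical file size limit is somewhat below 15 MiB.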

0wwafa commented 3 weeks ago

it seems to work with many more file types than the ones publicized :D

$ node anal2.js ../spectrogram.html

This HTML code creates a web page that visualizes audio input from the user's microphone in the form of a spectrogram. Here's a breakdown of the code and its functionality:

HTML Structure

JavaScript Functionality

1. Event Listener and Overlay Removal:

2. startSpectrogram Function:

Overall, this code effectively creates a basic real-time audio spectrogram visualization using the Web Audio API and canvas drawing.

Improvements and Potential Features:

fjosue4 commented 3 weeks ago

Great, thanks for sharing your findings! I'll let you know how my testing goes.

0wwafa commented 2 weeks ago

> Great, thanks for sharing your findings! I'll let you know how my testing goes.

it works! you can add it...

Only caveat: not all MIME types are supported. And for code or text files, it's better to put the content directly in the message as-is than to pass it as an inlined file.

0wwafa commented 2 weeks ago

Note: for files that don't have a known MIME type, or whose type is not accepted, just use their ASCII representation. If you pass them as application/* they will probably be refused. If you need it, I can send you the code of the "file analyzer". It's a static page.
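
As a sketch of that rule, a part could be chosen from the MIME type like this. The allow-list and the helper name `toPart` are placeholders, since the accepted types aren't enumerated here; `application/pdf` is included only because PDF was mentioned as accepted earlier, and the text is decoded as UTF-8, which is a superset of ASCII:

```javascript
// Hypothetical allow-list; adjust after testing which types the API accepts.
const INLINE_PREFIXES = ["image/", "audio/", "video/"];
const INLINE_TYPES = ["application/pdf"];

// bytes is a Uint8Array with the raw file content.
function toPart(mimeType, bytes) {
  const inline =
    INLINE_TYPES.includes(mimeType) ||
    INLINE_PREFIXES.some((prefix) => mimeType.startsWith(prefix));
  if (inline) {
    // Buffer is Node-only; in a browser, FileReader would produce the base64.
    return { inlineData: { mimeType, data: Buffer.from(bytes).toString("base64") } };
  }
  // Code, plain text, and unknown or refused types go into the message as text.
  return { text: new TextDecoder().decode(bytes) };
}

console.log(toPart("text/x-python", new TextEncoder().encode("print(1)")));
console.log(toPart("image/png", new Uint8Array([1, 2, 3])));
```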

fjosue4 commented 2 weeks ago

I tested it last night, and it mostly returns errors and only sometimes reads the file. I'll send an update with a selector to choose between gemini-1.5-flash and gemini-pro so you can test it and check whether there's a problem in the API call compared to what you tested.

fjosue4 commented 2 weeks ago

@0wwafa I've merged the changes; if the model is Flash, you can select files.

There are a few bugs I can fix later this weekend, like not clearing the files after sending the prompt, but you should be good to play with it and give feedback or suggest fixes for processing files.

0wwafa commented 2 weeks ago

> I tested it last night, and it mostly returns errors and only sometimes reads the file. I'll send an update with a selector to choose between gemini-1.5-flash and gemini-pro so you can test it and check whether there's a problem in the API call compared to what you tested.

The REST API is tricky. But I have now finished the file analyzer, which uses the REST API, and it works beautifully. All supported file types (all audio types, all video types, and all document types) work! The only limit is that the payload can't exceed 20971520 bytes.

fjosue4 commented 2 weeks ago

It's great to know that you got the analyzer working!

Let me know if you test the update I sent. If you want to include part of your analyzer to improve passing the base64, feel free to share the code or open a pull request.

0wwafa commented 2 weeks ago

> It's great to know that you got the analyzer working!
>
> Let me know if you test the update I sent. If you want to include part of your analyzer to improve passing the base64, feel free to share the code or open a pull request.

I will publish my code when it's "decent" :D As of now it's working beautifully using the streaming API (which is a mess). It started as a proof of concept to show you how it could be done; now it's a standalone program of 600 hand-written lines with only one library imported (markdown.js).