Feature request: retrieve an uploaded image from the knowledge base and identify its location, for indoor navigation and path planning

PromtEngineer / localGPT-Vision

Chat with your documents using Vision Language Models. This repo implements an end-to-end RAG pipeline with both local and proprietary VLMs.

334 stars 65 forks

Issue #5

Open libai-lab opened 1 week ago

libai-lab commented 1 week ago

Could this project support a similar feature? For example, a document containing images is uploaded to the knowledge base, such as photos of each room in a building. A user then uploads a photo, and the system retrieves the matching image and works out where it was taken. The user could upload one of those images to get path planning and navigation indoors.

deadsunrise commented 1 week ago

WTF?

libai-lab commented 1 week ago

WTF?

Simply put, the idea is to recognize images so they can be used for indoor navigation.
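
To make the request more concrete, here is a rough sketch of the two steps involved: matching a query image against stored room images, then planning a route between rooms. It is only an illustration under stated assumptions and does not use any localGPT-Vision code. The room names, the embedding vectors, and the room graph are all made up; in a real system the embeddings would presumably come from a vision model applied to the uploaded images.

```python
from collections import deque
from math import sqrt

# Hypothetical knowledge base: room name -> embedding of that room's photo.
# These are hand-written toy vectors, not the output of a real vision model.
ROOM_EMBEDDINGS = {
    "lobby": [0.9, 0.1, 0.0],
    "hallway": [0.1, 0.9, 0.1],
    "kitchen": [0.0, 0.2, 0.9],
    "office": [0.6, 0.6, 0.1],
}

# Hypothetical floor plan: which rooms are directly connected.
ROOM_GRAPH = {
    "lobby": ["hallway"],
    "hallway": ["lobby", "kitchen", "office"],
    "kitchen": ["hallway"],
    "office": ["hallway"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate(query_embedding):
    """Return the room whose stored embedding is most similar to the query."""
    return max(
        ROOM_EMBEDDINGS,
        key=lambda room: cosine(query_embedding, ROOM_EMBEDDINGS[room]),
    )


def plan_path(start, goal):
    """Breadth-first search over the room graph; returns a room list or None."""
    if start not in ROOM_GRAPH or goal not in ROOM_GRAPH:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in ROOM_GRAPH[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# A query photo whose (toy) embedding is closest to the lobby.
current_room = locate([0.85, 0.2, 0.0])
print(plan_path(current_room, "kitchen"))  # ['lobby', 'hallway', 'kitchen']
```

Breadth-first search returns the route with the fewest room-to-room hops. If distances between rooms matter, a weighted search such as Dijkstra's algorithm would be the natural swap.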