:camera: :question: Visual Question Answering

Visual Question Answering (VQA) can be defined as the problem of producing answers to questions asked about an image by analyzing the information in that image.

In this problem, processing the questions, which are expressed as text, is a Natural Language Processing problem, while producing the answers from the image turns each question into a separate Computer Vision problem.

A general look at the system:

[Figure: overview of the system]

The general approach of the models developed for the visual question answering problem:

[Figure: general approach of VQA models]
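As a rough illustration of how an image and a question can be combined into an answer, the sketch below fuses an image feature vector with a question embedding and scores a small set of candidate answers. It is a toy under stated assumptions and is not taken from this repository: the dimensions, the answer list, the random "weights", and the helper names (`embed_question`, `answer_distribution`) are all hypothetical, and averaging word vectors is only one simple way to encode a question.

```python
import math
import random

random.seed(0)

# Hypothetical sizes and answer vocabulary, chosen only for illustration.
IMAGE_DIM = 8
WORD_DIM = 4
ANSWERS = ["yes", "no", "2", "surfing"]

# Random vectors standing in for pretrained word embeddings.
_word_vectors = {}


def word_vector(token):
    """Return a (random, cached) vector for a token."""
    if token not in _word_vectors:
        _word_vectors[token] = [random.gauss(0, 1) for _ in range(WORD_DIM)]
    return _word_vectors[token]


def embed_question(question):
    """Encode a question as the average of its word vectors."""
    tokens = question.lower().rstrip("?").split()
    vectors = [word_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_distribution(image_features, question, weights):
    """Concatenate both encodings and score every candidate answer."""
    fused = list(image_features) + embed_question(question)
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    return softmax(scores)


# Random stand-ins for extracted image features and trained weights.
image_features = [random.gauss(0, 1) for _ in range(IMAGE_DIM)]
weights = [[random.gauss(0, 1) for _ in range(IMAGE_DIM + WORD_DIM)]
           for _ in ANSWERS]

probs = answer_distribution(image_features, "What is done in the picture?", weights)
best = max(range(len(ANSWERS)), key=probs.__getitem__)
print(f"A: {ANSWERS[best]} ({probs[best]:.2%})")
```

Because the weights are random, the printed answer is meaningless; the point is only the data flow from two encoders, through a fusion step, to a probability for each candidate answer.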

:pushpin: Datasets developed for the Visual Question Answering task and frequently used in the literature

Blog 📝

For a detailed explanation of Visual Question Answering and the datasets frequently used for this task, see my blog post titled "Çok Gören Mi Bilir, Çok Soran Mı?".

Implementation :hammer:

Requirements:

TensorFlow (Ver. 1.2+)
Keras (Ver. 2.0+)
scikit-learn
spaCy (Ver. 2.0+)
- Used to load the GloVe word vectors
- To upgrade and install the GloVe vectors:
  - python -m spacy download en_vectors_web_lg
OpenCV
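Before running the demo, it can help to check that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_requirements` is hypothetical, and the check only confirms that each module can be found, not that its version meets the minimums listed above.

```python
import importlib.util

# Import names differ from some package names:
# scikit-learn is imported as sklearn, OpenCV as cv2.
REQUIRED_MODULES = {
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "scikit-learn": "sklearn",
    "spaCy": "spacy",
    "OpenCV": "cv2",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the display names of packages whose module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing:", ", ".join(missing) if missing else "none")
```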

Demo 🖥️

This Jupyter notebook is a simple Visual Question Answering demo that uses pretrained models to answer a question about a given image.

API :computer:

Click here to use the API, which you can quickly integrate into your own products.

API Python Implementation

Install

Install the Algorithmia Python client with pip:

pip install algorithmia

Usage

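import Algorithmia

# Image location (an Algorithmia data URI) and the question to ask about it
payload = {
  "image": "data://yavuzkomecoglu/DL_VQA/test.jpg",
  "question": "What vehicle is in the picture?"
}
# Replace YOUR_API_KEY with your own key
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('yavuzkomecoglu/VQA/0.1.1')
# Send the request and print the result
print(algo.pipe(payload).result)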
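The request in the usage example is a dictionary with "image" and "question" keys. The sketch below wraps that shape in a small helper and reads the API key from the environment instead of hard-coding it. Both helper names and the `ALGORITHMIA_API_KEY` variable name are assumptions made for illustration; they are not part of the Algorithmia client.

```python
import os


def build_vqa_request(image_uri, question):
    """Build the input dictionary in the shape used in the usage example."""
    if not image_uri or not question or not question.strip():
        raise ValueError("both an image URI and a question are required")
    return {"image": image_uri, "question": question.strip()}


def api_key_from_env(variable="ALGORITHMIA_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(f"set {variable} before calling the API")
    return key


request = build_vqa_request(
    "data://yavuzkomecoglu/DL_VQA/test.jpg",
    "What vehicle is in the picture?",
)
print(request)

# With the client installed, the request could be sent as in the usage example:
#   client = Algorithmia.client(api_key_from_env())
#   print(client.algo('yavuzkomecoglu/VQA/0.1.1').pipe(request).result)
```

Keeping the key out of the source file makes it less likely to be committed by accident.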

Sample predictions

Some answers predicted by the VQA model.

Q: How is the weather? A: Sunny! (97.23%)

Q: How many girls are in the picture? A: 2! (61.98%)

Q: What is done in the picture? A: Surfing! (99.43%)

Q: What does the sign say? A: Stop! (28.61%)

References

Aaditya Prakash (Adi) - Blog

basakbuluz / Visual-Question-Answering
