fg91 / DeViSE-zero-shot-classification

DeViSE model (zero-shot learning) trained on ImageNet and deployed on AWS using Docker

Latest Paper related to DeViSE #3

Open UmarMajeed-Rana opened 5 years ago

UmarMajeed-Rana commented 5 years ago

Hi Fabio. I read your article on Medium. For some reason I am not able to post a response there. I enjoyed reading your explanation of the paper. Can you point me to recent advancements in this space? I see the paper was published in 2013, but it still looks relevant: simple and powerful.

fg91 commented 5 years ago

Hi Umar,

Thank you for your interest! If you are interested in these kinds of models, I suggest you look at image caption generation next: https://cs.stanford.edu/people/karpathy/cvpr2015.pdf. The authors "describe neural networks that map words and image regions into a common, multimodal embedding", which shares some similarities with DeViSE. Better approaches are, however, presented in articles like these (1: https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf, 2: https://arxiv.org/pdf/1502.03044.pdf). I'm working on a detailed image caption generation tutorial that should be on Medium within the next few weeks. Best regards, Fabio
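
For illustration, the core DeViSE idea boils down to a trainable projection from CNN image features into a pretrained word-vector space, with labels ranked by cosine similarity. A minimal sketch follows; all names, dimensions, and vectors are placeholders, not the actual code in this repo or in the paper:

```python
# Minimal DeViSE-style sketch (illustrative only): map CNN image features
# into a word-embedding space and rank labels by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

emb_dim = 300          # e.g. word2vec/GloVe dimensionality (assumption)
feat_dim = 2048        # e.g. ResNet penultimate-layer features (assumption)

# Trainable linear projection from image-feature space into the word-vector space.
project = nn.Linear(feat_dim, emb_dim)

# Placeholder word vectors for the candidate labels (normally loaded from a
# pretrained embedding such as word2vec; random here just so the sketch runs).
label_vecs = F.normalize(torch.randn(10, emb_dim), dim=1)    # 10 labels

def zero_shot_scores(image_features):
    """Cosine similarity between the projected image and every label vector."""
    img_emb = F.normalize(project(image_features), dim=1)    # (batch, emb_dim)
    return img_emb @ label_vecs.T                            # (batch, num_labels)

# Usage: features from a frozen CNN; labels never seen during training can be
# scored as long as a word vector exists for them.
scores = zero_shot_scores(torch.randn(4, feat_dim))
pred = scores.argmax(dim=1)
```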

UmarMajeed-Rana commented 5 years ago

Thank you for your response, Fabio. Do Show and Tell and the other article you mentioned also map images and text into the same common space? I am interested in doing reverse image search in the fashion domain: searching for images in a corpus by writing a description.

fg91 commented 5 years ago

No, the models proposed in the "Show and Tell" and "Show, Attend and Tell" papers directly generate captions with an RNN decoder that takes the representations produced by a CNN encoder as input.

However, the last paragraph in the left column of the "Related Work" section of the "Show and Tell" paper (page 2) discusses previous articles that, as far as I understand, do exactly what you intend to do. Maybe check that paragraph out (the one starting with "A large body of work has addressed the problem of ranking descriptions for a given image..."). Is this what you were looking for?
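
To make the distinction concrete, a "Show and Tell"-style generator is structured roughly like the sketch below. This is only a skeleton with placeholder layer sizes and vocabulary, not the papers' actual models:

```python
# Rough skeleton of a CNN-encoder / RNN-decoder caption generator
# (illustrative only; all sizes and names are placeholders).
import torch
import torch.nn as nn

class CaptionGenerator(nn.Module):
    def __init__(self, feat_dim=2048, emb_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # image features -> initial hidden state
        self.embed = nn.Embedding(vocab_size, emb_dim)  # token embeddings
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)    # next-token logits

    def forward(self, image_features, captions):
        # The CNN encoder output conditions the LSTM decoder, which then
        # predicts the caption token by token (teacher forcing here).
        h0 = torch.tanh(self.init_h(image_features)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        x = self.embed(captions)                        # (batch, seq, emb_dim)
        out, _ = self.rnn(x, (h0, c0))
        return self.out(out)                            # (batch, seq, vocab)

model = CaptionGenerator()
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 12)))
```

The caption is produced by the decoder rather than read off a shared embedding space, which is why these models do not directly give you text-to-image retrieval.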

Best regards,

Fabio

UmarMajeed-Rana commented 5 years ago

Hi Fabio

That is exactly the kind of thing I am looking for. I am looking for recent papers in this domain so that I can extend them with a focus on the fashion domain. I saw many attention-based models for image captioning, but they do not place images in a common embedding space with the text. At inference time I will only have text with which to find the relevant/related images.
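
Concretely, the inference-time setup I have in mind looks roughly like the sketch below. Everything here is a placeholder (random vectors, a stand-in text encoder); the point is only ranking precomputed image embeddings against a text query in a shared space:

```python
# Illustrative sketch of text-to-image retrieval in a shared embedding space
# (all encoders and data here are placeholders; only the ranking logic matters).
import torch
import torch.nn.functional as F

emb_dim = 300

# Offline: embed every catalogue image once with an image encoder trained to
# live in the same space as the text (random stand-ins here).
image_embs = F.normalize(torch.randn(5000, emb_dim), dim=1)   # (num_images, emb_dim)

def embed_query(description_tokens):
    # Stand-in text encoder: e.g. the average of pretrained word vectors,
    # or a learned sentence encoder mapped into the shared space.
    word_vecs = torch.randn(len(description_tokens), emb_dim)
    return F.normalize(word_vecs.mean(dim=0, keepdim=True), dim=1)

def search(description_tokens, k=5):
    q = embed_query(description_tokens)            # (1, emb_dim)
    sims = (q @ image_embs.T).squeeze(0)           # cosine similarity to every image
    return sims.topk(k).indices                    # indices of the k best matches

top_images = search(["red", "floral", "summer", "dress"])
```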
