Closed · sayakpaul closed this 2 years ago
Hi @sayakpaul, this is great, thank you for your contribution. I agree the community can benefit from it. The code is clear and the Keras blog post is well documented. Happy to! I've linked them on the https://multimodal-entailment.github.io/ website and in this repository's README.
Right, Arjun presented a demo in the tutorial; the Colab is available here: https://colab.research.google.com/github/tensorflow/neural-structured-learning/blob/master/workshops/kdd_2020/graph_regularization_pheme_natural_graph.ipynb
Open to ideas and suggestions too
Thank you @cesar-ilharco!
That's an amazing tutorial especially because it explicitly shows how to create neighbors in a format that is compatible with NSL's graph regularization.
> Open to ideas and suggestions too
I was thinking along the lines of using both modalities, as introduced in the multimodal dataset, and incorporating graph regularization to build an entailment model. Maybe the NSL team has something planned; that's why I tagged Arjun in my previous comment.
Hi @cesar-ilharco, I am Sayak, an ML engineer from India.
Firstly, thanks to you and the entire team for putting together such a comprehensive tutorial. I had the chance to go through the deck in detail last week and I really liked the materials presented in it.
To this end, I have spent the past week building a baseline model for the dataset; this blog post came out as a result. The accompanying repository is here: https://github.com/sayakpaul/Multimodal-Entailment-Baseline.
The baseline model is simple: encode the images with a pre-trained ResNet50V2 and the text inputs with a pre-trained BERT (base). After extracting the encodings, project them into a unified space, and finally pass the projections through a classification layer to predict entailment, no-entailment, or contradiction. In code, it looks like the following (the full snippet is in the blog post).
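A minimal sketch of this architecture is below. To keep it self-contained, small `Input` features stand in for the pooled outputs of the pre-trained ResNet50V2 and BERT encoders, and the projection size is an assumed value; in practice you would plug in the real backbones as shown in the blog post.

```python
import tensorflow as tf
from tensorflow import keras

IMAGE_EMBED_DIM = 2048  # ResNet50V2 pooled-feature size
TEXT_EMBED_DIM = 768    # BERT-base pooled-output size
PROJECTION_DIM = 256    # unified projection space (assumed value)


def project(x, name):
    # Project an encoder output into the shared space.
    x = keras.layers.Dense(PROJECTION_DIM, activation="gelu", name=f"{name}_proj")(x)
    return keras.layers.Dropout(0.1)(x)


# Stand-ins for the pre-trained image and text encoder outputs.
image_features = keras.Input(shape=(IMAGE_EMBED_DIM,), name="image_features")
text_features = keras.Input(shape=(TEXT_EMBED_DIM,), name="text_features")

image_proj = project(image_features, "image")
text_proj = project(text_features, "text")

# Fuse the projections and classify into the three entailment labels.
fused = keras.layers.Concatenate()([image_proj, text_proj])
outputs = keras.layers.Dense(3, activation="softmax", name="label")(fused)

model = keras.Model([image_features, text_features], outputs)
```

The key design choice is that both modalities are mapped to the same dimensionality before fusion, so neither encoder's output dominates the concatenated representation.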
Along with these, I also go over the following points:
I hope the community will benefit from these resources and that they serve as a simple baseline to foster research in the area. Do you think it makes sense to mention all of this on the tutorial website and in this repository? I totally understand if not.
On a related note, I also want to use the implicit similarity signals between the examples to further regularize the training. This is doable with Neural Structured Learning, and I believe @arjung has already presented a demo of it in the tutorial. It'd be great to collaborate on a tutorial that the community could readily use. So, open to ideas here :)