The Recognizing Multimodal Entailment tutorial was held virtually at ACL-IJCNLP 2021 on August 1st.
It gave an overview of multimodal learning, introduced a multimodal entailment dataset, and encouraged future research on the topic. For more information, visit https://multimodal-entailment.github.io/
A baseline model for this dataset, authored by Sayak Paul, is available on Keras.io along with an accompanying repository.
Figure: An example of multimodal entailment, where neither the text nor the image alone suffices for semantic understanding or pairwise classification.