Muennighoff / vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
MIT License
ernie-vil hateful-memes lxmert oscar transformers uniter vision-and-language vision-transformer visualbert



State-of-the-art Visio-Linguistic Models 🥶

## Updates

### 06/2021 - Hateful Memes CSV Files

- The CSV files that were used for the scores in the vilio paper are now available here

### 06/2021 - Inference on any meme

- Thanks to the initiative by katrinc, here are two notebooks for using Vilio to perform pure inference on any meme you want :)
- Just adapt the example input dataset / input model to use a different meme / pretrained model🥶
- GPU:
- CPU:

## Ordering

Vilio aims to replicate the organization of huggingface's transformers repo:

- /bash: Shell files to reproduce hateful memes results
- /data: By default, directory for loading in data & saving checkpoints
- /ernie-vil: Ernie-vil sub-repository written in PaddlePaddle
- /fts_lmdb: Scripts for handling .lmdb extracted features
- /fts_tsv: Scripts for handling .tsv extracted features
- /notebooks: Jupyter Notebooks for demonstration & reproducibility
- /py-bottom-up-attention: Sub-repository for tsv feature extraction, forked & adapted from here
- src/vilio: All implemented models (also see below for a quick overview of models)
- /utils: Pandas & ensembling scripts for data handling
- files: Scripts used to access the models and apply model-specific data preparation
- files: Same purpose as entry files, but for pre-training; point of entry for pre-training
- Training code for the hateful memes challenge; main point of entry
- Args for running

## Usage

Follow for reproducing performance on the Hateful Memes Task.
Follow for using the framework for your own task.
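
The prediction CSV files and the ensembling scripts under /utils point to a workflow in which the outputs of several models are combined. As a minimal sketch of that idea (not the repository's own script), the snippet below averages per-meme probabilities across prediction files using only the standard library. The column names `id` and `proba`, the 0.5 threshold, and the function name `average_predictions` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def average_predictions(csv_texts, threshold=0.5):
    """Simple-average ensemble over CSV prediction files.

    Each element of csv_texts is the text of one CSV file with the
    (assumed) columns `id` and `proba`. Returns one dict per id holding
    the mean probability and a thresholded label, in first-seen order.
    """
    probas = defaultdict(list)  # id -> probabilities collected across files
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            probas[row["id"]].append(float(row["proba"]))

    n_models = len(csv_texts)
    ensembled = []
    for meme_id, values in probas.items():
        # Every file is expected to contribute exactly one prediction per id.
        if len(values) != n_models:
            raise ValueError(f"expected {n_models} predictions for id {meme_id}, got {len(values)}")
        mean = sum(values) / n_models
        ensembled.append({"id": meme_id, "proba": mean, "label": int(mean >= threshold)})
    return ensembled


if __name__ == "__main__":
    # Toy predictions from two hypothetical models.
    model_a = "id,proba\n1,0.75\n2,0.25\n"
    model_b = "id,proba\n1,0.25\n2,0.25\n"
    for row in average_predictions([model_a, model_b]):
        print(row)
```

The returned rows can be written back out with `csv.DictWriter`. Weighted or rank-based averaging would be a variation on the same loop.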
See the paper at:

## Architectures

🥶 Vilio currently provides the following architectures with the outlined language transformers:

1. **E - ERNIE-VIL** ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
    - ERNIE: Enhanced Language Representation with Informative Entities
1. **D - DeVLBERT** DeVLBert: Learning Deconfounded Visio-Linguistic Representations
    - BERT: Bidirectional Transformers
1. **O - OSCAR** Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
    - BERT: Bidirectional Transformers
1. **U - UNITER** UNITER: UNiversal Image-TExt Representation Learning
    - BERT: Bidirectional Transformers
    - RoBERTa: Robustly Optimized BERT Pretraining Approach
1. **V - VisualBERT** VisualBERT: A Simple and Performant Baseline for Vision and Language
    - ALBERT: A Lite BERT
    - BERT: Bidirectional Transformers
    - RoBERTa: Robustly Optimized BERT Pretraining Approach
1. **X - LXMERT** LXMERT: Learning Cross-Modality Encoder Representations from Transformers
    - ALBERT: A Lite BERT
    - BERT: Bidirectional Transformers
    - RoBERTa: Robustly Optimized BERT Pretraining Approach

## To-do's

- [ ] Clean-up import statements, python paths & find a better way to integrate transformers (Right now all import statements only work if in main folder)
- [ ] Enable loading and running models just via import statements (and not having to clone the repo)
- [ ] Find a way to better include ERNIE-VIL in this repo (PaddlePaddle to Torch?)
- [ ] Move tokenization in entry files to model-specific tokenization similar to transformers

## Attributions

The code heavily borrows from the following repositories, thanks for their great work:

-
-
-

## Citation

```bibtex
@article{muennighoff2020vilio,
  title={Vilio: State-of-the-art visio-linguistic models applied to hateful memes},
  author={Muennighoff, Niklas},
  journal={arXiv preprint arXiv:2012.07788},
  year={2020}
}
```