dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0

Initial commit for 2024 VizWiz challenge #93

Closed by harrychien1311 5 months ago

harrychien1311 commented 5 months ago
  1. Create a new dataset class, VQA
  2. Create a new data module class, VQA_datamodule
  3. Add a vqa_processing.py script to process the VQA data
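
How the three pieces listed above might fit together can be sketched as follows. This is not the code from this PR. The class names `VQA` and `VQA_datamodule` come from the description, but everything inside them is an assumption: the record fields (`image`, `question`, `answers`), the helper `process_annotations`, and the majority-vote labeling are all hypothetical. The sketch is plain Python so it runs without dependencies. A real version would presumably subclass `torch.utils.data.Dataset` and a PyTorch Lightning data module, and would load images and tokenize questions.

```python
from collections import Counter


def process_annotations(records):
    """Hypothetical stand-in for vqa_processing.py: reduce each record's
    list of human answers to its most common answer."""
    samples = []
    for rec in records:
        answers = [a["answer"] for a in rec.get("answers", [])]
        # most_common(1) returns the top answer; ties go to the one seen first.
        label = Counter(answers).most_common(1)[0][0] if answers else None
        samples.append(
            {"image": rec["image"], "question": rec["question"], "answer": label}
        )
    return samples


class VQA:
    """Map-style dataset: supports len() and integer indexing."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


class VQA_datamodule:
    """Holds the train/val splits (assumed layout); a real data module
    would also build the DataLoaders."""

    def __init__(self, train_records, val_records):
        self.train_records = train_records
        self.val_records = val_records
        self.train_dataset = None
        self.val_dataset = None

    def setup(self):
        # Process raw records once, then wrap each split in a dataset.
        self.train_dataset = VQA(process_annotations(self.train_records))
        self.val_dataset = VQA(process_annotations(self.val_records))
```

Keeping the processing step separate from the dataset class means the raw annotations can be converted once, and the dataset only has to index into the result.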