Here is a simple example of applying ERNIE to the VQA task.
This is a complete fine-tuning pipeline for DocVQA, and I strongly recommend reading it. You can simply replace its tokenizer and model with the ERNIE ones I provided and train a model for your own task.
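One step in an extractive DocVQA fine-tuning pipeline is turning each answer string into start/end token positions, and this step does not depend on which tokenizer you plug in. Below is a minimal sketch under stated assumptions. The helper names `find_answer_words` and `word_span_to_token_span` are hypothetical and do not come from the pipeline mentioned above. The sketch assumes the document has been OCR'd into a list of words and that you have a per-token `word_ids` list in which only context (document) tokens carry a word index and every other token is `None`. With Hugging Face fast tokenizers, a list of this kind can be obtained from `BatchEncoding.word_ids()`. Question tokens would then need to be masked to `None`.

```python
from typing import List, Optional, Tuple


def find_answer_words(words: List[str], answer: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return inclusive (start_word, end_word) indices of the
    first case-insensitive exact match of the answer's words in the OCR words,
    or None if the answer does not appear verbatim."""
    target = answer.lower().split()
    n = len(target)
    if n == 0:
        return None
    lowered = [w.lower() for w in words]
    for i in range(len(lowered) - n + 1):
        if lowered[i:i + n] == target:
            return i, i + n - 1
    return None


def word_span_to_token_span(
    word_ids: List[Optional[int]], start_word: int, end_word: int
) -> Tuple[int, int]:
    """Hypothetical helper: map a word-level span to token positions.

    Assumes word_ids[i] is the context word index of token i, or None for
    special/question tokens. Returns (0, 0), i.e. the first ([CLS]-style) token,
    when the span is not fully inside the encoded window (e.g. truncated)."""
    start_tok: Optional[int] = None
    end_tok: Optional[int] = None
    for idx, wid in enumerate(word_ids):
        if wid is None:
            continue
        if wid == start_word and start_tok is None:
            start_tok = idx  # first sub-token of the start word
        if wid == end_word:
            end_tok = idx  # keep updating: last sub-token of the end word
    if start_tok is None or end_tok is None:
        return 0, 0
    return start_tok, end_tok


if __name__ == "__main__":
    words = ["Invoice", "Total", "Amount", "Due", "$1,250.00"]
    span = find_answer_words(words, "amount due")
    # Made-up tokenization: 3 non-context tokens, then words 0..4, with
    # words 2 and 4 split into two sub-tokens each, then a trailing special token.
    word_ids = [None, None, None, 0, 1, 2, 2, 3, 4, 4, None]
    print(span, word_span_to_token_span(word_ids, *span))
```

Real DocVQA answers do not always match the OCR text verbatim, so a full pipeline may need fuzzier matching or may have to drop unmatched examples. The exact-match search here is only the simplest option.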
Are there any examples of how to fine-tune on the DocVQA dataset?