NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Img2Txt model fine-tuning with huge captions #450

Open AeroDEmi opened 4 months ago

AeroDEmi commented 4 months ago

I want to create a model that takes a screenshot of a web front page and outputs the corresponding HTML and JS code. As you can tell, the "input_ids" will be very long (more than 4096 tokens).
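To make the length problem concrete, here is a rough sketch of how one might check whether a page's markup exceeds a 4096-token budget. The names `rough_token_count` and `exceeds_budget` are hypothetical, and the regex splitter is only a crude stand-in so the snippet runs offline. A real subword tokenizer would give a different (often higher) count.

```python
import re

MAX_TOKENS = 4096  # the token budget mentioned above


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts words and single punctuation marks.

    This is an assumption for illustration only, not how any real
    model tokenizer splits text.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def exceeds_budget(text: str, max_tokens: int = MAX_TOKENS) -> bool:
    """Return True if the rough token count is above the budget."""
    return rough_token_count(text) > max_tokens


# A made-up page body: one small HTML snippet repeated many times.
page = '<div class="card"><p>Hello</p></div>\n' * 600

print(rough_token_count(page), exceeds_budget(page))
```

Even a fairly small page built from repeated snippets like this goes well past the budget under this rough count.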

I was thinking of training a BLIP-2 model, but how can I train a model like this efficiently? Thanks