facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License
8.98k stars 567 forks

Creating my own dataset and training it using Trainer from transformers #202

Open Piyush-Musaddi opened 9 months ago

Piyush-Musaddi commented 9 months ago

I want to fine-tune the facebook/nougat-base model on a custom dataset of images and their corresponding text. How can I build a dataset from these pairs so that I can continue training the model?

I also tried the provided method, but the JSON file it creates is missing many keys, so the generated index.jsonl is empty.

AhmadHakami commented 8 months ago

Hi @lukas-blecher, thank you for your efforts and for answering our questions in the issues section. This is an important question, and we would appreciate it if you could guide us.
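
Not an official answer, but as a possible starting point for the image/text pairing step described above: a minimal sketch that walks a folder of page images, looks for a same-named `.txt` file with the ground-truth text, and writes one JSON object per pair to a JSONL file. The function name `build_index`, the folder layout, and the keys `image` and `text` are assumptions made for illustration; they are not confirmed to match what the Nougat training code or the `transformers` Trainer expects, so rename them to whatever your data loader reads.

```python
import json
from pathlib import Path

# Image extensions to pick up; an assumption, adjust to your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_index(image_dir, text_dir, out_path):
    """Write one JSON line per image that has a non-empty matching .txt file.

    Returns the number of records written.
    """
    image_dir, text_dir = Path(image_dir), Path(text_dir)
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for image_path in sorted(image_dir.iterdir()):
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # not an image file
            text_path = text_dir / (image_path.stem + ".txt")
            if not text_path.exists():
                continue  # no ground truth for this image
            text = text_path.read_text(encoding="utf-8").strip()
            if not text:
                continue  # empty ground truth would yield a useless sample
            # Hypothetical keys; rename to match your training code.
            record = {"image": image_path.name, "text": text}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Calling `build_index("images", "texts", "train.jsonl")` returns the number of pairs written, so a result of 0 points to a naming or path mismatch between the two folders rather than a problem in training itself.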