Open KnitVikas opened 2 years ago
You can refer to the Hugging Face documentation to create a preprocessing script that prepares your dataset for LayoutLMv2. I did the same for my own custom dataset. Below is the FUNSD preprocessing script that I used as a reference: https://huggingface.co/datasets/nielsr/funsd/blob/main/funsd.py
Check it out; it works for LayoutLMv2. Hope it helps.
@sheikhasim Thanks for your reply. However, https://huggingface.co/datasets/nielsr/funsd/blob/main/funsd.py downloads the FUNSD dataset and uses that, whereas my data is stored locally. How do I load it with Hugging Face's `load_dataset`? What changes are required?
@KnitVikas If your dataset is already in the required input format (images + JSON), there are two ways to do this:
The first way is to store a zip file of your dataset on Google Drive and create a preprocessing script on Hugging Face Datasets. Put the path to dataset.zip on Google Drive in the download-and-extract command. Then, when you run `load_dataset(name_of_hugging_face_script)`, it will load the dataset from Google Drive and preprocess it.
The second way is to use Hugging Face's method for loading a dataset locally: `from datasets import load_dataset`, then `dataset = load_dataset('PATH/TO/MY/LOADING/SCRIPT', data_files='PATH/TO/MY/FILE')`. See https://huggingface.co/docs/datasets/v1.11.0/loading_datasets.html
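For data that is already on disk, the part of a loading script that matters is the step that reads each annotation file. Below is a minimal sketch of that step, loosely modeled on what the FUNSD script does and assuming the FUNSD JSON layout (a top-level `form` list whose entries carry `label` and `words`, each word with `text` and `box`). The function names `normalize_box` and `read_annotation` are hypothetical, and the image width and height are passed in by hand here rather than read from the image file.

```python
import json
from pathlib import Path


def normalize_box(box, width, height):
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def read_annotation(json_path, width, height):
    """Turn one FUNSD-style annotation file into words, boxes and BIO tags."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    words, boxes, tags = [], [], []
    for entity in data["form"]:
        label = entity["label"]
        # Skip words whose text is empty or whitespace only.
        entity_words = [w for w in entity["words"] if w["text"].strip()]
        for i, word in enumerate(entity_words):
            words.append(word["text"])
            boxes.append(normalize_box(word["box"], width, height))
            if label == "other":
                tags.append("O")
            else:
                # First word of an entity gets B-, the rest get I-.
                prefix = "B-" if i == 0 else "I-"
                tags.append(prefix + label.upper())
    return {"words": words, "bboxes": boxes, "ner_tags": tags}
```

Calling `read_annotation` once per JSON file and yielding the result together with the image path would give roughly the per-example records that a loading script's example generator is expected to produce.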
@NielsRogge Thanks for this LayoutLMv2 implementation in HF. I want to create a torch dataset from my custom images and JSON files (for now, assume the data is the downloaded FUNSD dataset). Please guide me on how to create this torch dataset so that I can pass the data to LayoutLMv2Processor and apply the map function.
This is what I tried:
With the data created this way, training fails with: `TypeError: LayoutLMv2ForTokenClassification object argument after ** must be a mapping, not list`
Please help me out. Thanks in advance for a solution.
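The code that produced this error is not shown in the thread, so the cause can only be guessed. A common way to get this message is to call `model(**batch)` when `batch` is a list of per-example dicts instead of a single dict. The sketch below reproduces that situation with a stand-in function; `fake_model` and `collate` are hypothetical names, and plain Python lists stand in for tensors.

```python
def fake_model(**inputs):
    """Stand-in for a model's forward call: accepts keyword arguments only."""
    return sorted(inputs)


def collate(examples):
    """Merge a list of per-example dicts into one dict of lists."""
    return {key: [ex[key] for ex in examples] for key in examples[0]}


# Two toy examples; the values are placeholders, not real model inputs.
examples = [
    {"input_ids": [101, 7592, 102], "bbox": [[0, 0, 0, 0]] * 3, "labels": [0, 1, 0]},
    {"input_ids": [101, 2088, 102], "bbox": [[0, 0, 0, 0]] * 3, "labels": [0, 2, 0]},
]

try:
    fake_model(**examples)  # a list cannot be unpacked with **
except TypeError as err:
    print("list batch fails:", err)

batch = collate(examples)  # now a mapping from field name to batched values
print("dict batch works:", fake_model(**batch))
```

If this is indeed the cause, having the dataset's `__getitem__` return one dict per example is usually enough, since PyTorch's default collate function in `DataLoader` merges such dicts into a single dict of batched tensors.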