SidxA / bureau

Apache License 2.0

model finetuning #4

Closed SidxA closed 6 months ago

SidxA commented 11 months ago

https://www.reddit.com/r/LocalLLaMA/comments/185g4p6/tool_to_quickly_iterate_when_finetuning/

SidxA commented 11 months ago

FUNSD dataset: https://huggingface.co/datasets/nielsr/funsd-layoutlmv3 (200 examples). There are also some other, unreliable versions of the dataset. It has some labelling structure as well as bbox coordinates.
FormNet annotations: https://huggingface.co/datasets/H2KP/funsd-formnet (200 forms)
More: https://huggingface.co/datasets/hcw-00/cdip-annotations-formnet-v4
Some more FUNSD info extracted in text form: https://huggingface.co/datasets/Dmkond/tune-forms?row=2
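A minimal Python sketch of how FUNSD-style annotations (word tokens, word boxes, BIO tags) could be turned into entity-level training targets: consecutive B-/I- tokens are merged into QUESTION/ANSWER/HEADER entities with a union bounding box, and pixel boxes can be scaled to the 0-1000 range that LayoutLM-family models expect. The label list below and the helper names (`normalize_bbox`, `group_entities`, `Entity`) are assumptions for illustration; check the dataset's own `ner_tags` label names, and whether its boxes are already normalized, before relying on these ids.

```python
from dataclasses import dataclass

# Assumed FUNSD-style BIO label set; verify against the dataset's features.
LABELS = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]


@dataclass
class Entity:
    label: str                       # e.g. "QUESTION"
    text: str                        # member tokens joined by spaces
    bbox: tuple[int, int, int, int]  # union box (x0, y0, x1, y1)


def normalize_bbox(box, width, height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def group_entities(tokens, bboxes, tag_ids, labels=LABELS):
    """Merge consecutive B-/I- tagged tokens into entities with a union bounding box."""
    spans = []  # each span: [label, token list, [x0, y0, x1, y1]]
    prev = "O"
    for token, box, tag_id in zip(tokens, bboxes, tag_ids):
        tag = labels[tag_id]
        if tag == "O":
            prev = tag
            continue
        prefix, label = tag.split("-", 1)
        # Start a new entity on B-, or on an I- tag that does not continue the previous entity.
        if prefix == "B" or prev == "O" or prev.split("-", 1)[1] != label:
            spans.append([label, [], list(box)])
        span = spans[-1]
        span[1].append(token)
        # Grow the entity box so it covers every member token.
        span[2] = [
            min(span[2][0], box[0]),
            min(span[2][1], box[1]),
            max(span[2][2], box[2]),
            max(span[2][3], box[3]),
        ]
        prev = tag
    return [Entity(label, " ".join(toks), tuple(b)) for label, toks, b in spans]
```

For example, tokens `["Date:", "12/01/98", "TO:", "Dr.", "Smith"]` tagged B-QUESTION, B-ANSWER, B-QUESTION, B-ANSWER, I-ANSWER yield four entities, the last being the ANSWER "Dr. Smith" with one box spanning both words. Pairs of such entities are one plausible shape for question/answer finetuning examples.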

Some (medical) box-ticking image dataset: https://huggingface.co/datasets/saurabh1896/OMR-forms
University forms, plain text, signature/receive QA: https://huggingface.co/datasets/Lostkyd/pdf_forms?row=32

Post-OCR receipt dataset: https://huggingface.co/datasets/nehruperumalla/forms

SidxA commented 11 months ago

The hierarchical semantic segmentation CNN dataset: https://ar5iv.labs.arxiv.org/html/1911.12170 (paper), https://paperswithcode.com/dataset/forms-dataset (dataset page)

SidxA commented 11 months ago

Form-NLU dataset: https://github.com/adlnlp/form_nlu

SidxA commented 11 months ago

There are also Tobacco-3482 (https://paperswithcode.com/dataset/tobacco-3482) and RVL-CDIP (https://paperswithcode.com/dataset/rvl-cdip), but these are just image classification datasets.

SidxA commented 10 months ago

https://www.reddit.com/r/LocalLLaMA/comments/188197j/80_faster_50_less_memory_0_accuracy_loss_llama/