Hi,
I have followed your tutorial at https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Pix2Struct to fine-tune Pix2Struct from the base model "google/pix2struct-base", only changing the dataset to two FUNSD-based ones: "arvisioncode/donut-funsd" and "SotiriosKastanas/difffunsd".
The version of transformers I am using is 4.30.0.dev0, which is what gets installed by default with: !pip install -q git+https://github.com/huggingface/transformers.git
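For reference, this is roughly how I set things up, following the tutorial (a minimal sketch; the Donut-style preprocessing of the target token sequences is omitted):

```python
from datasets import load_dataset
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

# one of the two FUNSD-based datasets I tried
dataset = load_dataset("arvisioncode/donut-funsd")

# processor and model are loaded from the same base checkpoint
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-base")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-base")
```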
The problem is that after running several fine-tunings, one for 200 epochs (dataset: "SotiriosKastanas/difffunsd") and one for 1000 epochs (dataset: "arvisioncode/donut-funsd"), I have seen that inference with these models always gives the same result, regardless of which input image we use:
Test 1:
<s_DATE:> 8/ 13/ 93</s_DATE:><s_MANUFACTURER:> AMERICAN TOBACCO COMPANY</s_MANUFACTURER:><s_cc:> R. D. Hammer</s_cc:><s_REPORTED BY:> A. REID, DIVISION MANAGER, SAN FRANCISCO, CA</s_REPORTED BY:><s_BRAND NAME:> SPECIAL 10 s</s_BRAND NAME:><s_OTHER INFORMATION:> SEE ATTACHED COPY OF CIRCULAR NO. 4848</s_OTHER INFORMATION:>
Test 2:
<s_DATE:> 8/ 13/ 93</s_DATE:><s_MANUFACTURER:> AMERICAN TOBACCO COMPANY</s_MANUFACTURER:><s_cc:> R. D. Hammer</s_cc:><s_REPORTED BY:> A. REID, DIVISION MANAGER, SAN FRANCISCO, CA</s_REPORTED BY:><s_BRAND NAME:> SPECIAL 10 s</s_BRAND NAME:><s_OTHER INFORMATION:> SEE ATTACHED COPY OF CIRCULAR NO. 4848</s_OTHER INFORMATION:>
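For completeness, this is roughly how I run inference (a sketch; the fine-tuned checkpoint path and the test image filename are placeholders for my local files):

```python
import torch
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

processor = Pix2StructProcessor.from_pretrained("google/pix2struct-base")
model = Pix2StructForConditionalGeneration.from_pretrained("path/to/finetuned-checkpoint")
model.eval()

image = Image.open("test_page.png").convert("RGB")
# the processor turns the image into flattened patches + attention mask
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=512)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```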
And in fact, this output does not correspond to either of the two input images; none of this data appears in them. That is why I think the model has overfit to a single training example.
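A quick way to check this would be to search the training examples for a distinctive substring of the constant output (a sketch; I am assuming the dataset exposes a "train" split):

```python
from datasets import load_dataset

# a distinctive substring from the constant output above
needle = "AMERICAN TOBACCO COMPANY"

train = load_dataset("arvisioncode/donut-funsd", split="train")

# scan every example's fields for the substring, whatever the column names are
hits = [i for i, example in enumerate(train) if needle in str(example)]
print(hits)  # a single index here would mean the model is parroting that one example
```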
Do you know what could be happening and how I can fix it?