Hi @NielsRogge
We are following the Fine_tune_UDOP_on_a_customdataset(toy_RVL_CDIP_dataset).ipynb notebook example. Our OCR text and bounding-box coordinates come from CJK (Chinese, Japanese, Korean) documents. However, UDOPTokenizer does not seem to support CJK characters. Could you provide a guide or notebook code showing how to use LayoutXLMTokenizer in place of UDOPTokenizer?