Open guoxiaolu opened 2 years ago
Hi, @guoxiaolu Did you fix the problem?
no...
Hi,
The error you are getting is fixed; the fix will be included in the next release (which comes out today).
@NielsRogge I think there is still a problem: the sizes of the processor outputs don't match each other.
Also, the encoded_inputs still don't have the token_type_ids key.
If it is fixed, do I need to modify something in the custom Dataset?
I'm using the following version:
transformers 4.18.0.dev0
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base", apply_ocr=False)
model = LayoutLMv2Model.from_pretrained("microsoft/layoutxlm-base", num_labels=len(labels))
I didn't define a LayoutXLM tokenizer or feature extractor separately.
P.S. @guoxiaolu, did you fix it? If so, could you please tell me how?
I tested LayoutXLM on SROIE. However, the encoded_inputs of each sample have a different size (for example 176 and 348), and they don't have a "token_type_ids" key. This makes model training fail.

Besides, loading the model with
model = LayoutLMv2ForTokenClassification.from_pretrained('microsoft/layoutxlm-base', num_labels=len(labels))
leads to this error:
File "/home/guoxiaolu/.local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1489, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument '_configuration_file'
If 'microsoft/layoutlmv2-base-uncased' is loaded instead, it works correctly.

class SROIEDataset(Dataset):
    """SROIE dataset."""

def main():
    train_file = xxx
    test_file = xxx
    train, train_flist = file_deserialize(train_file)
    test, test_flist = file_deserialize(test_file)
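
For anyone hitting the size mismatch described above (176 vs. 348 tokens per sample): batching usually needs every sample padded or truncated to the same length. With transformers, the usual route is to pass `padding="max_length"` and `truncation=True` when calling the processor. The sketch below only illustrates the idea in plain Python and is not the library's implementation. The helper name `pad_encoding`, the pad token id of 1 (the XLM-RoBERTa convention), the zero padding box, and the label padding value of -100 are assumptions made for illustration. It also adds an all-zero `token_type_ids` list, on the assumption that a single-segment input is what the model expects.

```python
from typing import Dict, List

# Assumed padding box for padded positions (hypothetical choice for this sketch).
PAD_BOX = [0, 0, 0, 0]


def pad_encoding(
    encoding: Dict[str, List],
    max_length: int = 512,
    pad_token_id: int = 1,     # assumption: XLM-RoBERTa style pad id
    label_pad_id: int = -100,  # assumption: value ignored by the loss
) -> Dict[str, List]:
    """Pad or truncate one sample so every sample has the same length.

    Note: truncation here is naive and simply cuts the tail, so a trailing
    special token would be dropped; a real tokenizer handles that itself.
    """
    n = min(len(encoding["input_ids"]), max_length)
    pad = max_length - n

    out = {
        "input_ids": encoding["input_ids"][:n] + [pad_token_id] * pad,
        # 1 for real tokens, 0 for padding
        "attention_mask": [1] * n + [0] * pad,
        "bbox": [list(b) for b in encoding["bbox"][:n]]
        + [list(PAD_BOX) for _ in range(pad)],
        # single-segment input: all zeros
        "token_type_ids": [0] * max_length,
    }
    if "labels" in encoding:
        out["labels"] = encoding["labels"][:n] + [label_pad_id] * pad
    return out


# Two toy samples with the lengths reported in this issue.
short = {"input_ids": list(range(176)), "bbox": [[0, 0, 1, 1]] * 176, "labels": [0] * 176}
long = {"input_ids": list(range(348)), "bbox": [[0, 0, 1, 1]] * 348, "labels": [0] * 348}

padded_short = pad_encoding(short)
padded_long = pad_encoding(long)
```

After this step both samples have 512 entries under every key, so a default DataLoader collate function can stack them. Whether the model actually needs `token_type_ids` to be passed explicitly depends on the installed transformers version.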