NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License
9.17k stars 1.42k forks

processor.tokenizer vs. processor.feature_extractor in [TrOCR] model #275

Closed Mohammed20201991 closed 1 year ago

Mohammed20201991 commented 1 year ago

I have a question about the trainer initialization in @NielsRogge's Transformers-Tutorials (TrOCR model).
Which is the correct value to pass for the tokenizer argument: processor.tokenizer or processor.feature_extractor?

from transformers import default_data_collator

# instantiate trainer
# note: the tutorial passes the feature extractor as `tokenizer`, so that
# the Trainer saves it alongside the model checkpoints
trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=processor.feature_extractor,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)
trainer.train()
Mohammed20201991 commented 1 year ago

processor.feature_extractor

CYaiche commented 10 months ago

Ok, but why? The name of the parameter is tokenizer, yet you pass the feature_extractor. That does not make sense, does it?

UoFallujah commented 10 months ago

@CYaiche Yeah, you are right; that is why I am asking. I think processor.feature_extractor was what an older version used, and it is what the mentioned tutorial passes.
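For intuition, here is a minimal toy sketch (not the actual transformers implementation) of why passing the feature extractor under the `tokenizer` parameter can still work: the Trainer mainly treats that argument as an object it should save alongside model checkpoints via `save_pretrained`, and a feature extractor exposes that method too. All class names below are hypothetical stand-ins.

```python
import os
import tempfile


class ToyFeatureExtractor:
    """Hypothetical stand-in for TrOCR's image feature extractor."""

    def save_pretrained(self, path):
        # Real feature extractors write a preprocessor_config.json on save.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "preprocessor_config.json"), "w") as f:
            f.write("{}")


class ToyTrainer:
    """Hypothetical stand-in for how a Trainer uses its `tokenizer` argument."""

    def __init__(self, tokenizer=None):
        self.tokenizer = tokenizer

    def save_checkpoint(self, output_dir):
        # The trainer only calls save_pretrained on whatever was passed,
        # so a feature extractor is accepted just like a tokenizer.
        if self.tokenizer is not None:
            self.tokenizer.save_pretrained(output_dir)


trainer = ToyTrainer(tokenizer=ToyFeatureExtractor())
with tempfile.TemporaryDirectory() as d:
    trainer.save_checkpoint(d)
    saved = sorted(os.listdir(d))
```

In other words, the parameter name is historical: duck typing is what matters, not the class of the object. (Newer transformers releases make this explicit by accepting the whole processor instead.)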