Closed sungbinson closed 1 year ago
```python
# line 50
feature_extractor = CLIPFeatureExtractor.from_pretrained(args.encoder_name)

# line 144
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=default_data_collator,
    train_dataset=train_dataset,
    tokenizer=feature_extractor,
)
```
For the `tokenizer` parameter of `Seq2SeqTrainer`, you pass a feature extractor, which is for image processing (CLIP), not a text tokenizer like BERT's or GPT's.
Can you explain why you use `CLIPFeatureExtractor` as the tokenizer in `Seq2SeqTrainer`?
The Trainer expects the input tokenizer; for visual inputs this is the feature extractor, i.e. a function that maps an image into "tokens" (in this case patches) that the encoder can process.
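As an illustration of that analogy (this is a hypothetical sketch, not the repository's or CLIP's actual implementation), a feature extractor's core role can be pictured as splitting an image into fixed-size patches, the visual counterpart of a text tokenizer splitting a string into tokens:

```python
import numpy as np

def image_to_patch_tokens(image, patch_size=32):
    """Split an (H, W, C) image into flattened patch 'tokens',
    the visual analogue of text tokenization."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Rearrange into a grid of non-overlapping patches, then flatten each patch
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches  # shape: (num_patches, patch_dim)

# A 224x224 RGB image yields (224/32)**2 = 49 patch tokens of dim 32*32*3 = 3072
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patch_tokens(img)
print(tokens.shape)  # (49, 3072)
```

The real `CLIPFeatureExtractor` additionally resizes, crops, and normalizes the pixel values; the patch embedding itself happens inside the vision encoder. Passing it as `tokenizer` mainly lets the Trainer save it alongside the model checkpoint.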