RitaRamo / smallcap

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

About Seq2SeqTrainer in train.py #5

Closed sungbinson closed 1 year ago

sungbinson commented 1 year ago

```python
# train.py, line 50
feature_extractor = CLIPFeatureExtractor.from_pretrained(args.encoder_name)

# train.py, line 144
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=default_data_collator,
    train_dataset=train_dataset,
    tokenizer=feature_extractor,
)
```

For the `tokenizer` parameter of `Seq2SeqTrainer`, you pass a feature extractor, which is for image processing (CLIP), not a text tokenizer like those used by BERT or GPT.

Can you explain why you use `CLIPFeatureExtractor` as the tokenizer in `Seq2SeqTrainer`?

YovaKem commented 1 year ago

The Trainer expects the input tokenizer, which in the case of visual inputs is the feature extractor: a function that maps an image into "tokens" (in this case, patches) that the encoder can process.
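To make the analogy concrete, here is a minimal illustrative sketch (not SmallCap code) of how a ViT-style encoder such as CLIP's "tokenizes" an image: the image is cut into fixed-size patches, much as a text tokenizer cuts a string into subword tokens. The patch size of 16 and input resolution of 224 below are just example values for illustration.

```python
import numpy as np

def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened patch 'tokens'.

    This mimics the first step of a ViT encoder: each patch plays the
    role that a subword token plays for a text model.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
        .transpose(0, 2, 1, 3, 4)          # group pixels belonging to each patch
        .reshape(-1, patch_size * patch_size * c)
    )
    return patches

# A 224x224 RGB image with 16x16 patches yields 14*14 = 196 patch tokens,
# each a flattened vector of 16*16*3 = 768 values.
image = np.zeros((224, 224, 3))
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

The real `CLIPFeatureExtractor` does the resizing and normalization that precedes this patching step inside the model, which is why it fills the `tokenizer` slot in the Trainer: both objects serve the same role of turning raw inputs into something the encoder can consume.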