PiotrNawrot / nanoT5

Fast & Simple repository for pre-training and fine-tuning T5-style models
Apache License 2.0

Just a quick question to pretrain Flan-T5 #35

Closed hohoCode closed 6 months ago

hohoCode commented 6 months ago

First of all, great work! Thanks for sharing.

Just wondering if this is possible to train Flan-T5 from scratch, any thoughts or ideas on this?

Thanks!

IdeaKing commented 6 months ago

Flan-T5 is a fine-tuned version of T5, so pre-training it from scratch wouldn't make sense: you'd essentially be discarding the benefits of the Flan instruction tuning, correct?

PiotrNawrot commented 6 months ago

This is correct @IdeaKing , thanks for answering!

hohoCode commented 6 months ago

I mean, to pre-train T5 and then obtain Flan-T5? Given a pre-trained T5, if it's possible to fine-tune it into Flan-T5, that would be awesome.

PiotrNawrot commented 6 months ago

Yeah, that's what the repo does too. What I implemented and tested is fine-tuning on Natural Instructions, which is a subset of the Flan collection. Instead, you can download the entire Flan collection from HF and fine-tune on it using the same config as for Natural Instructions. I'd expect it to roughly work straight away; if it doesn't, look up the fine-tuning hyperparameters in the Flan paper.
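To make the suggestion above concrete, here is a minimal sketch of the preprocessing step: mapping one Flan-style record into the (source, target) text pair a T5-style seq2seq fine-tuning loop expects. The field names `inputs` and `targets` are an assumption based on the Flan collection's common convention; check the schema of the specific HF dataset you download, and tokenize the resulting pair with the T5 tokenizer afterwards.

```python
# Hedged sketch: convert a Flan-style record into a seq2seq training pair.
# Field names "inputs"/"targets" are assumed; adapt to the actual HF dataset schema.

def to_seq2seq_pair(example: dict) -> dict:
    """Map one instruction-tuning record to a (source, target) text pair."""
    return {
        "source": example["inputs"].strip(),   # instruction + task input fed to the encoder
        "target": example["targets"].strip(),  # expected output the decoder is trained on
    }

# Illustrative record (made up, not taken from the real dataset):
record = {
    "inputs": "Translate to German: Hello, world!",
    "targets": "Hallo, Welt!",
}
pair = to_seq2seq_pair(record)
print(pair["source"])  # Translate to German: Hello, world!
print(pair["target"])  # Hallo, Welt!
```

With `datasets.load_dataset`, this function can be applied over the whole collection via `.map(to_seq2seq_pair)` before tokenization.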

hohoCode commented 6 months ago

Thanks!