bigscience-workshop / xmtf

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786
Apache License 2.0
507 stars · 37 forks

mT0-xxl finetuning #19

Open sh0tcall3r opened 1 year ago

sh0tcall3r commented 1 year ago

Hello! Thanks a lot for your work! I'm using mT0-xxl for a question answering task, but it doesn't perform with the quality I expected, so I'd like to fine-tune the model a bit. If I understand correctly, I first need the checkpoint and gin file for the model I want to fine-tune. Could you please share these? Also, is it possible to fine-tune it with torch, or is tf the only way?
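
For reference, the PyTorch route can be sketched with Hugging Face `transformers`, since the mT0 checkpoints on the Hub load as standard seq2seq models via `AutoModelForSeq2SeqLM`. Everything below (the prompt format, dataset shape, hyperparameters, and output paths) is a placeholder sketch, not a verified recipe from the authors:

```python
def build_prompt(question, context):
    """Format a QA pair into a single text-to-text prompt (illustrative format only)."""
    return f"Answer the question given the context.\nContext: {context}\nQuestion: {question}"


def finetune(train_pairs, model_name="bigscience/mt0-xxl", output_dir="./mt0-qa"):
    """Sketch of fine-tuning an mT0 checkpoint with Seq2SeqTrainer.

    train_pairs: iterable of (question, context, answer) strings.
    Hyperparameters below are placeholders; an xxl-sized model will also
    need multi-GPU sharding (e.g. DeepSpeed/FSDP) in practice.
    """
    # Imports kept local so the prompt helper above stays dependency-free.
    import torch
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

    def encode(pair):
        question, context, answer = pair
        inputs = tokenizer(build_prompt(question, context), truncation=True, max_length=512)
        labels = tokenizer(text_target=answer, truncation=True, max_length=64)
        inputs["labels"] = labels["input_ids"]
        return inputs

    dataset = [encode(p) for p in train_pairs]

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,       # placeholder values
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=1,
        bf16=True,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```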

Muennighoff commented 1 year ago

Hey, there are some more details on mT0 fine-tuning here: https://github.com/bigscience-workshop/xmtf/issues/12

The config is here: https://github.com/bigscience-workshop/xmtf/issues/6#issuecomment-1366147205

sh0tcall3r commented 1 year ago

Thanks for the reply! I'll try the config you mentioned.

sh0tcall3r commented 1 year ago

Hey @Muennighoff, it seems I still can't work out a couple of things. I'd really appreciate it if you could give me a hand here. I need to fine-tune your model mT0-xxl (not the initial T5X xxl), so according to the manual https://github.com/google-research/t5x/blob/main/docs/usage/finetune.md I need 3 components (excluding the SeqIO Task, which is clear for now) to proceed:

1. Checkpoint -- could you please share the mT0-xxl checkpoint? In the manual, all the checkpoints used are TensorFlow weights, but on Hugging Face there are only PyTorch weights. So I either need the mT0-xxl checkpoint in TensorFlow, or I'd have to fine-tune the model in PyTorch (is that even possible?).
2. Gin file for the model to fine-tune (mT0-xxl in this case) -- can I use the default one, i.e. https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin?
3. Gin file configuring the fine-tuning process -- I write this on my own based on https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin with some overrides, right?

Please correct me if I'm wrong on any of these points.
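
A minimal sketch of how points 2 and 3 could fit together in a single run gin file, per the t5x fine-tuning docs; all paths and step counts below are placeholders, not a verified config:

```gin
# Hypothetical finetune_mt0_xxl.gin -- paths and numbers are placeholders.
from __gin__ import dynamic_registration

include 't5x/examples/t5/mt5/xxl.gin'    # point 2: model architecture gin
include 't5x/configs/runs/finetune.gin'  # point 3: fine-tuning run gin

MODEL_DIR = '/tmp/mt0_xxl_finetune'                       # where new ckpts go
INITIAL_CHECKPOINT_PATH = '/path/to/mt0-xxl/checkpoint'   # point 1: t5x ckpt
TRAIN_STEPS = 1010000                 # pretrained steps + extra finetune steps
MIXTURE_OR_TASK_NAME = 'my_qa_task'   # your registered SeqIO task (hypothetical name)
TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 64}
```

Note that `TRAIN_STEPS` in `finetune.gin` is the total step count, i.e. the checkpoint's pretrained steps plus however many fine-tuning steps you want.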

Muennighoff commented 1 year ago

There's a t5x ckpt here: https://huggingface.co/bigscience/mt0-t5x

I don't remember which size that model is, though. I don't have the other ones anymore; maybe @adarob does.

For 2. & 3., yes, I think so.
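
With a t5x checkpoint and those two gin files in hand, the launch command from the t5x fine-tuning manual looks roughly like this (every path and name here is a placeholder; note the nested quoting that gin string overrides require on the command line):

```sh
# Sketch of a t5x fine-tuning launch, following the google-research/t5x docs.
python -m t5x.train \
  --gin_file="t5x/examples/t5/mt5/xxl.gin" \
  --gin_file="t5x/configs/runs/finetune.gin" \
  --gin.MODEL_DIR="'/tmp/mt0_xxl_finetune'" \
  --gin.INITIAL_CHECKPOINT_PATH="'/path/to/mt0-xxl/checkpoint'" \
  --gin.MIXTURE_OR_TASK_NAME="'my_qa_task'" \
  --gin.TASK_FEATURE_LENGTHS="{'inputs': 512, 'targets': 64}" \
  --gin.TRAIN_STEPS=1010000
```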

adarob commented 1 year ago

This does appear to be XXL.

sh0tcall3r commented 1 year ago

Thanks a lot, guys!