curious about Alpaca and Vicuna datasets used in the training.

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

MIT License

2.25k stars 171 forks source link

hi to the team, thanks for your hard work and mPLUG-Owl demostrated great performance!

during reading the paper, i found that in "4.1 Experimental Setup/Data and Training Details.", it said "we gather pure text instruction data from three distinct sources: 102k data from the Alpaca [Taori et al., 2023], 90k from the Vicuna [Vicuna, 2023], and 50k from the Baize [Xu et al., 2023a]."

however, to my knowledge

the alpaca dataset only has 52k examples
vicuna has not released their dataset yet due to some concern.

would you mind sharing more information about the datasets? thanks a lot!

X-PLUG / mPLUG-Owl

curious about Alpaca and Vicuna datasets used in the training. #15