Open cassiaaaaaa opened 4 months ago
Text generation is not involved in training.
Therefore, beam search depends only on inference.
For the demo, we set top_p and temperature for diverse text generation.
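To make the distinction concrete, here is a minimal, self-contained sketch of the two decoding strategies over a toy next-token table (the table, vocabulary, and all function names are illustrative, not from MoAI; a real model would produce the log-probs with a forward pass). Beam search is deterministic and keeps the top-scoring partial sequences; top-p (nucleus) sampling is stochastic and trades determinism for diversity:

```python
import math
import random

# Toy next-token log-prob table keyed by the previous token.
# (Illustrative only; a real LM computes these with a forward pass.)
VOCAB = ["a", "b", "c", "</s>"]
LOGPROBS = {
    None: {"a": math.log(0.50), "b": math.log(0.30), "c": math.log(0.15), "</s>": math.log(0.05)},
    "a":  {"a": math.log(0.10), "b": math.log(0.60), "c": math.log(0.20), "</s>": math.log(0.10)},
    "b":  {"a": math.log(0.20), "b": math.log(0.10), "c": math.log(0.30), "</s>": math.log(0.40)},
    "c":  {"a": math.log(0.25), "b": math.log(0.25), "c": math.log(0.10), "</s>": math.log(0.40)},
}

def beam_search(num_beams=3, max_len=4):
    """Deterministic: keep the num_beams highest-scoring partial sequences."""
    beams = [([], 0.0)]  # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for toks, score in beams:
            if toks and toks[-1] == "</s>":
                candidates.append((toks, score))  # finished beam carries over
                continue
            prev = toks[-1] if toks else None
            for tok, lp in LOGPROBS[prev].items():
                candidates.append((toks + [tok], score + lp))
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:num_beams]
    return beams[0][0]

def top_p_sample(p=0.95, temperature=0.9, max_len=4, seed=0):
    """Stochastic: sample from the smallest token set whose mass reaches p."""
    rng = random.Random(seed)
    toks = []
    for _ in range(max_len):
        prev = toks[-1] if toks else None
        # Temperature-scale, then renormalize.
        scaled = {t: math.exp(lp / temperature) for t, lp in LOGPROBS[prev].items()}
        z = sum(scaled.values())
        ranked = sorted(scaled.items(), key=lambda x: x[1], reverse=True)
        kept, mass = [], 0.0
        for t, pr in ranked:
            kept.append((t, pr / z))
            mass += pr / z
            if mass >= p:
                break  # nucleus found
        total = sum(pr for _, pr in kept)
        r = rng.random() * total
        for t, pr in kept:
            r -= pr
            if r <= 0:
                toks.append(t)
                break
        if toks and toks[-1] == "</s>":
            break
    return toks
```

Since neither function touches gradients or a loss, the choice between them is purely an inference-time setting; the paper's num_beams=3 and the demo's top_p=0.95 are just two different decoding configurations for the same trained model.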
We didn't use a specific initializer for MoE.
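For readers unfamiliar with the choice being asked about, here is a small sketch contrasting "no specific initializer" (each expert gets an independent default random init) with the common alternative of cloning a pretrained FFN into every expert. All names, shapes, and the fan-in scaling are assumptions for illustration, not MoAI's actual code:

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def default_init(d_in, d_out):
    # He/Kaiming-style fan-in scaling, a typical default for linear layers.
    return rng.normal(0.0, (2.0 / d_in) ** 0.5, size=(d_in, d_out))

def make_experts(num_experts=6, d_model=16, d_ff=32, from_ffn=None):
    """Build expert FFN weight sets.

    from_ffn=None: each expert is independently default-initialized,
    which is what "no specific initializer" implies.
    from_ffn=<weights>: every expert starts as a copy of a pretrained
    FFN, a common alternative in the MoE literature.
    """
    experts = []
    for _ in range(num_experts):
        if from_ffn is None:
            experts.append({"w1": default_init(d_model, d_ff),
                            "w2": default_init(d_ff, d_model)})
        else:
            experts.append(copy.deepcopy(from_ffn))
    return experts
```

With independent default init the six experts start from different random weights, so the router has an easier time differentiating them early in training; with cloning, all experts start identical and diverge only through routing noise and gradient updates.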
Dear author, I saw your new work Meteor; it's awesome! But I still have some questions about MoAI. Sorry to bother you again.
The first one: in the paper, you mention using beam search with num_beams=3 for generation, but in the demo you use top_p=0.95. Is beam search with 3 beams used during training? Using beam search in training seems uncommon.
The second one: what type of initialization is used for the MoE (the six experts) in the second training step?