-
The changes include QK normalization, parallel layers, etc. It would be cool to see how CLIP performs when those changes are applied to ViT-B, ViT-L, and ViT-H.
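For concreteness, here is a minimal PyTorch sketch of those two changes: LayerNorm applied to the queries and keys before the attention softmax, and the attention and MLP branches computed in parallel from a shared pre-norm, then summed into a single residual update. All module names and dimensions below are illustrative, not taken from any released ViT-22B code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelBlockQKNorm(nn.Module):
    """Transformer block sketch with QK normalization and parallel
    attention/MLP branches, in the style described for ViT-22B."""

    def __init__(self, dim: int, num_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.norm = nn.LayerNorm(dim)               # one pre-norm shared by both branches
        self.qkv = nn.Linear(dim, 3 * dim)
        self.q_norm = nn.LayerNorm(self.head_dim)   # QK normalization: LN on queries...
        self.k_norm = nn.LayerNorm(self.head_dim)   # ...and keys, per head, before attention
        self.attn_out = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.norm(x)

        # attention branch with normalized queries and keys
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)  # keeps attention logits bounded at scale
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = self.attn_out(attn.transpose(1, 2).reshape(b, n, d))

        # parallel layers: attention and MLP both read the same normed input,
        # and their outputs are added in one residual step
        return x + attn + self.mlp(h)
```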
-
Ability to write prompts in more than 100 languages.
Kandinsky 2.0
https://github.com/ai-forever/Kandinsky-2.0
https://huggingface.co/sberbank-ai/Kandinsky_2.0
Model architecture:
It is a late…
-
Thanks for your code! Have you added [ViT-22B-384](https://arxiv.org/abs/2302.05442) to the PyTorch model zoo? I haven't found it.
-
### Question
Has anyone carried out pretraining with Mixtral 8×7B? When I run the pretraining script, a problem occurs, as shown in the figure below. I just added a llava_mixtral.py to the ll…
-
Could you please share the process?
-
The ViT-22B paper reports knowledge distillation experiments (see [Table 8](https://openreview.net/pdf?id=Lhyy8H75KA)), demonstrating that it is not only a large-scale model but also an excellent teacher…
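For anyone wanting to try this, distilling such a teacher into a smaller student typically combines hard-label cross-entropy with a soft-label KL term against the teacher's logits. A minimal sketch below; `temperature` and `alpha` are assumed hyperparameters, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL to the teacher.
    temperature and alpha are illustrative values, not from the paper."""
    # soften both distributions; scale KL by T^2 so gradient magnitudes
    # stay comparable across temperatures
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```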
-
Hi,
Do you plan to release the model and checkpoint of ViT-22B presented in "Scaling Vision Transformers to 22 Billion Parameters"?
-
It would be nice to have this one here (https://arxiv.org/abs/2311.01906).
-
Hi,
first of all, thanks for your great contributions to open research!
I am confused about how the model architecture influences model performance. I note that the Pythia model's layer block looks like the pseud…