-
I'm curious whether there's any plan to support pretraining models from scratch?
-
When reproducing the experiment of pretraining on mag240m and evaluating on arxiv, we found that the Contrastive baseline achieves performance similar to Prodigy when the aux loss is applied (u…
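For concreteness, a minimal sketch of what a contrastive objective plus a weighted auxiliary loss could look like; the InfoNCE form, `lambda_aux`, `tau`, and all argument names are illustrative assumptions, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def combined_loss(query_emb, pos_emb, aux_logits, aux_labels,
                  lambda_aux=0.5, tau=0.07):
    """In-batch InfoNCE contrastive loss plus a weighted auxiliary loss.

    lambda_aux and tau are placeholder values, not taken from the paper.
    """
    # Cosine-similarity logits between queries and in-batch positives/negatives.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.t() / tau
    targets = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    contrastive = F.cross_entropy(logits, targets)

    # Auxiliary supervised objective (e.g., label classification on arxiv).
    aux = F.cross_entropy(aux_logits, aux_labels)
    return contrastive + lambda_aux * aux
```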
-
Hi, I am an undergraduate student studying this repository. I have several questions.
It is noted that stage 1 needs 8 GPUs and stage 2 needs 4 GPUs.
But it seems that stage 2 has more extended ar…
-
Thanks for your excellent work. May I ask what the pretraining accuracy on ImageNet-2012 was that is then used for fine-tuning?
-
**Suggestions:**
- make the pretraining on GM games data work
- don't try to achieve too much at once:
  - reduce the training dataset to a minimum (see the sketch after this list)
  - reduce the model to a minimum
  - normalize in…
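Along those lines, a hypothetical "overfit a tiny subset first" sketch; `tiny_debug_run` and the model/data interfaces are placeholders, not project code:

```python
import torch

def tiny_debug_run(model, dataset, steps=200, lr=1e-3, device="cpu"):
    """Train on a handful of samples until the loss collapses; if it never
    does, the model or data pipeline is broken before scale is the question."""
    small = torch.utils.data.Subset(dataset, range(min(32, len(dataset))))
    loader = torch.utils.data.DataLoader(small, batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for step in range(steps):
        for inputs, targets in loader:  # assumes (input, target) pairs
            inputs, targets = inputs.to(device), targets.to(device)
            opt.zero_grad()
            loss = model(inputs, targets)  # assumes the model returns its loss
            loss.backward()
            opt.step()
        if step % 20 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```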
-
### Describe the issue
Issue:
I first pretrained the projector using the CLIP + Gemma model and then fine-tuned the Gemma and projector, but no matter what, it is giving incorrect outputs, and the loss …
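For reference, a minimal LLaVA-style projector sketch to check the wiring against; the class name and dimensions (1024 for CLIP ViT-L/14 patch features, 2048 for Gemma-2B hidden size) are assumptions, not the code in question:

```python
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP mapping vision-encoder features to the LLM embedding size.

    Dimensions are illustrative; substitute the values from your configs.
    """
    def __init__(self, vision_dim=1024, llm_dim=2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):   # (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)
```

If the loss stays flat during the projector stage, common first checks are that the LLM is frozen at that stage and that the projector's output dtype matches the LLM's embedding dtype.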
-
Hello,
I am pretraining RoBERTa from scratch on 64 × 16 GB GPUs on 330 GB of text (split into 128 partitions), but currently, at epoch 32, pretraining seems to be very slow. Is this behavior no…
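For comparison, a minimal from-scratch MLM pretraining sketch with Hugging Face Transformers; paths, sizes, and hyperparameters below are placeholders. The usual throughput levers at this scale are mixed precision, dataloader workers, and sequence length:

```python
from transformers import (RobertaConfig, RobertaForMaskedLM, RobertaTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")  # or your own vocab
model = RobertaForMaskedLM(RobertaConfig(vocab_size=tokenizer.vocab_size))

dataset = load_dataset("text", data_files={"train": "corpus/*.txt"})["train"]
dataset = dataset.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"], num_proc=8,  # parallel tokenization
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        max_steps=500_000,               # step-based rather than epoch-based
        per_device_train_batch_size=16,
        gradient_accumulation_steps=4,
        fp16=True,                       # mixed precision: large speedup on 16 GB cards
        dataloader_num_workers=4,        # keep the GPUs fed
        logging_steps=100,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```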
-
Thanks for your work. The paper mentions that the attention network is pretrained for 70,000 iterations to reach convergence; could you please tell me how to do that?
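In case it helps, a hypothetical iteration-based (rather than epoch-based) training loop for that kind of schedule; the model and loader interfaces here are assumptions:

```python
import itertools
import torch

def pretrain(model, loader, num_iters=70_000, lr=1e-4, device="cuda"):
    """Run a fixed number of optimizer steps, cycling the dataloader as needed."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    data = itertools.cycle(loader)  # restart the loader when it is exhausted
    for it in range(num_iters):
        batch = next(data)
        opt.zero_grad()
        loss = model(batch.to(device))  # assumes the model returns its own loss
        loss.backward()
        opt.step()
        if it % 1000 == 0:
            print(f"iter {it}: loss {loss.item():.4f}")
```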
-
I am trying to fine-tune the available InceptionResNet-v2 weights, but only the generator weights are available. Is there a way you can provide the full generator and discriminator weights, which can be directl…
-
I am currently trying to further pretrain a RoBERTa model on a custom dataset, initializing the model with `roberta-base` weights. I am using [this script](https://github.com/manueltonneau/academic-bu…
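For reference, a minimal continued-pretraining sketch (not the linked script) that warm-starts from `roberta-base` and keeps the MLM objective; file names and hyperparameters are placeholders:

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")  # warm start

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16, fp16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
```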