-
Working on a keras.io guide for pretraining a keras-nlp transformer model from scratch, using the WordPiece tokenizer, transformer encoder, embedding layers, and our MLM layer helpers.
Will link a dra…
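In the meantime, a rough sketch of the kind of model the guide will cover, assuming keras-nlp's public layers (all hyperparameters below are placeholders; newer releases name the MLM helpers MaskedLMMaskGenerator/MaskedLMHead):

```python
import keras_nlp
from tensorflow import keras

# Placeholder hyperparameters, chosen only for illustration.
VOCAB_SIZE, SEQ_LEN, EMBED_DIM, NUM_MASKS = 20000, 128, 256, 32

token_ids = keras.Input(shape=(SEQ_LEN,), dtype="int32")
mask_positions = keras.Input(shape=(NUM_MASKS,), dtype="int32")

# Token + position embeddings feed a small transformer encoder stack.
x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=SEQ_LEN,
    embedding_dim=EMBED_DIM,
)(token_ids)
for _ in range(3):
    x = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=512, num_heads=4
    )(x)

# The MLM head predicts vocabulary probabilities at the masked positions.
outputs = keras_nlp.layers.MaskedLMHead(
    vocabulary_size=VOCAB_SIZE, activation="softmax"
)(x, mask_positions=mask_positions)

pretrainer = keras.Model([token_ids, mask_positions], outputs)
pretrainer.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```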
-
Hello,
Thank you for the wonderful work.
I was trying to reproduce the results reported in the paper for VRD.
Currently I am getting approximately the following scores:
R@20: 0.5417 R@50: 0.6160 R@100: 0…
-
Hi, thank you for the code.
Could you please provide your CIFAR-10 code for reproduction? I have followed your supplementary material and run the code for 1000 epochs of pretraining (MoCo-based). But …
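For reference, my understanding of the MoCo-style objective is the standard InfoNCE loss over a query, its momentum-encoded positive key, and a queue of negatives; a minimal PyTorch sketch (not the authors' code, all names are mine):

```python
import torch
import torch.nn.functional as F

def moco_infonce_loss(q, k, queue, temperature=0.07):
    # q:     (N, D) query features from the online encoder
    # k:     (N, D) positive key features from the momentum encoder
    # queue: (K, D) negative keys from the memory queue
    q, k, queue = (F.normalize(t, dim=1) for t in (q, k, queue))

    # Positive logits: each query against its matching key.
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)  # (N, 1)
    # Negative logits: each query against every queued key.
    l_neg = torch.einsum("nd,kd->nk", q, queue)           # (N, K)

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive is always at column 0.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```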
-
https://github.com/training-transformers-together/hf-website-how-to-join
Demo page (updated on push): https://training-transformers-together.github.io/
- [x] intro and motivation text
- [x] liv…
-
I am trying to reproduce the RA-CNN network's performance with PyTorch.
But there are no details about how to train the APN network.
Without pretraining, the rank loss doesn't decrease.
I am wondering whether this code w…
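For reference, my reading of the paper's inter-scale ranking loss is a simple hinge on the ground-truth-class probabilities of adjacent scales; a sketch of what I am training (the margin value is my assumption):

```python
import torch

def pairwise_rank_loss(p_coarse, p_fine, margin=0.05):
    # p_coarse, p_fine: (N,) ground-truth-class probabilities at scale s
    # and the zoomed-in scale s+1. The hinge pushes the finer scale to be
    # more confident than the coarser one by at least the margin.
    return torch.clamp(p_coarse - p_fine + margin, min=0).mean()
```

If the finer scale never becomes more confident than the coarse one, this loss simply plateaus near the margin, which matches the symptom above.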
-
self.context_mlm_trans and self.context_order_trans expect a different key structure:
RuntimeError: Error(s) in loading state_dict for BertPredictionHeadTransform:
Missing key(s) in stat…
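One possible workaround, assuming the mismatch is only in the key prefixes rather than in tensor shapes (the checkpoint path and the prefix below are hypothetical):

```python
import torch
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertPredictionHeadTransform

head = BertPredictionHeadTransform(BertConfig())

# Remap the checkpoint keys so they line up with the module's own names.
state = torch.load("checkpoint.pt", map_location="cpu")
remapped = {
    k.replace("cls.predictions.transform.", ""): v for k, v in state.items()
}

# strict=False returns, rather than raises on, mismatched keys, which
# makes it easy to see exactly which names disagree.
missing, unexpected = head.load_state_dict(remapped, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)
```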
-
Hi, what file corresponds to dis_saveto?
Looking forward to your reply.
-
You did a great job in the VDU field. Congratulations!
By the way, I wonder if I can replace mBART with XLM-RoBERTa in the fine-tuning process without redoing the pretraining?
-
While working on an autoencoder I started implementing the pretraining phase. It all looked good for quite a while and the weights and biases were properly shared among layers, but when I started training the…
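For reference, the weight-sharing scheme I am using ties each decoder layer to its mirrored encoder layer by reusing the transposed weight matrix and learning only a separate bias; a minimal sketch assuming PyTorch (class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAutoencoder(nn.Module):
    """One-layer autoencoder whose decoder reuses the encoder weights."""

    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)
        # Only a bias is learned for the decoder; the weight is tied.
        self.decoder_bias = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        # F.linear with the transposed encoder weight: h @ W -> (N, in_dim).
        return F.linear(h, self.encoder.weight.t(), self.decoder_bias)

model = TiedAutoencoder()
x = torch.randn(8, 784)
loss = F.mse_loss(model(x), x)
```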
-
Hi, if I wanted to reproduce the results with an Nvidia A6000, how long would it take to train the model from scratch?