-
I have to ask: do you have any figures on how much pretraining takes, and on which GPU and which dataset? Please guide me if you can.
-
The README mentions this codebase can act as a "reference for enthusiasts keen on pretraining language models under 5 billion parameters". I'm wondering if you could give a brief guide on how to do so…
-
This issue intends to compare performances between the model trained from scratch on `dcm-zurich-lesions-*` (#1) vs. a model pretrained on `dcm-zurich` for detecting compression sites and using those …
-
When trying to pretrain t5-base, we are seeing that the pretraining loss starts at an enormous number (~160000).
Even when trying to pretrain smaller variants of T5, the initial pretraining loss alw…
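For comparison, here is the minimal sanity check I ran (a sketch assuming HuggingFace `transformers`, not our actual training script): a randomly initialized t5-base should start near the uniform-distribution cross-entropy, roughly ln(vocab_size) ≈ 10.4, which is why ~160000 looks off to us.

```python
# Sketch: sanity-check the initial loss of a randomly initialized t5-base.
# Assumes the HuggingFace transformers and torch packages; not our actual data pipeline.
import math
import torch
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config.from_pretrained("t5-base")        # architecture/vocab only
model = T5ForConditionalGeneration(config)          # random weights, no checkpoint loaded

input_ids = torch.randint(0, config.vocab_size, (2, 16))  # dummy encoder input
labels = torch.randint(0, config.vocab_size, (2, 8))      # dummy decoder targets

loss = model(input_ids=input_ids, labels=labels).loss     # mean token cross-entropy
print(f"initial loss: {loss.item():.2f}  (ln(vocab_size) = {math.log(config.vocab_size):.2f})")
```

If the reported number is a sum over tokens rather than a mean, or includes some scaling factor, that might explain part of the gap, but we have not confirmed this.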
-
Hello, thanks for your contribution! I find that there isn't a `paper_train.csv` in `data_csv.zip`. Is the paper path in this CSV file the same as the PMC-Inline text JSON file from your hugg…
-
You mentioned that the backbone network is ResNet-50 pretrained on ImageNet.
https://github.com/thuml/Universal-Domain-Adaptation/blob/5d7caa95af7e3675305c542253c4e372801897d2/net.py#L37
Bu…
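For reference, this is what I understand by an ImageNet-pretrained ResNet-50 backbone (a torchvision sketch assuming torchvision ≥ 0.13; the repo's `net.py` may wrap this differently):

```python
# Sketch: load an ImageNet-pretrained ResNet-50 and keep only the feature extractor,
# so the 2048-d pooled feature can feed a downstream head. Not the repo's exact code.
import torch
import torchvision

backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1
)
backbone.fc = torch.nn.Identity()  # drop the ImageNet classification head

features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 2048])
```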
-
## ❓ Questions and Help
#### What is your question?
I am trying to replicate HuBERT base pretraining iteration 1 on LibriSpeech 960h. However, the training curve seems odd, as the unmask co…
-
Hi all!
Since the initial training is based on MindSpore, I'm wondering if there are any training results for the first stage on Megatron.
-
## Paper Link
https://arxiv.org/abs/2002.01685
https://github.com/aghie/parsing-as-pretraining
## Upload
2020/2/5
## What is the paper about?
## Paper Contributions
## Key Points
## Va…
-
Hi,
I am trying to reproduce the pretraining of the mT5 model. When you modify the sentences as:
`Thank you to week => for inviting me your party last `
Then do you compute the loss on all to…
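To make the question concrete, here is how I am currently building the span-corruption pair (a sketch assuming HuggingFace `transformers` and the mT5 sentinel tokens; this may not match your exact preprocessing). My understanding is that the decoder target is the sentinel-delimited sequence of dropped spans, and the loss is the cross-entropy over all target tokens, sentinels included:

```python
# Sketch of T5-style span corruption with mT5 sentinel tokens (assumed setup, not the
# authors' exact code). Original sentence: "Thank you for inviting me to your party last week."
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Encoder input: the corrupted sentence, with each dropped span replaced by a sentinel.
inputs = tokenizer("Thank you <extra_id_0> to <extra_id_1> week.", return_tensors="pt")

# Decoder target: the dropped spans, each preceded by its sentinel.
labels = tokenizer(
    "<extra_id_0> for inviting me <extra_id_1> your party last <extra_id_2>",
    return_tensors="pt",
).input_ids

# The returned loss is the mean cross-entropy over all label tokens (sentinels included).
loss = model(**inputs, labels=labels).loss
print(loss.item())
```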