-
I use one node with four GPUs (V100, 32 GB) for pretraining, but parallel training behaves strangely: all **four** processes run on **one** GPU (device:0).
Why does this happen? Thanks for everyone's help!
I u…
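A minimal sketch of the most common cause and fix, assuming the processes are launched with `torchrun` and PyTorch DistributedDataParallel (the model below is a placeholder): each process has to pin itself to its local rank's device, otherwise every rank defaults to `cuda:0`.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each of the four processes.
    local_rank = int(os.environ["LOCAL_RANK"])

    # Without this call, every process allocates on cuda:0,
    # which matches the symptom described above.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    # ... training loop ...

if __name__ == "__main__":
    main()
```

Launched as `torchrun --nproc_per_node=4 train.py`, each rank should then show up on its own device in `nvidia-smi`.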
-
Optimus/doc/optimus_finetune_language_models.md
beta=0, latent size = 32 https://chunylcus.blob.core.windows.net/machines/msrdl/optimus/output/pretrain/philly_rr3_vc4_g8_base_vae_wikipedia_pretrain…
-
## ❓ Questions and Help
#### What is your question?
I am trying to replicate HuBERT base pretraining (iteration 1) on LibriSpeech 960h. However, the training curve looks odd, as the unmask co…
-
Hi,
I am implementing fine-tuning of exBERT for sequence classification. I have already done the pretraining on my data. However, since the pre-training Python script that you have provided is only f…
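For clarity, a rough sketch of the kind of classification wrapper being asked about, assuming the pretrained exBERT encoder can be loaded as a PyTorch module that returns token-level hidden states (the encoder interface, hidden size, and pooling choice are assumptions, not from the exBERT code):

```python
import torch
import torch.nn as nn

class ExBertForSequenceClassification(nn.Module):
    """Hypothetical wrapper: pretrained exBERT encoder + pooled linear head."""

    def __init__(self, encoder, hidden_size=768, num_labels=2, dropout=0.1):
        super().__init__()
        self.encoder = encoder                      # pretrained encoder (placeholder)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, labels=None):
        # Assumes the encoder returns hidden states of shape
        # (batch, seq_len, hidden_size); pool on the first ([CLS]) position.
        hidden_states = self.encoder(input_ids, attention_mask)
        pooled = self.dropout(hidden_states[:, 0])
        logits = self.classifier(pooled)
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)
            return loss, logits
        return logits
```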
-
Following the script referenced here: https://qwen.readthedocs.io/zh-cn/latest/training/SFT/example.html
Using 24 A100s for SFT of the 7B model, I get OOM once model_max_length exceeds 20k.
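A minimal sketch of the memory-saving settings that usually apply in this situation, assuming the HuggingFace `Trainer` path used by that example (the output directory and DeepSpeed config filename are placeholders, not from the Qwen script):

```python
from transformers import TrainingArguments

# Assumption: gradient checkpointing plus ZeRO-3 is the usual way to push
# model_max_length higher before hitting OOM on fixed GPU memory.
training_args = TrainingArguments(
    output_dir="output_qwen_sft",          # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,           # trades compute for activation memory
    bf16=True,
    deepspeed="ds_config_zero3.json",      # placeholder ZeRO-3 config
)
```

Even with parameters and optimizer states sharded by ZeRO-3, activation memory typically becomes the limiting factor at 20k+ tokens, which is why gradient checkpointing matters here.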
-
It is mentioned in the repo that the pretraining step should run for some time; please mention after how much time I should interrupt it.
Also, I can't use the pretrained npz file as I'm planning to…
-
Thank you for open-sourcing and maintaining this project.
I want to cite your paper and reproduce your experimental results. However, I find that the region-level files for pretraining (e.g. 201511_…
-
Are there any appropriate setups or losses in sentence-transformers for pretraining sentence embeddings in cases where I have labels as targets?
(I want to finetune the actual embeddings, not just a…
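A minimal sketch of one label-based option, assuming the pre-3.0 `model.fit` API and a placeholder base model: `BatchAllTripletLoss` mines triplets from integer class labels inside each batch, so it updates the embeddings themselves rather than just a head on top.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

# Each example is a single sentence with an integer class label.
train_examples = [
    InputExample(texts=["great build quality"], label=0),
    InputExample(texts=["solid and well made"], label=0),
    InputExample(texts=["battery died after a week"], label=1),
    InputExample(texts=["stopped working quickly"], label=1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=4)
train_loss = losses.BatchAllTripletLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1,
          warmup_steps=10)
```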
-
Can you please let me know how we can do unsupervised training for a T5 model? This [link](https://www.sbert.net/examples/unsupervised_learning/MLM/README.html) and this [link](https://github.com/huggingfa…
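For reference, a minimal sketch of T5's unsupervised denoising (span-corruption) objective in plain `transformers`, with a toy example; the model name is only illustrative. Spans are dropped from the input, marked with sentinel tokens, and the model is trained to regenerate them.

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Corrupted input: dropped spans are replaced by sentinel tokens.
corrupted = "The <extra_id_0> walks in <extra_id_1> park"
# Target: the dropped spans, each prefixed by its sentinel token.
targets = "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>"

input_ids = tokenizer(corrupted, return_tensors="pt").input_ids
labels = tokenizer(targets, return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
```

A real pretraining run would generate the corrupted/target pairs automatically from raw text instead of writing them by hand as above.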
-
As mentioned in issue [https://github.com/google-research/big_transfer/issues/26], the loss is sigmoid binary cross-entropy for each label. I have a few more questions about the loss:
1) How is the obje…
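To pin down the per-label sigmoid cross-entropy part, a small numeric sketch of what that usually means (PyTorch with multi-hot targets; this is my reading, not necessarily the repo's exact implementation):

```python
import torch
import torch.nn.functional as F

# One image, three candidate labels; the ground truth is multi-hot,
# so each label gets its own independent sigmoid + binary cross-entropy.
logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 1.0]])

loss = F.binary_cross_entropy_with_logits(logits, targets)
print(loss)  # the per-label losses are averaged into a single scalar
```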