-
I followed your BERT pretraining. However, after one week of training, the loss is still around 7.3. I use 8 GPUs with a batch size of 14 per GPU. Everything else is left at the defaults.
INFO - 08/01/19 14:20:20 - 2:02:04 -…
-
Hi, what do I need to change in the code if I want to parallelize the computation across 8 GPUs?
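For context, this is the kind of change I have in mind: a minimal sketch using standard PyTorch `DistributedDataParallel` (not this repo's code; the model, dataset, and hyperparameters are placeholders):
```python
# Minimal DistributedDataParallel sketch: one process per GPU, launched with torchrun.
# Assumption: a plain PyTorch training loop; model/data/hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")             # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)                # shards the data across processes
    loader = DataLoader(dataset, batch_size=14, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                         # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                              # gradients are all-reduced here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
Launched with `torchrun --nproc_per_node=8 train_ddp.py`, this runs 8 processes, one per GPU, each seeing a different shard of the data.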
-
The `pretrain.py` script lists the Alpaca dataset and all the other finetuning datasets, but I don't think they are actually supported for pretraining.
E.g.,
```bash
python litgpt/pretrain.py \
--data lit…
-
Could you perhaps provide the pretrained weights (model.pkl)?
If not, how long does pretraining approximately take?
-
Hi, thank you for your work.
When running the wiki103 GPT-2 baseline and the corresponding InRank pretraining experiments on 4×A10 GPUs,
the final evaluation perplexity (PPL) is a bit higher. Does the running m…
-
We can:
1. enable continued pretraining by not requiring model initialization to load an HF checkpoint
2. try to force the GPU to partition the computation correctly somehow?
3. not use scanlay…
-
## 🐛 Bug
### To Reproduce
Steps to reproduce the behavior (**always include the command you ran**):
1. Run `fairseq-preprocess` with a `--srcdict` that has `#fairseq:overwrite`. For examp…
-
AnyGPT is trained only with the next-token-prediction task.
Taking text-to-image as an example, is the training input a sequence of speech tokens, text tokens, image tokens, and music tokens?
I want to know the input…
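For reference, this is my current understanding of what a single next-token-prediction training example might look like. It is only a sketch, assuming the modality tokens share one vocabulary and are simply concatenated; the token IDs, special tokens, and "model" are placeholders, not confirmed AnyGPT code:
```python
# Sketch: next-token prediction over a concatenated multimodal token sequence.
# Assumption: text and image tokens live in one shared vocabulary and are joined
# with hypothetical boundary tokens; none of this is confirmed AnyGPT code.
import torch
import torch.nn.functional as F

VOCAB_SIZE = 50000
BOS, SOI, EOI = 0, 1, 2        # hypothetical special tokens (begin, start-of-image, end-of-image)

text_tokens  = torch.tensor([101, 7592, 2088])       # placeholder text token IDs
image_tokens = torch.tensor([30001, 30517, 31024])   # placeholder image-codebook token IDs

# For a text-to-image example, the input is one flat sequence: text first, then image.
sequence = torch.cat([torch.tensor([BOS]), text_tokens,
                      torch.tensor([SOI]), image_tokens, torch.tensor([EOI])])

inputs, targets = sequence[:-1], sequence[1:]         # standard shifted NTP targets

# Placeholder "model": random logits standing in for a decoder-only transformer.
logits = torch.randn(inputs.shape[0], VOCAB_SIZE, requires_grad=True)
loss = F.cross_entropy(logits, targets)               # one loss over all modalities
loss.backward()
print(loss.item())
```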
-
How should I understand $`E_{Sam}`$ and the corresponding $`E^{T-I}_{Sam}`$ in the paper? Are they constructed like the learnable positional embedding $`E_{Pos}`$ in the transformer, etc. …
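To make the question concrete, this is the kind of learnable embedding I have in mind for $`E_{Pos}`$ (a generic ViT-style sketch, not code from the paper; the class name and shapes are placeholders). I am asking whether $`E_{Sam}`$ and $`E^{T-I}_{Sam}`$ are defined the same way:
```python
# Generic sketch of a learnable (positional-style) embedding; not from the paper.
import torch
import torch.nn as nn

class TokensWithLearnableEmbedding(nn.Module):
    def __init__(self, num_tokens: int = 197, dim: int = 768):
        super().__init__()
        # E_Pos-style parameter: one learnable vector per token position,
        # initialized randomly and updated by backprop like any other weight.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) token embeddings; the learnable table is added.
        return x + self.pos_embed

# Usage: out = TokensWithLearnableEmbedding()(torch.randn(2, 197, 768))
```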
-
When I compile the model, I get the following error. Any idea how to fix this?
```
Traceback (most recent call last):
File "/home/user/anaconda3/envs/hydra/lib/python3.10/site-packages/IPython/…