-
Could you provide the tokenized continued-pretraining dataset for reproduction, like the pruning dataset?
Is the tokenizer.model you provided exactly the same tokenizer as Llama-2's?
-
Hi,
Thank you for releasing the code. In your "Pretraining" section, you mention that:
"Pretraining
We use the [pretrained checkpoint](https://livebournemouthac-my.sharepoint.com/:u:/g/personal/…
-
Does anyone know where to get them?
Thank you!
-
Hello! Following your code, I applied the Attentioner Manager used for GPT-2 to Llama and obtained saliency scores. Each layer's score has shape [1, 1, seq_len, seq_len]; some of the concrete values are as follows:
I would like to know what exactly the per-layer saliency scores mean.
My code is as follows:
```
class LlamaAttentionManager(AttentionerManagerBase):
…
```
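For what it's worth, in the attention-times-gradient saliency formulation that this kind of AttentionerManager is commonly paired with, each layer's score is |A ⊙ ∂L/∂A| summed over heads, which yields exactly a [1, 1, seq_len, seq_len] map: entry (i, j) measures how much attention from position i to position j contributed to the loss. A minimal sketch (toy shapes and a toy loss; all names here are illustrative assumptions, not the repo's actual code):

```python
import torch

# Toy attention map: [batch=1, heads, seq_len, seq_len]
seq_len, n_heads = 4, 2
A = torch.softmax(torch.randn(1, n_heads, seq_len, seq_len), dim=-1)
A.requires_grad_(True)

# Toy scalar loss depending on the attention map
loss = (A.sum(dim=1) @ torch.ones(seq_len, 1)).sum()
loss.backward()

# Saliency: |A * dL/dA|, summed over heads -> [1, 1, seq_len, seq_len]
saliency = (A * A.grad).abs().sum(dim=1, keepdim=True)
```

Under this reading, larger values at (i, j) mean that the attention edge i→j was more influential for the prediction in that layer.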
-
Hi! Great work, and also a great YouTube presentation; thanks for making it public.
I have a question about the runtimes. Table A.2 says that pre-training took 80 min for the model with 1…
-
### System Info
```shell
vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```
### Information
- [X] The official example scripts
- [ ] My own modified scri…
-
Hi,
Thanks for your wonderful work. I was wondering if you could provide the trained weights? I would like to experiment based on your model. Thanks a lot!
-
Hey,
I want to pretrain and benchmark the small and base versions of ELECTRA for the Arabic and Persian languages. As mentioned in the run_pretraining Python file, only the "base" and "large" model_size ar…
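If it helps, the ELECTRA paper (Clark et al., 2020) does report hyperparameters for the Small configuration, so a "small" model size can be approximated even if the script only ships "base" and "large". A hedged sketch of those published values (the dict keys are illustrative; adapt them to whatever names run_pretraining actually expects):

```python
# ELECTRA-Small hyperparameters as reported in the ELECTRA paper.
# Key names are assumptions, not the repo's actual config fields.
electra_small = {
    "num_hidden_layers": 12,
    "hidden_size": 256,
    "intermediate_size": 1024,   # FFN inner dimension
    "num_attention_heads": 4,
    "embedding_size": 128,       # factorized embeddings, smaller than hidden
    "generator_size": 0.25,      # generator is 1/4 the discriminator width
    "max_seq_length": 128,
    "train_batch_size": 128,
    "learning_rate": 5e-4,
    "num_train_steps": 1_000_000,
}
```

Plugging these values into a copy of the "base" branch of the config is one reasonable way to get a Small run without waiting for official support.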
-
Thank you for your excellent work. If I want to use this data for pretraining and conduct a rigorous comparison with the DCLM-BASELINE 7B model mentioned here, what hyper-parameters should I use? Coul…
-
In convert_hf_to_gguf.py, when converting the MiniCPM model, the class below overrides modify_tensors and only transforms q_proj.weight and k_proj.weight. Why is this transformation needed? Or, as the comment says, "HF models permute some of the tensors, so we need to undo that" — where exactly does the HF model do thi…
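As background for the question above: the permutation happens when the original checkpoint is converted into the Hugging Face format (for Llama this is transformers' convert_llama_weights_to_hf.py). HF's rotary-embedding implementation rotates the first and second halves of each head dimension, while the original (and GGUF/llama.cpp) layout interleaves even/odd pairs, so the HF conversion reshuffles each head's rows of q_proj and k_proj, and the GGUF converter must apply the inverse. A small self-contained sketch of the two reshuffles (shapes and function names are illustrative) showing they are inverses:

```python
import numpy as np

n_heads, head_dim, d_model = 2, 4, 8
w = np.arange(n_heads * head_dim * d_model, dtype=np.float32)
w = w.reshape(n_heads * head_dim, d_model)  # stand-in for q_proj.weight

def hf_permute(w, n_heads):
    # What the HF conversion does: per head, regroup rows from the
    # interleaved RoPE layout (pairs 0,1 / 2,3 / ...) into "split halves".
    dim1, dim2 = w.shape
    return (w.reshape(n_heads, dim1 // n_heads // 2, 2, dim2)
             .swapaxes(1, 2)
             .reshape(dim1, dim2))

def gguf_unpermute(w, n_heads):
    # What modify_tensors undoes: invert the reshuffle above so the
    # weights match the interleaved layout llama.cpp's RoPE expects.
    dim1, dim2 = w.shape
    return (w.reshape(n_heads, 2, dim1 // n_heads // 2, dim2)
             .swapaxes(1, 2)
             .reshape(dim1, dim2))

roundtrip = gguf_unpermute(hf_permute(w, n_heads), n_heads)
```

Only q_proj and k_proj need this because RoPE is applied only to queries and keys; all other tensors are stored in the same layout in both formats.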