-
Could you provide the tokenized continued-pretraining dataset for reproduction, like the pruning dataset?
Is the tokenizer.model you provided exactly the same tokenizer as Llama-2's?
-
Hi,
Thank you for releasing the code. In your "Pretraining" section, you mention that:
"Pretraining
We use the [pretrained checkpoint](https://livebournemouthac-my.sharepoint.com/:u:/g/personal/…
-
Does anyone know where to get them?
Thank you!
-
Hello! Following your code, I applied the Attentioner Manager used for GPT-2 to Llama and obtained saliency scores. Each layer's score has shape [1, 1, seq_len, seq_len]; some of the concrete values are as follows:
I would like to know what exactly the per-layer saliency scores mean.
My code is as follows:
```
class LlamaAttentionManager(AttentionerManagerBase):
…
```
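For what it's worth, in the attention-times-gradient saliency formulation that this kind of AttentionerManager is commonly paired with, each layer's score is |A ⊙ ∂L/∂A| summed over heads, which yields exactly a [1, 1, seq_len, seq_len] map: entry (i, j) measures how much attention from position i to position j contributed to the loss. A minimal sketch (toy shapes and a toy loss; all names here are illustrative assumptions, not the repo's actual code):

```python
import torch

# Toy attention map: [batch=1, heads, seq_len, seq_len]
seq_len, n_heads = 4, 2
A = torch.softmax(torch.randn(1, n_heads, seq_len, seq_len), dim=-1)
A.requires_grad_(True)

# Toy scalar loss depending on the attention map
loss = (A.sum(dim=1) @ torch.ones(seq_len, 1)).sum()
loss.backward()

# Saliency: |A * dL/dA|, summed over heads -> [1, 1, seq_len, seq_len]
saliency = (A * A.grad).abs().sum(dim=1, keepdim=True)
```

Under this reading, larger values at (i, j) mean that the attention edge i→j was more influential for the prediction in that layer.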
-
Hi! Great work, and also a great YouTube presentation; thanks for making it public.
I have a question about the runtimes. Table A.2 says that pre-training took 80 min for the model with 1…
-
### System Info
```shell
vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```
### Information
- [X] The official example scripts
- [ ] My own modified scri…
-
Hi,
Thanks for your wonderful work. I was wondering if you could provide the trained weights? I would like to experiment based on your model. Thanks a lot!
-
Hey,
I want to pretrain and benchmark the small and base versions of ELECTRA for the Arabic and Persian languages. As mentioned in the run_pretraining Python file, only the "base" and "large" model_size ar…
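If it helps, the ELECTRA paper (Clark et al., 2020) does report hyperparameters for the Small configuration, so a "small" model size can be approximated even if the script only ships "base" and "large". A hedged sketch of those published values (the dict keys are illustrative; adapt them to whatever names run_pretraining actually expects):

```python
# ELECTRA-Small hyperparameters as reported in the ELECTRA paper.
# Key names are assumptions, not the repo's actual config fields.
electra_small = {
    "num_hidden_layers": 12,
    "hidden_size": 256,
    "intermediate_size": 1024,   # FFN inner dimension
    "num_attention_heads": 4,
    "embedding_size": 128,       # factorized embeddings, smaller than hidden
    "generator_size": 0.25,      # generator is 1/4 the discriminator width
    "max_seq_length": 128,
    "train_batch_size": 128,
    "learning_rate": 5e-4,
    "num_train_steps": 1_000_000,
}
```

Plugging these values into a copy of the "base" branch of the config is one reasonable way to get a Small run without waiting for official support.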
-
Thank you for your excellent work. If I want to use this data for pretraining and conduct a rigorous comparison with the DCLM-BASELINE 7B model mentioned here, what hyper-parameters should I use? Coul…
-
In convert_hf_to_gguf.py, when converting the MiniCPM model, the class below overrides modify_tensors and only transforms q_proj.weight and k_proj.weight. Why is this transformation needed? Or, as the comment says, "HF models permute some of the tensors, so we need to undo that" — where exactly does the HF model do thi…
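As background for the question above: the permutation happens when the original checkpoint is converted into the Hugging Face format (for Llama this is transformers' convert_llama_weights_to_hf.py). HF's rotary-embedding implementation rotates the first and second halves of each head dimension, while the original (and GGUF/llama.cpp) layout interleaves even/odd pairs, so the HF conversion reshuffles each head's rows of q_proj and k_proj, and the GGUF converter must apply the inverse. A small self-contained sketch of the two reshuffles (shapes and function names are illustrative) showing they are inverses:

```python
import numpy as np

n_heads, head_dim, d_model = 2, 4, 8
w = np.arange(n_heads * head_dim * d_model, dtype=np.float32)
w = w.reshape(n_heads * head_dim, d_model)  # stand-in for q_proj.weight

def hf_permute(w, n_heads):
    # What the HF conversion does: per head, regroup rows from the
    # interleaved RoPE layout (pairs 0,1 / 2,3 / ...) into "split halves".
    dim1, dim2 = w.shape
    return (w.reshape(n_heads, dim1 // n_heads // 2, 2, dim2)
             .swapaxes(1, 2)
             .reshape(dim1, dim2))

def gguf_unpermute(w, n_heads):
    # What modify_tensors undoes: invert the reshuffle above so the
    # weights match the interleaved layout llama.cpp's RoPE expects.
    dim1, dim2 = w.shape
    return (w.reshape(n_heads, 2, dim1 // n_heads // 2, dim2)
             .swapaxes(1, 2)
             .reshape(dim1, dim2))

roundtrip = gguf_unpermute(hf_permute(w, n_heads), n_heads)
```

Only q_proj and k_proj need this because RoPE is applied only to queries and keys; all other tensors are stored in the same layout in both formats.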