-
size mismatch for model.layers.78.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
…
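For context, a minimal sketch of how the two shapes can arise, assuming the checkpoint is Llama-2 70B with grouped-query attention (hidden_size 8192, 64 attention heads, 8 KV heads, head_dim 128) and the receiving model was built without `num_key_value_heads` set; the Hugging Face `LlamaConfig` usage here is an illustration, not taken from the report:

```python
# Sketch: k_proj.weight has shape [num_key_value_heads * head_dim, hidden_size].
# Llama-2 70B uses grouped-query attention (8 KV heads, head_dim 128), so the
# checkpoint stores k_proj as [8 * 128, 8192] = [1024, 8192]; a model built with
# num_key_value_heads == num_attention_heads (64) expects [64 * 128, 8192] = [8192, 8192].
from transformers import LlamaConfig, LlamaForCausalLM

cfg = LlamaConfig(
    hidden_size=8192,
    num_attention_heads=64,
    num_key_value_heads=8,   # grouped-query attention; leaving this at 64 reproduces the mismatch
    num_hidden_layers=1,     # one layer is enough to inspect the shape
    intermediate_size=28672,
)
model = LlamaForCausalLM(cfg)
print(model.model.layers[0].self_attn.k_proj.weight.shape)  # torch.Size([1024, 8192])
```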
-
Dear author, thank you for your work! I would like to know the performance of LLoVi on NExT-QA, NExT-GQA, and IntentQA when using the 7B Llama2 as the LLM.
For larger models like GPT-3.5 and GPT-4, they ar…
-
Hi,
I am trying to fine-tune a Llama2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?
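A minimal sketch of one possible launch, assuming the DeepSpeed-Ulysses sequence-parallelism path in Megatron-DS; the flag names (in particular `--ds-sequence-parallel-size`), model sizes, and paths are assumptions and may not match the repository version in use:

```python
# Sketch (not official documentation): launch pretrain_gpt.py with an assumed
# sequence-parallel degree of 4, splitting each sequence across 4 GPUs.
import subprocess

cmd = [
    "deepspeed", "pretrain_gpt.py",
    "--tensor-model-parallel-size", "1",
    "--ds-sequence-parallel-size", "4",   # assumed flag for DeepSpeed-Ulysses sequence parallelism
    "--num-layers", "32",
    "--hidden-size", "4096",
    "--num-attention-heads", "32",
    "--seq-length", "4096",
    "--max-position-embeddings", "4096",
    "--micro-batch-size", "1",
    "--deepspeed", "--deepspeed_config", "ds_config.json",  # placeholder config path
]
subprocess.run(cmd, check=True)
```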
-
![image](https://github.com/microsoft/Megatron-DeepSpeed/assets/33349843/c1a12cf3-3a2e-496b-ba53-e652f2d773ee)
-
I am trying to use the model_name_or_path parameter in this project, but I am unsure where I can find the relevant model links or resources. Could you please provide some guidance on where to download…
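A minimal sketch of how `model_name_or_path` is commonly resolved, assuming the project follows the Hugging Face convention of accepting either a Hub model id or a local directory; the model id below is only an example, not necessarily the checkpoint this project expects:

```python
# Sketch: model_name_or_path as either a Hugging Face Hub id (downloaded on first use)
# or a local directory that already contains the weights and tokenizer files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "meta-llama/Llama-2-7b-hf"   # or e.g. "/path/to/local/checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
```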
-
I am following the steps (https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb) to run a Llama2 quantized model (ht…
-
In the paper _Understanding Code Changes Practically with Small-Scale Language Models_ (ASE 2024) and the presentation _Application Practice of Ant CodeFuse: Code Change Understanding Techniques in Application Environments_ (ChinaSoft2024), you mention the dataset HQCM, …
-
I was following the Llama2 7B guide; the consensus was not enough RAM, among other issues.
I then tried the stories110M guide, which worked all the way until I went to test it.
I vaguely remember lm_eval not being installed (its…
-
Llama2 (and Llama-based models) time out. Other chat models (Mistral and Mixtral tested) respond fine. Below is a snippet of the Docker container log captured when the request is sent from the Refact exte…