-
I first used the model-conversion script to convert llama2-7b from huggingface to megatron,
but training fails with a shape error:
```
Traceback (most recent call last):
File "/code/xx/LLM_mine/reference/Megatron-LLaMA/pretrain_llama.py", line 119, in <module>
…
-
**Describe the bug**
What the bug is and how to reproduce it, ideally with screenshots.
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torc…
-
Data generation fails with an error saying the model's context length was exceeded. I assume there is something wrong with my input data, but it's hard to tell because the error message doesn't give me any pointers.
…
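Absent a clearer error, one common workaround is to clip each sample to the context window before generation. A minimal sketch of that idea, assuming the token ids are already available (`truncate_ids`, `max_context`, and `reserve_for_output` are illustrative names, not part of any library's API):

```python
def truncate_ids(token_ids, max_context, reserve_for_output=0):
    """Clip a token-id sequence so prompt + generated tokens fit the context window."""
    budget = max_context - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output leaves no room for the prompt")
    return token_ids[:budget]

# Example: a 10-token prompt against an 8-token context, reserving 3 tokens
# for the model's output, leaves a 5-token prompt.
clipped = truncate_ids(list(range(10)), max_context=8, reserve_for_output=3)
print(len(clipped))  # 5
```

Logging any samples that get clipped this way also makes it easy to spot which inputs were triggering the error.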
-
I am trying to finetune llama3-70B on a trn1.32xlarge instance using distributed training. It failed with the following error:
Container image: f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neur…
-
### Your current environment
We are working on accelerating RLHF algorithms and need to broadcast the weights of the DeepSpeed engine to the vLLM Ray worker. In v0.4.2, we were able to create an ad…
-
I think it would be helpful to be able to import/export templates. Then, when I load a new model with llamafile, I can simply point it to some template file definition that contains all the req…
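One possible shape for such a template file, purely as an illustration: the field names below are hypothetical, not an existing llamafile format, and only sketch what an import/export round trip could look like.

```python
import json

# Hypothetical template definition -- none of these keys are an existing
# llamafile format; they only illustrate a file the CLI could be pointed at.
template = {
    "name": "chatml-example",
    "system_prefix": "<|im_start|>system\n",
    "user_prefix": "<|im_start|>user\n",
    "assistant_prefix": "<|im_start|>assistant\n",
    "turn_suffix": "<|im_end|>\n",
    "stop": ["<|im_end|>"],
}

# Export to disk, then re-import, to show the round trip is lossless.
with open("template.json", "w") as f:
    json.dump(template, f, indent=2)

with open("template.json") as f:
    loaded = json.load(f)

print(loaded["name"])  # chatml-example
```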
-
What would it take for the project to add support for distributed inference?
-
I have been using the vllm integration from fastchat to host multiple vllm models. However, it does not offer vllm's full capability; e.g., it does not support beam search.
I would like to propose…
-
pip install deepspeed
Then I ran sh ds_all.sh directly,
but the following error appeared; I'd like to know what is going on:
```bash
zero_nlp-main/chinese_bloom$ sh ds_all.sh
[2023-06-03 12:26:34,143] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfil…
-
$ python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8001 --model Qwen1.5-14B-Chat-AWQ --tensor-parallel-size 2 --quantization awq --trust-remote-code --dtype half
INFO 02-26 1…
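Once the server above is up, it exposes an OpenAI-compatible API on port 8001, so a chat request can be built like this (the prompt text is illustrative; the model name must match the `--model` flag):

```python
import json

# Request body for the OpenAI-compatible /v1/chat/completions endpoint
# served by the command above. "model" must match the --model flag.
payload = {
    "model": "Qwen1.5-14B-Chat-AWQ",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
body = json.dumps(payload)
# Send it with, e.g.:
#   curl http://localhost:8001/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$body"
print(json.loads(body)["model"])  # Qwen1.5-14B-Chat-AWQ
```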