Closed puppyapple closed 1 month ago
Thanks for reporting. I am not exactly sure how the workers_per_device
implementation in LitGPT works, i.e., how it works under the hood. Maybe @aniketmaurya can chime in here.
@puppyapple seems like a wrong device id is being set, could you print the device in the setup
method and see what it prints?
@puppyapple seems like a wrong device id is being set, could you print the device in the setup method and see what it prints?
I added a print to the setup function of BaseLitAPI above, which gives me:
INFO: Waiting for application startup.
INFO: Application startup complete.
device passed in : cuda:0
Initializing model...
device='[0]'
self.devices=1
device passed in : cuda:0
Initializing model...
device='[0]'
self.devices=1
Model successfully initialized.
Setup complete for worker 1.
Model successfully initialized.
Setup complete for worker 0.
Is this output normal?
Thanks for reporting. I am not exactly sure how the workers_per_device implementation in LitGPT works, i.e., how it works under the hood. Maybe @aniketmaurya can chime in here.
@rasbt @aniketmaurya I tried another test without litgpt serve: using multiprocessing to load n models and process chunks of data in parallel. I get a similar error:
e` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [0,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [0,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [0,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/puppyapple/Server/BigAI/Chinese_LLM_From_Scratch/Journey/Day11/multi_model_inference.py", line 26, in process_chunk
response = model.generate(prompt=prompt, max_new_tokens=350)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/api.py", line 445, in generate
outputs = generate_fn(
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/generate/base.py", line 140, in generate
token = next_token(
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/generate/base.py", line 77, in next_token
logits = model(x, input_pos)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 141, in forward
output = self._forward_module(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/model.py", line 94, in forward
x = block(x, cos, sin, mask, input_pos)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/model.py", line 197, in forward
attention_output = self.attn(x_normed, cos, sin, mask, input_pos)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/litgpt/model.py", line 237, in forward
qkv = self.attn(x)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/puppyapple/anaconda3/envs/bigmodel/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
The script I used is below:
import json
import multiprocessing
from functools import partial

from litgpt import LLM
from litgpt.prompts import MicroStories
import click
import torch

# Set the multiprocessing start method to 'spawn'
multiprocessing.set_start_method("spawn", force=True)


def init_model():
    model = LLM.load(
        model="../../Experiments/Output/sft/microstories/mask_prompt_5e-4/final"
    )
    return model


def process_chunk(model, chunk):
    ms = MicroStories()
    results = []
    for case in chunk:
        prompt = ms.apply(prompt=case["instruction"], input=case["input"])
        with torch.no_grad():
            response = model.generate(prompt=prompt, max_new_tokens=350)
        results.append(
            {"prompt": prompt, "rejected": response, "chosen": case["output"]}
        )
    return results


@click.command()
@click.option("-n", "--num_processes", default=4, help="Number of concurrent processes")
@click.option("--test", is_flag=True, help="Test mode: only process the first 100 records")
def main(num_processes, test):
    # Load the SFT data
    with open(
        "../../Data/TinyStoriesInstruct/sft_data_v2.json", "r", encoding="utf-8"
    ) as f:
        sft_data = json.load(f)
    if test:
        sft_data = sft_data[:100]

    # Determine the number of processes
    n_processes = min(multiprocessing.cpu_count(), num_processes)

    # Initialize the model
    model = init_model()

    # Use partial to create a new function with model bound as the first argument
    process_chunk_with_model = partial(process_chunk, model)

    # Split the data into n_processes parts
    chunk_size = len(sft_data) // n_processes
    chunks = [sft_data[i : i + chunk_size] for i in range(0, len(sft_data), chunk_size)]

    # Process the data in parallel with a process pool
    with multiprocessing.Pool(n_processes) as pool:
        results = pool.map(process_chunk_with_model, chunks)

    # Merge the results
    dpo_samples = [item for sublist in results for item in sublist]

    # Save the results
    output_file = "dpo_samples_test.json" if test else "dpo_samples.json"
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(dpo_samples, f, ensure_ascii=False, indent=2)
    print(f"Processing complete; results saved to {output_file}")


if __name__ == "__main__":
    main()
My model is loaded from an SFT checkpoint produced by litgpt finetune_full.
I am not sure whether this error has the same root cause as the serve case above. If it does, then maybe the problem is not in litserve, since I did not use anything related to it this time.
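For reference, the script above creates a single model in the parent process and hands it to every worker through partial, while the stated goal was to load n models. Below is a minimal sketch of the per-worker variant, where each process builds its own model through the Pool initializer. The names init_worker, split_evenly, run_parallel, and load_model are hypothetical helpers for illustration, and load_model is assumed to be a top-level function wrapping something like LLM.load(...). Nothing in this thread establishes that this is related to the CUDA error.

```python
import multiprocessing as mp

# Hypothetical module-level slot: one model instance per worker process.
_worker_model = None


def init_worker(load_model):
    """Run once in each worker process: build that worker's own model."""
    global _worker_model
    _worker_model = load_model()


def process_chunk(chunk):
    """Generate one response per prompt using the worker-local model."""
    return [_worker_model.generate(prompt=p, max_new_tokens=350) for p in chunk]


def split_evenly(items, n):
    """Split items into at most n contiguous chunks whose sizes differ by at most 1."""
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_parallel(prompts, n_processes, load_model):
    """Map chunks over a spawn-based pool; load_model must be picklable."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_processes, initializer=init_worker, initargs=(load_model,)) as pool:
        nested = pool.map(process_chunk, split_evenly(prompts, n_processes))
    return [response for chunk in nested for response in chunk]
```

In the script above, load_model would correspond to init_model, and run_parallel would need to be called from under the `if __name__ == "__main__":` guard because of the spawn start method. split_evenly also avoids the zero step that `len(sft_data) // n_processes` produces when there are fewer records than processes.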
Update:
I updated litgpt to the latest version (0.4.12), and all the errors above have disappeared for now: 25 workers handling 25 concurrent requests for large-scale data generation, with no CUDA errors after 30 minutes of running so far.
I am not sure which update(s) between 0.4.10 and 0.4.12 fixed this.
Nice, glad to hear that this works fine now without requiring any additional fix! I'll close this issue as completed, but please let us know in case there are any issues that occur later.
Bug description
I changed serve.py slightly to add the workers_per_device parameter and then served with workers_per_device > 1. When the request concurrency is > 1, I get:
The modified server script is below:
The test script is below:
What operating system are you using?
Linux
LitGPT Version