LC1332 / Chat-Haruhi-Suzumiya

Chat凉宫春日, an open-sourced role-playing chatbot by Cheng Li, Ziang Leng, and others.
Apache License 2.0

Using vLLM data parallelism together with ChatHaruhi raises "RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method" #83

Open 545771889a opened 3 weeks ago

545771889a commented 3 weeks ago

My code (the error "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method" appears as soon as ChatHaruhi is imported here):

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from chatharuhi import ChatHaruhi

def loadmodel(model_name, peft_model, quantization=None, use_fast_kernels=True, seed=42, **kwargs):
    # Load the model, tokenizer, and RAG chatbot.
    llm = LLM(model=model_name, max_model_len=40452, tensor_parallel_size=2)  # this only works if tensor_parallel_size is set to 1
    torch.cuda.manual_seed(seed)
    torch.manual_seed(seed)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token

    # RAG
    chatbot = ChatHaruhi(role_name='Sheldon', max_len_story=1000)
    return llm, tokenizer, chatbot
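
The error message itself suggests one workaround: make sure the 'spawn' start method is in effect before anything creates a CUDA context in the parent process. Below is a minimal sketch (not from this repo) assuming the CUDA context is created by the chatharuhi import; the VLLM_WORKER_MULTIPROC_METHOD environment variable may only be honored by recent vLLM releases.

import os
import multiprocessing

# Ask vLLM to spawn rather than fork its tensor-parallel workers
# (assumption: the installed vLLM version reads this environment variable).
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

if __name__ == "__main__":
    # Switch the global start method before vLLM or ChatHaruhi touch CUDA.
    multiprocessing.set_start_method("spawn", force=True)

    from vllm import LLM
    from chatharuhi import ChatHaruhi

    # "your-model-path" is a placeholder for the model path used above.
    llm = LLM(model="your-model-path", max_model_len=40452, tensor_parallel_size=2)
    chatbot = ChatHaruhi(role_name="Sheldon", max_len_story=1000)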
LC1332 commented 3 weeks ago

The conflict probably happens internally when the RAG vector (embedding) model is started up -o-
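
If that is the cause, a quick way to test it (just a sketch, not an official fix) is to defer the chatharuhi import until after the vLLM engine has forked its workers, so the parent process holds no CUDA context at fork time:

def loadmodel(model_name, seed=42, **kwargs):
    # Build the vLLM engine first: its tensor-parallel workers are forked here,
    # while the parent process has not initialized CUDA yet.
    from vllm import LLM
    llm = LLM(model=model_name, max_model_len=40452, tensor_parallel_size=2)

    # Import ChatHaruhi only afterwards; per the hypothesis above, this is the
    # point where the RAG embedding model creates a CUDA context.
    from chatharuhi import ChatHaruhi
    chatbot = ChatHaruhi(role_name='Sheldon', max_len_story=1000)
    return llm, chatbot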