The log output is:

INFO: Started server process [3559242]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:6006 (Press CTRL+C to quit)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128000 for open-end generation.
INFO: 2.0.1.1:59967 - "POST / HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
...
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

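A model was already running on GPU 7. Starting llama3 on top of it then failed with the tensor device mismatch shown above. The fix is to change how the CUDA device is specified.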

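import torch

# GPU memory cleanup function
def torch_gc():
    if torch.cuda.is_available():  # check whether CUDA is available
        # with torch.cuda.device(CUDA_DEVICE):  # explicit CUDA device selection, commented out
        torch.cuda.empty_cache()  # release cached, unused CUDA memory
        torch.cuda.ipc_collect()  # reclaim GPU memory released by CUDA IPC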

When deploying, launch the service with

CUDA_VISIBLE_DEVICES=7 python3 fast_api.py

and the API can then be called normally.
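Why this launch command helps can be sketched in a few lines. With CUDA_VISIBLE_DEVICES set, the process only sees the listed GPUs, and they are renumbered from zero, so physical GPU 7 becomes cuda:0 and there is no cuda:1 for tensors to land on. The helper name `logical_to_physical` below is hypothetical and only illustrates the renumbering. It is not part of PyTorch or of fast_api.py, and it assumes a plain comma-separated list of valid IDs.

```python
def logical_to_physical(visible: str) -> dict:
    """Map logical CUDA device names to the physical GPU IDs listed in
    a CUDA_VISIBLE_DEVICES value (illustrative helper, not a library API)."""
    ids = [part.strip() for part in visible.split(",") if part.strip()]
    # Visible devices are renumbered in the order they are listed.
    return {f"cuda:{index}": gpu_id for index, gpu_id in enumerate(ids)}


# Equivalent of launching with CUDA_VISIBLE_DEVICES=7
print(logical_to_physical("7"))    # {'cuda:0': '7'}
print(logical_to_physical("6,7"))  # {'cuda:0': '6', 'cuda:1': '7'}
```

If the variable is set from inside Python through `os.environ` instead of on the command line, it has to be set before CUDA is initialized, which in practice means before any CUDA call in torch.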

Log output from the successful run:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.21it/s]
INFO: Started server process [3610169]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:6006 (Press CTRL+C to quit)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128000 for open-end generation.
[2024-05-21 15:30:38] ", prompt:"你好", response:"'😊 你好！我是 Chatbot，很高兴和你交流！有什么想聊的主题或问题？
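The attention mask / pad token warning in the log is separate from the device error, but both can be addressed in the generation call. The following is a minimal sketch, not the code in fast_api.py: the function name `generate_reply` and its parameters are made up for illustration, and it assumes `model` and `tokenizer` are a Hugging Face transformers causal LM and its tokenizer that were loaded elsewhere.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=512):
    """Generate a reply with inputs placed on the model's device
    (hypothetical helper, assuming transformers-style objects)."""
    # Tokenizing returns both input_ids and attention_mask.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move all input tensors to the device the model lives on,
    # instead of hard-coding a device string.
    inputs = inputs.to(model.device)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # avoids the attention mask warning
        pad_token_id=tokenizer.eos_token_id,      # set explicitly rather than implicitly
        max_new_tokens=max_new_tokens,
    )
    # Keep only the tokens generated after the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

It would be called as `generate_reply(model, tokenizer, "你好")` from the request handler. Reading the device from `model.device` means the same code works whether the process sees one GPU or several.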

datawhalechina / self-llm

llama3 API error #122
