Florenyci opened 5 days ago
To be clear, I'm not looking to parallelize the inference process across multiple GPUs; I noticed this code, but it's not what I want. I want to run inference with multiple prompts in one go (sequentially is fine), so that I don't have to run bash inference.sh once per prompt.
Oh, if you just don't want to run inference.sh repeatedly, a good approach is to save your prompts, one per line, in a txt file, and then set input_name to that txt file in inference.yaml. It seems you are using the SAT code for inference, so I don't know why this problem occurred.
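For reference, a minimal sketch of the setup described above, with the prompts in test.txt, one per line. The key names below are assumptions based on this thread (input_name is the name used in the comment above); check the actual keys in your inference.yaml, as they may differ between repo versions:

```yaml
# inference.yaml (fragment; key names assumed from this thread)
input_type: txt        # read prompts from a text file instead of the CLI
input_name: test.txt   # one prompt per line
```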
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Please make sure CUDA is installed correctly; the SAT code moves the entire transformer model to CUDA, so this issue should not arise.
Yes, that's exactly what I have done. I'm using the I2V model for inference, and the txt file looks like this:

text@@img.png
text@@img.png

But it doesn't work the way the T2V model does (with only one prompt it works). I think there is a bug in I2V mode when multiple prompts are given.
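For what it's worth, a minimal sketch of how such an I2V prompt file could be parsed, assuming @@ separates the prompt text from the image path as in the example above (the helper names are hypothetical, and the actual parsing in the repo may differ):

```python
def parse_i2v_line(line: str):
    """Split one 'prompt@@image_path' line into its two parts."""
    line = line.strip()
    if not line:
        return None  # skip blank lines
    prompt, sep, image_path = line.partition("@@")
    if not sep:
        raise ValueError(f"missing '@@' separator in line: {line!r}")
    return prompt, image_path

def load_prompt_file(path: str):
    """Read a txt file with one 'prompt@@image_path' entry per line."""
    with open(path, encoding="utf-8") as f:
        entries = [parse_i2v_line(line) for line in f]
    return [e for e in entries if e is not None]
```

Logging the parsed pairs before sampling makes it easy to see whether the multi-prompt path is fed the right inputs.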
I got this error too. I will fix it.
FrozenT5Embedder's transformer gets moved to cpu after the first inference; that is the problem. Why does it get moved to cpu? Good question. I don't know yet, but I will fix it.
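Until the root cause is found, one possible workaround is to move the text encoder back to the GPU before each sampling call. A minimal sketch, where ensure_on_device is a hypothetical helper and not part of the repo:

```python
import torch

def ensure_on_device(module: torch.nn.Module, device: str = "cuda:0") -> torch.nn.Module:
    # Hypothetical helper: if any parameter has drifted to a different
    # device (e.g. the text encoder ending up on cpu after the first
    # inference), move the whole module back before the next call.
    param = next(module.parameters(), None)
    if param is not None and param.device != torch.device(device):
        module.to(device)
    return module
```

Calling this on the conditioner / text encoder at the top of the per-prompt loop would mask the symptom while the underlying bug is investigated.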
System Info / 系統信息
I can run the official inference script successfully.
Information / 问题信息
Reproduction / 复现过程
Hi guys, thanks for your cool work! I was trying to run inference with the official 5B-I2V model. When I changed test.txt so that each line contains one prompt, I got this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
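To pin down which submodule ended up on the wrong device before the index_select call, a small diagnostic like this can help (report_devices is a hypothetical helper, not from the repo):

```python
import torch

def report_devices(model: torch.nn.Module) -> dict:
    """Count how many parameters live on each device, to spot
    modules that silently drifted to cpu."""
    counts: dict = {}
    for _, p in model.named_parameters():
        dev = str(p.device)
        counts[dev] = counts.get(dev, 0) + 1
    return counts
```

Running it on the loaded model between prompts should show the text encoder's parameters switching to "cpu" if the drift described elsewhere in this thread is happening.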
Is there any way I can do batch inference?

Expected behavior / 期待表现
I hope it could do batch inference