Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Assistant responds with nothing in SPHINX inference.py #104

Closed staymylove closed 9 months ago

staymylove commented 10 months ago

Dear author, when I run `python inference.py` (SPHINX), after the parameters finish loading, the assistant responds with nothing (I waited almost 5 minutes). What could the problem be?

[screenshot attached]

ChrisLiu6 commented 10 months ago

Hi! You are running the script with model parallel size 2, so two processes, rank0 and rank1, are spawned. I guess rank1 failed for some reason, but because of this line of code: https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/512943221c8872bcc9d11512e5cbcc039ad0b575/SPHINX/inference.py#L62 output from rank1 is suppressed, so you never saw the error message. Since communication between the two ranks is needed during model inference, rank0 then hangs waiting for rank1.
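For context, a common way model-parallel scripts keep the console clean is to redirect stdout/stderr to `/dev/null` on every rank except rank 0. The sketch below illustrates that pattern (the function name is hypothetical; the exact code on the linked line may differ), and also shows why a crash on rank1 becomes invisible:

```python
import os
import sys

def silence_non_master(rank: int) -> None:
    """Redirect stdout/stderr to /dev/null on every rank except rank 0.

    Hypothetical helper illustrating the common pattern; the actual line
    in SPHINX/inference.py may differ. Side effect: any traceback printed
    by a silenced rank is discarded, so a crash on rank1 looks like a hang.
    """
    if rank != 0:
        devnull = open(os.devnull, "w")
        sys.stdout = devnull
        sys.stderr = devnull
```

With this in place, an exception raised on rank1 writes its traceback to the discarded stream, while rank0 blocks on the next collective operation, which matches the silent 5-minute wait described above.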

To figure out what the problem is, you can comment out the aforementioned line of code and check whether rank1 prints any useful message.
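Besides commenting the line out, another option is to redirect each non-zero rank's output to a per-rank log file instead of discarding it, so a crash on rank1 leaves a readable traceback without cluttering the console. A minimal sketch (the helper name and `log_dir` layout are assumptions, not part of the repo):

```python
import os
import sys

def log_non_master(rank: int, log_dir: str = "rank_logs") -> None:
    """Hypothetical alternative to discarding non-rank-0 output:
    write each non-zero rank's stdout/stderr to rank_logs/rank<N>.log,
    so error tracebacks survive for inspection after a hang."""
    if rank != 0:
        os.makedirs(log_dir, exist_ok=True)
        log_file = open(os.path.join(log_dir, f"rank{rank}.log"), "w")
        sys.stdout = log_file
        sys.stderr = log_file
```

If rank1 then dies, `cat rank_logs/rank1.log` should show the actual exception.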