Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io.

Inference: Batch size > 1 #177

Open flexthink opened 6 days ago

flexthink commented 6 days ago

It appears that batch inference is not currently supported: inference fails whenever the batch size is anything other than 1.

In models/vallex.py, inference():

assert y.shape[0] == 1, y.shape

This does not allow audio prompts with a batch size other than 1.
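
A standalone illustration of that guard (the prompt shape below is assumed for the sketch, not taken from the repo):

    import torch

    # Assumed audio-prompt code shape: (batch, frames, n_codebooks).
    y = torch.zeros(2, 150, 8, dtype=torch.long)  # batch of 2 prompts

    # The guard in inference() fires for any batch size other than 1:
    assert y.shape[0] == 1, y.shape  # AssertionError: torch.Size([2, 150, 8])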

Later in the same function:

    xy_pos = torch.concat([x, y_pos], dim=1)

This concatenation assumes x and y_pos have the same size along dim 0 (the batch dimension).
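
For reference, the mismatch is easy to reproduce in isolation. The sequence lengths and model dimension below are made up; only the batch arithmetic matters:

    import torch

    batch, best_of, d_model = 2, 5, 1024

    # x has been tiled for best_of sampling on top of the batch...
    x = torch.randn(batch * best_of, 32, d_model)  # dim 0 is 10
    # ...but y_pos only carries best_of entries.
    y_pos = torch.randn(best_of, 8, d_model)       # dim 0 is 5

    # Concatenating along dim=1 requires all other dims to match:
    xy_pos = torch.concat([x, y_pos], dim=1)
    # RuntimeError: Sizes of tensors must match except in dimension 1.
    # Expected size 10 but got size 5 for tensor number 1 in the list.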

Here is the error you get when the batch size is 2 and best_of is 5:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 10 but got size 5 for tensor number 1 in the list.

Please advise.
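
As a stopgap on my end, I am looping over the batch and calling inference() once per item. A minimal sketch; the argument names (x, x_lens, y) mirror the snippets above and are assumptions, not the exact signature in models/vallex.py:

    def inference_per_item(model, x, x_lens, y, **kwargs):
        # Work around the batch-size-1 restriction by slicing the batch
        # and calling inference() once per item.
        outputs = []
        for i in range(x.shape[0]):
            out = model.inference(x[i:i + 1], x_lens[i:i + 1], y[i:i + 1], **kwargs)
            outputs.append(out)
        # Generated lengths differ per item, so return a list rather than
        # stacking into a single tensor.
        return outputs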