CircuitCM / RVC-inference

High performance RVC inferencing, intended for multiple instances in memory at once. Also includes the latest pitch estimator RMVPE, Python 3.8-3.11 compatible, pip installable, memory + performance improvements in the pipeline and model usage.
MIT License

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (3,2) #4

Open hieuhthh opened 6 months ago

hieuhthh commented 6 months ago


I followed the tutorial to run this library. I tried various Python versions, including Python 3.11, and tested some models (for example, this one).

However, I still receive this error:

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (3,2)

Any help or suggestions would be appreciated. Thank you!

aicoder2048 commented 6 months ago

I ran into the same error, even though I am using CUDA. It looks like the version of the rmvpe model pulled from HF is somehow incompatible. Copilot explained the error as follows, but that did not help me fix the problem. Maybe someone can identify the right version of rmvpe to avoid the error:

The error message is indicating that there is a shape mismatch when trying to broadcast the pad_width to the shape of the salience array in the np.pad function. The np.pad function is trying to pad the salience array with 4 elements on both sides, but it seems like the shape of the salience array is not compatible with the padding operation.

The error is happening in the to_local_average_cents method of the rmvpe class in the inferrvc package. This method is called by the decode method of the same class, which is in turn called by the infer_from_audio method.

Without knowing the exact shapes of the salience array and the pad_width, it's hard to say exactly what the issue is. However, it's possible that the salience array does not have the expected number of dimensions or the dimensions are not of the expected size.

To fix this issue, you may need to check the shape of the salience array before it's passed to the np.pad function and make sure it's compatible with the padding operation. If the salience array is not of the expected shape, you may need to reshape it or adjust the pad_width accordingly.
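
The shape check suggested above can be sketched in plain NumPy. This is only an illustration, not the library's code: `pad_salience` is a hypothetical helper, the array sizes are made up, and the pad width `((0, 0), (4, 4))` is an assumption inferred from the "(2,2)" in the error message and the "4 elements on both sides" description. A pad width with two rows fits a 2-D array, but cannot be broadcast to the three rows a 3-D array needs.

```python
import numpy as np

def pad_salience(salience):
    """Hypothetical helper: pad the last axis by 4 on each side.

    Assumes a 2-D salience array of shape (frames, bins) and fails early
    with a readable message if the input has a different number of axes.
    """
    if salience.ndim != 2:
        raise ValueError(
            f"expected 2-D salience (frames, bins), got shape {salience.shape}"
        )
    return np.pad(salience, ((0, 0), (4, 4)))

# Made-up sizes: 10 frames, 360 bins.
mono_like = np.zeros((10, 360))
stereo_like = np.zeros((2, 10, 360))  # extra leading axis, e.g. from two channels

padded = pad_salience(mono_like)  # works: last axis grows by 8

# Calling np.pad directly on the 3-D array shows the broadcast failure:
# the 2-row pad width does not match the 3 axes of the input.
broadcast_error = None
try:
    np.pad(stereo_like, ((0, 0), (4, 4)))
except ValueError as exc:
    broadcast_error = str(exc)

print(padded.shape)
print(broadcast_error)
```

Printing `salience.shape` just before the failing `np.pad` call in your own setup would confirm whether an extra axis is present.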

constlo commented 1 month ago

Hi!

I fixed the error by exporting my audio track as mono. When the audio is input as stereo, the data gets an additional dimension, hence the shape (2, x, y), where 2 is the number of stereo channels.
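
If re-exporting the file is not convenient, the same downmix can probably be done in code before the audio reaches the pipeline. The following is a minimal sketch using only NumPy; `to_mono` is a hypothetical helper, not part of this library, and it assumes the channels sit on axis 0 (some loaders put them on the last axis instead, in which case pass `channel_axis=-1`).

```python
import numpy as np

def to_mono(audio, channel_axis=0):
    """Hypothetical helper: average the channels of a multi-channel signal.

    A 1-D input is assumed to be mono already and is returned unchanged
    (apart from the float32 conversion).
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=channel_axis)

# Made-up stereo signal: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(8), np.zeros(8)])  # shape (2, 8)
mono = to_mono(stereo)                        # shape (8,)

print(stereo.shape, mono.shape)
```

Averaging the channels is one common way to downmix; exporting as mono in an audio editor, as described above, achieves the same end result for this error.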