Open hieuhthh opened 6 months ago
I ran into the same error, although I am using CUDA. It looks like the version of the rmvpe model pulled from HF is somehow not compatible. My Copilot explained the error as follows, but that did not help me fix the problem. Maybe someone can identify the proper version of rmvpe to avoid the error:
The error message is indicating that there is a shape mismatch when trying to broadcast the pad_width to the shape of the salience array in the np.pad function. The np.pad function is trying to pad the salience array with 4 elements on both sides, but it seems like the shape of the salience array is not compatible with the padding operation.
The error is happening in the to_local_average_cents method of the rmvpe class in the inferrvc package. This method is called by the decode method of the same class, which is in turn called by the infer_from_audio method.
Without knowing the exact shapes of the salience array and the pad_width, it's hard to say exactly what the issue is. However, it's possible that the salience array does not have the expected number of dimensions or the dimensions are not of the expected size.
To fix this issue, you may need to check the shape of the salience array before it's passed to the np.pad function and make sure it's compatible with the padding operation. If the salience array is not of the expected shape, you may need to reshape it or adjust the pad_width accordingly.
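To make the shape check above concrete, here is a minimal sketch that reproduces the same kind of failure with plain NumPy. The array shapes and the pad_width of ((0, 0), (4, 4)) are assumptions chosen to match the "(2,2)" and "4 elements on both sides" mentioned in this thread; they are not taken from the inferrvc source, and try_pad is a hypothetical helper.

```python
import numpy as np

def try_pad(salience):
    """Pad the last axis by 4 on each side using a pad_width written for 2-D input.

    Returns the padded array, or the ValueError if padding fails.
    (Hypothetical helper for illustration only.)
    """
    try:
        return np.pad(salience, ((0, 0), (4, 4)))
    except ValueError as exc:
        return exc

# Assumed shapes: (frames, bins) for mono, (channels, frames, bins) for stereo.
mono_salience = np.zeros((10, 360))
stereo_salience = np.zeros((2, 10, 360))

ok = try_pad(mono_salience)     # 2-D input: pad_width has one pair per axis
err = try_pad(stereo_salience)  # 3-D input: two pairs cannot cover three axes

print(ok.shape)
print(type(err).__name__, err)
```

Printing salience.shape (or salience.ndim) right before the np.pad call should show whether an extra leading dimension is present.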
Hi!
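I fixed the error by exporting my audio track as mono. Stereo input adds a channel dimension to the data, so the array has shape (2, x, y), where the leading 2 is the two stereo channels. That matches the error message: the padding is specified for two dimensions, but the array has three.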
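If re-exporting the file is inconvenient, the same downmix can probably be done in code before inference. This is only a sketch using NumPy: to_mono is a hypothetical helper, not part of inferrvc, and it assumes the audio is already loaded as an array with the channels on the first axis (some loaders put channels last, in which case pass channel_axis=-1).

```python
import numpy as np

def to_mono(audio, channel_axis=0):
    """Average the channels of a multi-channel signal (hypothetical helper).

    A 1-D input is assumed to be mono already and is returned unchanged.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=channel_axis)

# Synthetic stereo signal: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(8), np.zeros(8)])  # shape (2, 8)
mono = to_mono(stereo)                        # shape (8,)

print(stereo.shape, mono.shape)
```

The resulting 1-D array is what you would then pass on in place of the stereo data.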
I followed the tutorial to run this library. I tried several Python versions, including Python 3.11, and tested a few models (for example, this one).
However, I still get this error:
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (3,2)
Any help or suggestions would be appreciated. Thank you!