descriptinc / cargan

Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
https://maxrmorrison.com/sites/cargan
MIT License

TypeError: can't convert np.ndarray of type numpy.uint16. #11

Closed by zerlinwang 1 year ago

zerlinwang commented 2 years ago

When I ran the code on my own dataset with python -m cargan.preprocess --dataset ljspeech, an error occurred:

Traceback (most recent call last):
  File "XX/anaconda3/envs/cargan/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "XX/anaconda3/envs/cargan/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "XX/models/cargan/cargan/preprocess/main.py", line 26, in <module>
    cargan.preprocess.datasets(**vars(parse_args()))
  File "XX/models/cargan/cargan/preprocess/core.py", line 37, in datasets
    mels, pitch, periodicity = from_audio(audio, gpu=gpu)
  File "XX/models/cargan/cargan/preprocess/core.py", line 62, in from_audio
    pitch, periodicity = cargan.preprocess.pitch.from_audio(
  File "XX/models/cargan/cargan/preprocess/pitch.py", line 38, in from_audio
    pitch, periodicity = torchcrepe.predict(
  File "XX/anaconda3/envs/cargan/lib/python3.8/site-packages/torchcrepe-0.0.15-py3.8.egg/torchcrepe/core.py", line 127, in predict
    result = postprocess(probabilities,
  File "XX/anaconda3/envs/cargan/lib/python3.8/site-packages/torchcrepe-0.0.15-py3.8.egg/torchcrepe/core.py", line 605, in postprocess
    bins, pitch = decoder(probabilities)
  File "XX/anaconda3/envs/cargan/lib/python3.8/site-packages/torchcrepe-0.0.15-py3.8.egg/torchcrepe/decode.py", line 76, in viterbi
    bins = torch.tensor(bins, device=probs.device)
TypeError: can't convert np.ndarray of type numpy.uint16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

I guess it is caused by

    # Perform viterbi decoding
    bins = [librosa.sequence.viterbi(sequence, viterbi.transition)
            for sequence in sequences]
    # Convert to pytorch
    bins = torch.tensor(bins, device=probs.device)

in torchcrepe\decode.py

The datatype of bins is numpy.uint16. Do I need to modify the code in torchcrepe?

maxrmorrison commented 2 years ago

You do not need to modify torchcrepe. ljspeech is not a dataset that I implemented. You should check how you set up that dataset and compare it to the examples provided in cargan/data/download.py.

daniil-lyakhov commented 2 years ago

I have the same issue. It looks like librosa.sequence.viterbi returns np.uint16 values, which torch.tensor does not support.

maxrmorrison commented 2 years ago

Yep, you were both correct. librosa updated the return type of librosa.sequence.viterbi from np.int64 to np.uint16. Fixed in torchcrepe version 0.0.16.
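If upgrading is not an option, the dtype mismatch described above can in principle be worked around by casting the decoded bins to a dtype from the supported list in the error message before building the tensor. The following is only a rough sketch of that idea, not necessarily how torchcrepe 0.0.16 handles it. The helper name as_int64 and the sample arrays are made up for illustration; the arrays stand in for what librosa.sequence.viterbi returns.

```python
import numpy as np


def as_int64(bins):
    """Stack per-sequence state arrays and cast them to int64.

    Hypothetical helper for illustration; it is not part of torchcrepe.
    int64 is one of the dtypes listed as supported in the TypeError above.
    """
    return np.asarray(bins).astype(np.int64)


# Made-up stand-in for the list of np.uint16 arrays that
# librosa.sequence.viterbi is reported to return in this thread.
bins = [
    np.array([3, 7, 7, 12], dtype=np.uint16),
    np.array([5, 5, 9, 9], dtype=np.uint16),
]

converted = as_int64(bins)

# torch is optional here so the sketch also runs without it installed.
try:
    import torch
    tensor = torch.tensor(converted)
except ImportError:
    tensor = None
```

In torchcrepe/decode.py the equivalent change would sit between the viterbi decoding step and the torch.tensor call quoted earlier in this thread.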