NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Unable to train on custom data with multiple speakers #159

Open naveed81 opened 11 months ago

naveed81 commented 11 months ago

Hi

I am trying to train Flowtron from scratch on custom data with 107 English speakers, and I am getting the error below. I've slogged away at it for two days but couldn't find a solution. Any help would be highly appreciated.

Number of speakers : 107
Number of speakers : 107
Setting up Tensorboard log in outdir/logs
Epoch: 0
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeats for threads [97,0,0] through [123,0,0] ...]

Traceback (most recent call last):
  File "./train.py", line 443, in <module>
    train(n_gpus, rank, train_config)
  File "./train.py", line 321, in train
    attn_logprob, mean, log_var, prob) = model(
  File "/home/ubuntu/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/flowtron/flowtron.py", line 875, in forward
    text = self.encoder(text, in_lens)
  File "/home/ubuntu/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/flowtron/flowtron.py", line 498, in forward
    mask = get_mask_from_lengths(in_lens).unsqueeze(1) if x.size(0) > 1 else None
  File "/home/ubuntu/flowtron/flowtron.py", line 47, in get_mask_from_lengths
    max_len = torch.max(lengths).item()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace above might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

When I train on a single-speaker subset of the same data, it works fine. What am I missing?

CookiePPP commented 11 months ago

https://github.com/NVIDIA/flowtron/blob/d149bc466b8325026aeb68aa2fef24337e7463b2/config.json#L50

Make sure this line is set to 107 or higher in whichever JSON file you're using as your config file.

If you are able to, setting the CUDA_LAUNCH_BLOCKING=1 environment variable might give you a more accurate error message. The message you posted doesn't point to the line where the error occurred, since the error happened asynchronously on the GPU.

naveed81 commented 11 months ago

I changed n_speakers to 107 in config.json and it worked like a charm. But training is painstakingly slow. I am running an AWS instance with 16 GB RAM. How much RAM do you think I will need? My goal is to build a multi-speaker Indian-accent English TTS model. Is training with 107 speakers (44k audio files in total) overkill? Please advise.



CookiePPP commented 11 months ago

Speed should be the same for single-speaker and multi-speaker datasets. You can change n_speakers to 128 and n_text to 256 to improve performance slightly, but the gain will be very small.
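Concretely, that tweak is just two values in the model section of the config. The key names below are assumed to match the layout of the repo's config.json; only the two shown values change.

```json
{
  "model_config": {
    "n_speakers": 128,
    "n_text": 256
  }
}
```

Both are embedding-table sizes, so rounding them up past the number actually used is harmless: unused rows are simply never indexed.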

naveed81 commented 11 months ago

Any suggestions on the batch_size? I have set it to 12 currently.


naveed81 commented 11 months ago

I started getting nan as the training loss after around 1362 iterations. Batch size is 6. Am I missing something? Please advise.

1358: 2222.559082031
1359: 2223.505371094
1360: 2227.125732422
1361: 2232.143798828
1362: nan
1363: 2235.490234375
1364: 2239.844970703
1365: 2241.337402344
1366: 2244.529052734
1367: 2250.654785156
1368: nan
1369: nan
1370: 2255.019775391
1371: 2259.218505859
1372: nan
1373: nan
1374: 2266.915771484
1375: nan
1376: 2273.206298828
1377: nan
1378: nan
1379: 2277.475341797
1380: nan
1381: nan
1382: nan
1383: nan
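For intermittent NaNs like the log above, a generic first step (not specific to Flowtron, and not from this thread) is to wrap the training step so the offending batch is reported and skipped rather than silently corrupting later updates. `train_step` and the batch format here are hypothetical stand-ins for whatever your loop actually calls.

```python
# Guard a training step against non-finite losses so a bad batch can be
# identified and skipped instead of propagating NaNs through the optimizer.
import math


def guarded_step(train_step, batch, step_idx):
    """Run one step; return the loss, or None if it was NaN/Inf."""
    loss = train_step(batch)  # hypothetical: returns a float loss
    if not math.isfinite(loss):
        print(f"iteration {step_idx}: non-finite loss ({loss}), skipping update")
        return None
    return loss
```

Logging which batches trigger the NaN often reveals a pattern, such as unusually short clips or a mislabeled file, that points at the real fix.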
