bfs18 / nsynth_wavenet

parallel wavenet based on nsynth

"wave_length": 7680, what does this hp mean? #28

Open switchzts opened 6 years ago

switchzts commented 6 years ago

Does it represent the minimum length of each utterance? Why do you train on segments of speech rather than whole utterances? Is it because of memory constraints? If my audio is about 10-15 s long, will that cause my model to generate meaningless audio?

bfs18 commented 6 years ago

--Does it represent the minimum length of each speech? --No, all input waves are cropped to length 7680. In the input queue, a wave segment of length 7680 is randomly cropped from each longer input wave.
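The random cropping described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual input-queue code: the function name `random_crop`, the error raised for short clips, and the 16 kHz sample rate are all assumptions made for the example.

```python
import random

WAVE_LENGTH = 7680  # the "wave_length" hyperparameter from the issue title


def random_crop(wave, wave_length=WAVE_LENGTH, rng=random):
    """Return a random contiguous slice of `wave_length` samples.

    Hypothetical helper: works on any sequence that supports len() and
    slicing (a list here; a NumPy array would behave the same way).
    """
    if len(wave) < wave_length:
        # Assumption: reject short clips. The repository may pad or skip them.
        raise ValueError("wave is shorter than wave_length")
    # Pick a start index so the whole segment fits inside the wave.
    start = rng.randrange(len(wave) - wave_length + 1)
    return wave[start:start + wave_length]


# Assumed sample rate of 16 kHz (not stated in the thread).
sample_rate = 16000
wave = [0.0] * (12 * sample_rate)  # stand-in for a 12 s recording
segment = random_crop(wave, rng=random.Random(0))
print(len(segment))               # 7680
print(WAVE_LENGTH / sample_rate)  # 0.48 (seconds per segment at 16 kHz)
```

Under that assumed sample rate, each training example covers about half a second of audio, so a 10-15 s recording yields many different crops over the course of training.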

--Why do you want to send each segment of speech into training? Is it the reason for memory resources? --Yes, longer wave segments consume much more GPU memory.

--If my audio is about 10-15s, does it cause my model to generate meaningless audio? --No, a model trained on waves of length 7680 generalizes well to longer sequences.