Open switchzts opened 6 years ago
--Does it represent the minimum length of each speech? --No. Every input wave is cropped to length 7680: in the input queue, a segment of 7680 samples is randomly cropped from a longer input wave.
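The random cropping described above can be sketched as follows. This is a minimal illustration rather than the repository's actual input-queue code: the function name `random_crop`, the `rng` parameter, and the zero-padding of inputs shorter than 7680 samples are all assumptions made for the example.

```python
import random

CROP_LEN = 7680  # segment length quoted in the answer above


def random_crop(wave, crop_len=CROP_LEN, rng=random):
    """Return a random contiguous segment of `crop_len` samples from `wave`.

    `wave` is any sequence of samples (here a plain list of floats).
    """
    n = len(wave)
    if n < crop_len:
        # Assumption: inputs shorter than crop_len are right-padded with zeros.
        # The original thread does not say how short waves are handled.
        return list(wave) + [0.0] * (crop_len - n)
    # Pick a start offset so the segment fits entirely inside the wave.
    start = rng.randint(0, n - crop_len)  # randint is inclusive on both ends
    return list(wave[start:start + crop_len])


# Usage: crop a dummy 16000-sample wave with a seeded generator.
wave = [float(i) for i in range(16000)]
segment = random_crop(wave, rng=random.Random(0))
```

Because a new offset is drawn each time a wave is read, the model sees different parts of the same utterance across epochs even though every training example has the same fixed length.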
--Why do you want to send each segment of speech into training? Is it the reason for memory resources? --Yes, longer wave segments consume much more GPU memory.
--If my audio is about 10-15s, does it cause my model to generate meaningless audio? --No, a model trained on waves of length 7680 generalizes well to longer sequences.
Does it represent the minimum length of each speech? Why do you want to send each segment of speech into training? Is it because of memory resources? If my audio is about 10-15s, will that cause my model to generate meaningless audio?