In the paper, you have said "The duration of the far-end signal is randomly clipped to 1 s, while the near-end signal is randomly segmented to 0.5~1 s."
What's the point of doing this?
We segment the near-end signal to 0.5~1 s in order to simulate different double-talk situations, i.e. varying amounts of overlap between near-end and far-end speech. You do not necessarily have to use the same settings as we do.
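For anyone who wants to try this, here is a minimal sketch of the segmentation described above, not the authors' actual data pipeline. The 16 kHz sampling rate, the helper names `random_clip` and `make_double_talk_pair`, and the random placement of the near-end segment inside the 1 s window are all assumptions made for illustration.

```python
import random

SAMPLE_RATE = 16000  # assumed sampling rate in Hz; not stated in this thread


def random_clip(signal, num_samples, rng):
    """Return a random contiguous slice of at most `num_samples` samples."""
    if len(signal) <= num_samples:
        return list(signal)
    start = rng.randrange(len(signal) - num_samples + 1)
    return list(signal[start:start + num_samples])


def make_double_talk_pair(far_end, near_end, rng, sample_rate=SAMPLE_RATE):
    """Clip the far-end signal to 1 s and the near-end signal to 0.5~1 s.

    The near-end segment is then zero-padded to 1 s at a random offset
    (an assumption), so the double-talk region varies between examples.
    """
    far_len = sample_rate  # 1 s of far-end signal
    far_clip = random_clip(far_end, far_len, rng)

    # Near-end duration drawn uniformly from 0.5 s to 1 s (inclusive).
    near_len = rng.randint(sample_rate // 2, sample_rate)
    near_clip = random_clip(near_end, near_len, rng)

    # Place the near-end segment somewhere inside the 1 s window.
    offset = rng.randrange(far_len - len(near_clip) + 1)
    near_padded = [0.0] * far_len
    near_padded[offset:offset + len(near_clip)] = near_clip
    return far_clip, near_padded, offset, len(near_clip)


# Example with synthetic noise standing in for real speech recordings.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(3 * SAMPLE_RATE)]
near = [rng.uniform(-1.0, 1.0) for _ in range(3 * SAMPLE_RATE)]
far_clip, near_padded, offset, near_len = make_double_talk_pair(far, near, rng)
```

Because the near-end segment can be shorter than the far-end clip, each training example contains a different mix of far-end-only and double-talk regions. The duration range and the placement strategy are free parameters you can change.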