Is sort a necessary processing step in data_pipeline?

Can this method be removed? Would removing it affect performance, or would it affect model quality?

def sort(data, sort_size=500, mode='train'):
    """ Sort the data by feature length.
        Sort is used after shuffle and before batch, so we can group
        utts with similar lengths into a batch, and sort_size should
        be less than shuffle_size

        Args:
            data: Iterable[{key, feat, label}]
            sort_size: buffer size for sort

        Returns:
            Iterable[{key, feat, label}]
    """

    buf = []
    for sample in data:
        buf.append(sample)
        if len(buf) >= sort_size:
            buf.sort(key=lambda x: x['speech_feat'].size(0))
            for x in buf:
                yield x
            buf = []
    # The sample left over
    buf.sort(key=lambda x: x['speech_feat'].size(0))
    for x in buf:
        yield x
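To make the question concrete, here is a rough sketch of what the docstring describes: grouping utterances of similar length so each batch needs less padding. It is not CosyVoice code. It assumes each sample is represented only by its length, that batches have a fixed size, and that lengths are drawn uniformly at random; `buffered_sort`, `batches`, and `padding_ratio` are hypothetical helper names introduced for illustration.

```python
import random


def buffered_sort(lengths, sort_size=500):
    """Sort lengths inside fixed-size buffers, mirroring the buffered sort() step."""
    buf = []
    for n in lengths:
        buf.append(n)
        if len(buf) >= sort_size:
            buf.sort()
            yield from buf
            buf = []
    # Flush whatever is left in the last, partially filled buffer.
    buf.sort()
    yield from buf


def batches(lengths, batch_size):
    """Group consecutive lengths into fixed-size batches (an assumed batching scheme)."""
    batch = []
    for n in lengths:
        batch.append(n)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def padding_ratio(lengths, batch_size=16):
    """Fraction of frames that are padding when each batch is padded to its longest item."""
    padded = real = 0
    for batch in batches(lengths, batch_size):
        padded += max(batch) * len(batch)
        real += sum(batch)
    return 1.0 - real / padded


random.seed(0)
# Synthetic utterance lengths in frames; the range is an arbitrary assumption.
lengths = [random.randint(50, 1500) for _ in range(4000)]

unsorted_ratio = padding_ratio(lengths)
sorted_ratio = padding_ratio(buffered_sort(lengths, sort_size=500))

print(f"padding without sort: {unsorted_ratio:.1%}")
print(f"padding with sort:    {sorted_ratio:.1%}")
```

Under these assumptions the sorted stream needs less padding per batch, which suggests the step is mainly about training efficiency (less compute and memory spent on padded frames). The sketch says nothing about whether removing it changes model quality; that would have to be checked by training with and without it.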

FunAudioLLM / CosyVoice, issue #636