Snowdar / asv-subtools

An Open Source Tools for Speaker Recognition
Apache License 2.0

DistributedSampler #8

Closed matln closed 3 years ago

matln commented 4 years ago

I noticed the WARNING in the source code of torch.utils.data.distributed.DistributedSampler:

    .. warning::
        In distributed mode, calling the :meth:`set_epoch(epoch) <set_epoch>` method at
        the beginning of each epoch **before** creating the :class:`DataLoader` iterator
        is necessary to make shuffling work properly across multiple epochs. Otherwise,
        the same ordering will be always used.

So should we add data.train_sampler.set_epoch(this_epoch) at the beginning of every epoch? zhihu
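For reference, a minimal sketch of the pattern the PyTorch docs describe (the dataset, batch size, and epoch count here are placeholders, not values from asv-subtools):

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset and settings, for illustration only.
train_sampler = DistributedSampler(train_dataset, shuffle=True)
train_loader = DataLoader(train_dataset, batch_size=64, sampler=train_sampler)

for epoch in range(num_epochs):
    # Re-seed the sampler so each epoch gets a different shuffle order;
    # without this call, every epoch reuses the same ordering on all ranks.
    train_sampler.set_epoch(epoch)
    for batch in train_loader:
        ...  # training step
```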

Snowdar commented 4 years ago

Hi, the answer is yes. Thank you for pointing this out; we will fix it before long. If you want to do it now yourself, you could use "if data.train_sampler is not None" to distinguish the single-GPU training case.

Best!
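A sketch of the guard suggested above, assuming `data.train_sampler` holds the DistributedSampler (or None when training on a single GPU) and `this_epoch` is the current epoch index; the surrounding loop is hypothetical, not the actual asv-subtools trainer code:

```python
for this_epoch in range(start_epoch, num_epochs):
    # train_sampler is None in single-GPU (non-distributed) training,
    # so only call set_epoch when a DistributedSampler is actually in use.
    if data.train_sampler is not None:
        data.train_sampler.set_epoch(this_epoch)
    ...  # run one training epoch
```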
