MegEngine / Models

Various mainstream deep learning models implemented with MegEngine

Unexpected problem when quantizing a model #103

Open ThreeLord opened 3 years ago

ThreeLord commented 3 years ago

Environment

1. OS: Ubuntu 18.04
2. MegEngine version: 1.4.0
3. Python version: 3.6.8

Steps to reproduce

  1. git clone https://github.com/MegEngine/Models.git
  2. python3 train.py -a resnet18 -d /path/to/imagenet --mode normal (the problem occurs at this step)

Please provide the key code snippets to help track down the problem

1. The same problem occurs with both train.py and finetune.py.

Please provide the full log and error message

    Traceback (most recent call last):
      File "finetune.py", line 315, in <module>
        main()
      File "finetune.py", line 69, in main
        train_proc(world_size, args)
      File "finetune.py", line 159, in worker
        data.RandomSampler(train_dataset, batch_size=cfg.BATCH_SIZE, drop_last=True)
      File "/usr/local/python/lib/python3.6/site-packages/megengine/data/sampler.py", line 322, in __init__
        self.sampler_iter = iter(self.sampler)
      File "/usr/local/python/lib/python3.6/site-packages/megengine/data/sampler.py", line 100, in __iter__
        return self.batch()
      File "/usr/local/python/lib/python3.6/site-packages/megengine/data/sampler.py", line 146, in batch
        if self.drop_last and len(batch_index[-1]) < self.batch_size:
    IndexError: list index out of range

qliu93 commented 3 years ago

Hi, according to the log you provided, the error is raised here:

    def batch(self) -> Iterator[List[Any]]:
        r"""
        Batch method provides a batch indices generator.
        """
        indices = list(self.sample())

        # user might pass the world_size parameter without dist,
        # so dist.is_distributed() should not be used
        if self.world_size > 1:
            indices = self.scatter(indices)

        step, length = self.batch_size, len(indices)
        batch_index = [indices[i : i + step] for i in range(0, length, step)]

        if self.drop_last and len(batch_index[-1]) < self.batch_size:
            batch_index.pop()

        return iter(batch_index)
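To see why an empty index list produces exactly this error, the chunking logic quoted above can be mirrored in plain Python without MegEngine. This is a minimal sketch: batch_indices and batch_indices_safe are hypothetical names used only for illustration, and the distributed scatter step is omitted.

```python
from typing import Any, List


def batch_indices(indices: List[Any], batch_size: int, drop_last: bool) -> List[List[Any]]:
    # Same chunking as the quoted Sampler.batch(), minus the distributed scatter step
    step, length = batch_size, len(indices)
    batch_index = [indices[i : i + step] for i in range(0, length, step)]
    # If `indices` is empty, batch_index is [] and batch_index[-1] raises IndexError
    if drop_last and len(batch_index[-1]) < batch_size:
        batch_index.pop()
    return batch_index


def batch_indices_safe(indices: List[Any], batch_size: int, drop_last: bool) -> List[List[Any]]:
    # Hypothetical guarded variant: only inspect the last batch if one exists
    step, length = batch_size, len(indices)
    batch_index = [indices[i : i + step] for i in range(0, length, step)]
    if drop_last and batch_index and len(batch_index[-1]) < batch_size:
        batch_index.pop()
    return batch_index


print(batch_indices(list(range(10)), 4, drop_last=True))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(batch_indices_safe([], 4, drop_last=True))  # []
try:
    batch_indices([], 4, drop_last=True)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

The guarded variant only avoids the crash: with no samples it simply yields zero batches, so the underlying cause (no data being found) still has to be fixed.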

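This does not look like a model-level problem; most likely the input data was not loaded successfully. If self.sample() returns an empty list, batch_index is empty as well, so batch_index[-1] raises exactly this IndexError. I suggest adding logging here to check whether the list returned by self.sample() is empty.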
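One way to act on this suggestion is to confirm, before training starts, that the directory passed with -d actually contains images. The sketch below is a hypothetical helper, not part of MegEngine or the Models repo; it assumes the common ImageNet layout with one sub-folder per class (for example /path/to/imagenet/train/<class>/*.JPEG), which may not match exactly what MegEngine's ImageNet dataset expects. Inside finetune.py, printing len(train_dataset) just before data.RandomSampler(...) is constructed should give a similar check.

```python
import os
from typing import Dict

# Assumed image file extensions (compared case-insensitively)
IMAGE_EXTS = (".jpeg", ".jpg", ".png")


def count_images_per_class(split_dir: str) -> Dict[str, int]:
    # Assumes a hypothetical layout: <split_dir>/<class_name>/<image files>
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"split directory not found: {split_dir}")
    counts: Dict[str, int] = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir) if name.lower().endswith(IMAGE_EXTS)
        )
    return counts


def check_split(split_dir: str) -> int:
    # Fail early with a clear message instead of an IndexError inside the sampler
    counts = count_images_per_class(split_dir)
    total = sum(counts.values())
    print(f"{split_dir}: {len(counts)} class folders, {total} images")
    if total == 0:
        raise RuntimeError(f"no images found under {split_dir}; check the path passed with -d")
    return total


# Usage (placeholder path from the issue): check_split("/path/to/imagenet/train")
```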