Oneflow-Inc / models

Models and examples built with OneFlow
Apache License 2.0

KnowledgeDistillation issue log #363

Closed songzetao closed 2 years ago

songzetao commented 2 years ago

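The original PR https://github.com/Oneflow-Inc/models/pull/234 relies on oneflow interfaces that last appeared in version 0.4 and are no longer available in version 0.5. Because adapting it would require extensive changes, I chose to rewrite the algorithm with oneflow 0.7. This issue records the problems encountered during the rewrite.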

songzetao commented 2 years ago

Downloading the MNIST dataset with flowvision ends in Segmentation fault (core dumped)

Reproduction code:

from flowvision import datasets
datasets.MNIST('.', train=True, download=True)

Error output:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
9913344it [00:07, 1402622.21it/s]                                                             
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw
Segmentation fault (core dumped)

Environment: onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.7.0+cu112, Python version: 3.7.7
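A segmentation fault normally kills the interpreter without a Python traceback, which makes it hard to see which line was executing. One way to narrow this down is the standard-library faulthandler module. The sketch below is only an illustration: the helper name enable_crash_traceback and its log_path parameter are hypothetical and not part of flowvision or oneflow.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Dump the Python traceback of all threads if the process crashes.

    Hypothetical helper: writes to stderr by default, or to log_path if given.
    The returned file object must stay open while the handler is enabled.
    """
    target = sys.stderr if log_path is None else open(log_path, "w")
    faulthandler.enable(file=target, all_threads=True)
    return target


def download_mnist(root="."):
    # Imported lazily so the fault handler is installed before flowvision loads.
    from flowvision import datasets

    return datasets.MNIST(root, train=True, download=True)


if __name__ == "__main__":
    enable_crash_traceback()
    download_mnist(".")
```

The same effect is available without code changes by running the script with python -X faulthandler.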

songzetao commented 2 years ago

Troubleshooting:

Lines 439 and 440 of flowvision.datasets.utils.py in flowvision trigger the problem above. The code is:

_COMPRESSED_FILE_OPENERS: Dict[str, Callable[..., IO]] = {
    ".gz": gzip.open,
    ".xz": lzma.open,
}
compressed_file_opener = _COMPRESSED_FILE_OPENERS[compression]
with compressed_file_opener(from_path) as rfh, open(to_path, "wb") as wfh:
    wfh.write(rfh.read())

The corresponding code in torchvision is:

elif _is_gzip(from_path):
    to_path = os.path.join(to_path, os.path.splitext(os.path.basename(from_path))[0])
    with open(to_path, "wb") as out_f, gzip.GzipFile(from_path) as zip_f:
        out_f.write(zip_f.read())

The difference is that flowvision opens and reads the archive through gzip.open, whereas torchvision goes through gzip.GzipFile.

After replacing open with GzipFile in flowvision, the problem persists.
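A possible explanation for why the swap makes no difference: the Python documentation describes gzip.open in binary mode as equivalent to constructing a gzip.GzipFile, so both snippets should go through the same decompression path. The following self-contained check illustrates this on a small temporary archive; the payload and file name are made up for the example and it does not touch the MNIST files.

```python
import gzip
import os
import tempfile

# Made-up payload standing in for the real dataset archive.
payload = b"mnist-like payload " * 1000

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.gz")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(payload)

    # Variant used by flowvision: gzip.open (defaults to binary read mode).
    with gzip.open(gz_path) as rfh:
        opened_type = type(rfh)
        via_open = rfh.read()

    # Variant used by torchvision: gzip.GzipFile.
    with gzip.GzipFile(gz_path) as zip_f:
        via_gzipfile = zip_f.read()

# Both variants yield a GzipFile object and identical bytes.
same_type = opened_type is gzip.GzipFile
same_bytes = via_open == via_gzipfile == payload
print(same_type, same_bytes)
```

If both values are true on the affected machine as well, the crash is more likely tied to the interpreter or environment than to the choice between the two calls.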

Ldpe2G commented 2 years ago

It works fine when I test it on dev machine No. 23. You installed flowvision via pip, right? Try installing it from a git clone instead.

songzetao commented 2 years ago

> It works fine when I test it on dev machine No. 23. You installed flowvision via pip, right? Try installing it from a git clone instead.

OK, I will give it a try.

songzetao commented 2 years ago

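> It works fine when I test it on dev machine No. 23. You installed flowvision via pip, right? Try installing it from a git clone instead.

Another workaround I have found so far is to change the Python version.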

conda create --name py36 python=3.6 -y
conda activate py36
python3 -m pip install -f https://release.oneflow.info oneflow==0.7.0+cu112
python3 -m pip install flowvision

In other words, changing the Python version from 3.7 to 3.6 also resolves the problem.

I will also try the git clone install.

songzetao commented 2 years ago

oneflow is missing the Adadelta() optimizer

Reproduction code:

import oneflow.optim as optim
optimizer = optim.Adadelta()

Error output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_22359/567125676.py in <module>
      1 import oneflow.optim as optim
----> 2 optimizer = optim.Adadelta()

AttributeError: module 'oneflow.optim' has no attribute 'Adadelta'

Environment:

onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.0+cu112, Python version: 3.7.7
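For reference, the update rule that the missing optimizer implements is short enough to sketch. What follows is a framework-agnostic illustration of the Adadelta rule (Zeiler, 2012) in plain Python, not oneflow's implementation. The function name adadelta_step, the layout of the state dict, and the default hyperparameters are assumptions made for this example.

```python
import math


def adadelta_step(params, grads, state, rho=0.9, eps=1e-6, lr=1.0):
    """Apply one Adadelta update in place to a list of float parameters.

    Hypothetical helper: state holds the running averages between calls.
    """
    sq_avg = state.setdefault("sq_avg", [0.0] * len(params))
    acc_delta = state.setdefault("acc_delta", [0.0] * len(params))
    for i, g in enumerate(grads):
        # Running average of squared gradients.
        sq_avg[i] = rho * sq_avg[i] + (1.0 - rho) * g * g
        # Step scaled by the ratio of the two running RMS values.
        delta = math.sqrt(acc_delta[i] + eps) / math.sqrt(sq_avg[i] + eps) * g
        # Running average of squared updates.
        acc_delta[i] = rho * acc_delta[i] + (1.0 - rho) * delta * delta
        params[i] -= lr * delta
    return params


# Toy usage: minimise f(x) = x^2, whose gradient is 2x.
params = [1.0]
state = {}
for _ in range(100):
    grads = [2.0 * p for p in params]
    adadelta_step(params, grads, state)
print(params[0])
```

The toy loop only shows that the parameter moves toward the minimum; it says nothing about how a tensor-based optimizer in oneflow is structured.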

MARD1NO commented 2 years ago

Install the latest nightly build. It was merged in the last couple of days.

songzetao commented 2 years ago

> Install the latest nightly build. It was merged in the last couple of days.

OK, thanks!

songzetao commented 2 years ago

oneflow is missing Dropout2d

Reproduction code:

import oneflow.nn as nn
dropout = nn.Dropout2d(0.3)

Error output:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_22606/2762091287.py in <module>
      1 import oneflow.nn as nn
----> 2 dropout = nn.Dropout2d(0.3)

AttributeError: module 'oneflow.nn' has no attribute 'Dropout2d'

Environment:

onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.0+cu112, Python version: 3.7.7
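To make clear what the missing layer is expected to compute: channel-wise dropout zeroes whole feature maps with probability p during training and rescales the surviving ones by 1 / (1 - p). The sketch below illustrates that behaviour with NumPy on an (N, C, H, W) array. It is not a oneflow module; the function name dropout2d, the rng parameter, and the restriction to 0 <= p < 1 are assumptions made for this example.

```python
import numpy as np


def dropout2d(x, p=0.3, training=True, rng=None):
    """Channel-wise dropout for an array shaped (N, C, H, W).

    Hypothetical helper: drops entire channels, not individual elements.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1) for this sketch")
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    # One keep/drop decision per (sample, channel), broadcast over H and W.
    keep = rng.random((x.shape[0], x.shape[1], 1, 1)) >= p
    return x * keep / (1.0 - p)


x = np.ones((2, 4, 3, 3), dtype=np.float32)
y = dropout2d(x, p=0.3, rng=np.random.default_rng(0))
print(y[:, :, 0, 0])
```

Each printed entry is either zero (channel dropped) or the rescaled value (channel kept), which is the property that distinguishes this layer from element-wise dropout.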