Oneflow-Inc / models

Models and examples built with OneFlow
Apache License 2.0

MetaKD issue log #378

Open songzetao opened 2 years ago

songzetao commented 2 years ago

When porting the torch import from torch.optim.optimizer import required to OneFlow, I found that the OneFlow optimizer needs it imported as from oneflow.nn.optimizer.optimizer import required. Recording it here for reference.
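For code that has to run against either backend, the import difference noted above could be hidden behind a small loader. This is only a sketch: the two module paths are the ones quoted in this comment, while `CANDIDATES` and `load_required` are hypothetical names introduced for illustration, and returning `(None, None)` when neither backend is installed is an assumption.

```python
import importlib

# Candidate modules, in order of preference. The OneFlow path is the one
# reported in this issue; the torch path is the original import.
CANDIDATES = (
    "oneflow.nn.optimizer.optimizer",
    "torch.optim.optimizer",
)


def load_required():
    """Return (required, module_name), or (None, None) if nothing is importable.

    Hypothetical helper: it only tries the module paths listed in CANDIDATES.
    """
    for name in CANDIDATES:
        try:
            module = importlib.import_module(name)
            return getattr(module, "required"), name
        except (ImportError, AttributeError):
            # Backend not installed, or it does not expose `required` there.
            continue
    return None, None


required, source = load_required()
print("`required` loaded from:", source)
```

Optimizer subclasses can then use `required` as their default-argument sentinel without caring which backend supplied it.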

songzetao commented 2 years ago

module 'oneflow.distributed' has no attribute 'is_initialized'

Reproduction code

import oneflow as torch
if torch.distributed.is_initialized():
    pass

Error message

Traceback (most recent call last):
  File "meta_teacher_train.py", line 20, in <module>
    initialize_easynlp()
  File "/workspace/models/KnowledgeDistillation/knowledge_distillation_metakd/metakd_oneflow/easynlp/utils/initializer.py", line 39, in initialize_easynlp
    _initialize_distributed()
  File "/workspace/models/KnowledgeDistillation/knowledge_distillation_metakd/metakd_oneflow/easynlp/utils/initializer.py", line 109, in _initialize_distributed
    if torch.distributed.is_initialized():
AttributeError: module 'oneflow.distributed' has no attribute 'is_initialized'

Environment

OneCloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), python version: 3.7.7
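Until `is_initialized` is available, one possible workaround is to probe for the attribute before calling it. The sketch below is not OneFlow's API: `is_dist_initialized` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for `torch.distributed` and `oneflow.distributed` so the snippet runs without either library, and falling back to `False` when the attribute is missing is an assumption that may not suit every launcher.

```python
from types import SimpleNamespace


def is_dist_initialized(dist_module, default=False):
    """Call dist_module.is_initialized() if it exists, else return `default`.

    Hypothetical helper; `default=False` assumes "not initialized" is the
    safe answer when the backend cannot tell us.
    """
    check = getattr(dist_module, "is_initialized", None)
    if callable(check):
        return bool(check())
    return default


# Stand-ins for the real modules (assumptions, for illustration only).
torch_like = SimpleNamespace(is_initialized=lambda: True)
oneflow_like = SimpleNamespace()  # no is_initialized, as in the traceback

print(is_dist_initialized(torch_like))    # True
print(is_dist_initialized(oneflow_like))  # False
```

In `_initialize_distributed`, the check `torch.distributed.is_initialized()` would become `is_dist_initialized(torch.distributed)`.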

songzetao commented 2 years ago

'Tensor' object has no attribute 'is_sparse'

Reproduction code

import oneflow as flow
tensor = flow.randn(2, 3)
print(tensor.is_sparse)

Error message

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_54640/2438940525.py in <module>
      1 import oneflow as flow
      2 tensor = flow.randn(2, 3)
----> 3 print(tensor.is_sparse)

AttributeError: 'Tensor' object has no attribute 'is_sparse'

Comparison with torch

import torch as flow
tensor = flow.randn(2, 3)
print(tensor.is_sparse)
>>False

Environment

OneCloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), python version: 3.7.7
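A caller-side workaround for the missing `is_sparse` attribute might look like the sketch below. `is_sparse_tensor` is a hypothetical helper, and the two classes are stand-ins for a OneFlow-like and a torch-like tensor so the snippet runs on its own. Treating a missing attribute as "dense" is an assumption.

```python
class DenseOnlyTensor:
    """Stand-in for a tensor type with no `is_sparse` attribute."""


class TorchLikeTensor:
    """Stand-in for a tensor type that exposes `is_sparse`."""
    is_sparse = False


def is_sparse_tensor(tensor):
    """Return tensor.is_sparse, treating a missing attribute as dense.

    Hypothetical helper; the `False` default is an assumption.
    """
    return bool(getattr(tensor, "is_sparse", False))


print(is_sparse_tensor(DenseOnlyTensor()))  # False
print(is_sparse_tensor(TorchLikeTensor()))  # False
```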

songzetao commented 2 years ago

Calling mean(-1) on a 0-dim tensor crashes the program in oneflow, but not in torch

Reproduction code

import oneflow as flow
input = flow.randn(2, 3)
target = flow.randn(2, 3)
loss = flow.nn.functional.mse_loss(input, target)  # compute the loss (a 0-dim tensor)
print(loss)
print(loss.shape)
loss_mean = loss.mean(-1)
print(loss_mean)

Error message

loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.

Comparison with torch

import torch as flow
input = flow.randn(2, 3)
target = flow.randn(2, 3)
loss = flow.nn.functional.mse_loss(input, target)
print(loss)
print(loss.shape)
loss_mean = loss.mean(-1)
print(loss_mean)
>>> tensor(1.3708)
>>> torch.Size([])
>>> tensor(1.3708)

Environment

OneCloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), python version: 3.7.7
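Until the crash is fixed, the calling code can skip the reduction when the loss is already 0-dim, which matches the torch output shown above (the value is unchanged by `mean(-1)`). This is a sketch only: `mean_last_dim` is a hypothetical helper, and `ScalarStandIn` is a stand-in for a 0-dim tensor so the snippet runs without oneflow or torch.

```python
class ScalarStandIn:
    """Hypothetical 0-dim tensor stand-in whose mean(dim) is unusable."""
    shape = ()

    def __init__(self, value):
        self.value = value

    def mean(self, dim=None):
        # Mimics a backend that cannot reduce a 0-dim tensor along an axis.
        raise RuntimeError("mean(dim) on a 0-dim tensor is not supported here")


def mean_last_dim(tensor):
    """Reduce over the last dim, returning 0-dim tensors unchanged."""
    if len(tensor.shape) == 0:
        return tensor  # nothing to reduce; avoids calling mean(-1)
    return tensor.mean(-1)


loss = ScalarStandIn(1.3708)
print(mean_last_dim(loss).value)  # 1.3708
```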

Flowingsun007 commented 2 years ago

This bug is probably caused by CheckAxis in oneflow/core/functional/impl/common.cpp mishandling the 0-dim case. I'll take this one and submit a PR later.
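For reference, the axis handling that would make `mean(-1)` work on a 0-dim tensor can be sketched in Python. This is not the actual `CheckAxis` code and not the eventual fix; `check_axis` is a hypothetical function that assumes a 0-dim tensor is treated as 1-dim for range checking, so that axes -1 and 0 are both accepted, which is consistent with the torch result above.

```python
def check_axis(axis, ndim):
    """Validate `axis` against `ndim` and return it as a non-negative index.

    Assumption: a 0-dim tensor is treated as if it had one dimension,
    so axis -1 and axis 0 are both valid for it.
    """
    effective_ndim = max(ndim, 1)
    if not -effective_ndim <= axis < effective_ndim:
        raise IndexError(
            f"axis {axis} out of range for a tensor with {ndim} dimension(s)"
        )
    return axis + effective_ndim if axis < 0 else axis


print(check_axis(-1, 0))  # 0
print(check_axis(-1, 3))  # 2
```

An axis of 1 on a 0-dim tensor still raises `IndexError` in this sketch, so out-of-range axes fail cleanly with an error instead of crashing the process.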