Open songzetao opened 2 years ago
import oneflow as torch
if torch.distributed.is_initialized():
    pass
Traceback (most recent call last):
  File "meta_teacher_train.py", line 20, in <module>
    initialize_easynlp()
  File "/workspace/models/KnowledgeDistillation/knowledge_distillation_metakd/metakd_oneflow/easynlp/utils/initializer.py", line 39, in initialize_easynlp
    _initialize_distributed()
  File "/workspace/models/KnowledgeDistillation/knowledge_distillation_metakd/metakd_oneflow/easynlp/utils/initializer.py", line 109, in _initialize_distributed
    if torch.distributed.is_initialized():
AttributeError: module 'oneflow.distributed' has no attribute 'is_initialized'
Environment: onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), Python version: 3.7.7
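Until `oneflow.distributed` exposes `is_initialized`, one possible stopgap is to probe for the attribute before calling it. The helper below is a hypothetical sketch (the name `dist_is_initialized` is not part of oneflow or EasyNLP), and it assumes that a missing `is_initialized` can be treated as "not initialized", which may not hold for every launch setup.

```python
from types import SimpleNamespace


def dist_is_initialized(dist_module) -> bool:
    """Return dist_module.is_initialized() if it exists, else False.

    Hypothetical helper: treating a missing attribute as "not initialized"
    is an assumption, not documented oneflow behavior.
    """
    probe = getattr(dist_module, "is_initialized", None)
    return bool(probe()) if callable(probe) else False


# Stand-ins for a module that lacks the function and one that provides it.
without_api = SimpleNamespace()
with_api = SimpleNamespace(is_initialized=lambda: True)

print(dist_is_initialized(without_api))  # False
print(dist_is_initialized(with_api))     # True
```

In `initializer.py`, the failing check would then read `if dist_is_initialized(torch.distributed):`.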
import oneflow as flow
tensor = flow.randn(2, 3)
print(tensor.is_sparse)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_54640/2438940525.py in <module>
1 import oneflow as flow
2 tensor = flow.randn(2, 3)
----> 3 print(tensor.is_sparse)
AttributeError: 'Tensor' object has no attribute 'is_sparse'
import torch as flow
tensor = flow.randn(2, 3)
print(tensor.is_sparse)
>>False
Environment: onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), Python version: 3.7.7
import oneflow as flow
input = flow.randn(2, 3)
target = flow.randn(2, 3)
loss = flow.nn.functional.mse_loss(input, target)  # compute the loss
print(loss)
print(loss.shape)
loss_mean = loss.mean(-1)
print(loss_mean)
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.
import torch as flow
input = flow.randn(2, 3)
target = flow.randn(2, 3)
loss = flow.nn.functional.mse_loss(input, target)
print(loss)
print(loss.shape)
loss_mean = loss.mean(-1)
print(loss_mean)
>>> tensor(1.3708)
>>> torch.Size([])
>>> tensor(1.3708)
Environment: onecloud platform, 4core-14Gi-P40 (1 card) machine. oneflow version: 0.8.1+cu112 (nightly), Python version: 3.7.7
Calling mean(-1) on a 0-dim tensor crashes the program in oneflow but not in torch.
This bug is most likely in oneflow/core/functional/impl/common.cpp: the way CheckAxis handles 0-dim tensors looks wrong. I'll take this one and submit a PR later.
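For reference, the axis handling that lets torch accept `mean(-1)` on a 0-dim tensor can be sketched in plain Python. This is only an illustration of the wrapping rule, not the actual `CheckAxis` code in oneflow, and the function name `normalize_axis` is hypothetical. It assumes the torch convention that a 0-dim tensor is validated as if it had one dimension, so both `0` and `-1` are accepted.

```python
def normalize_axis(axis: int, ndim: int) -> int:
    """Map a possibly negative axis to a non-negative index.

    Assumption: a 0-dim tensor is validated as if ndim were 1,
    so axis 0 and axis -1 both resolve to 0 instead of failing.
    """
    effective_ndim = max(ndim, 1)
    if not -effective_ndim <= axis < effective_ndim:
        raise IndexError(
            f"axis {axis} out of range for a tensor with {ndim} dimension(s)"
        )
    return axis + effective_ndim if axis < 0 else axis


print(normalize_axis(-1, 0))  # 0: scalar tensor, the case from this report
print(normalize_axis(-1, 2))  # 1: last axis of a 2-D tensor
```

An axis check that skips the `max(ndim, 1)` step would reject `-1` for a scalar, or index out of bounds if the range check is missing altogether.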
torch version
from torch.optim.optimizer import required
When porting this to oneflow, I found that the equivalent import is:
from oneflow.nn.optimizer.optimizer import required
Recording it here for reference.
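If one codebase has to run against either framework, the two import paths can be tried in order. The sketch below is an assumption about how one might do that: `import_first` is a hypothetical helper, and only the two module paths come from the note above.

```python
import importlib


def import_first(attr: str, module_paths):
    """Return `attr` from the first module in module_paths that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # framework not installed, try the next candidate
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {list(module_paths)}")


# Module paths taken from the note above; either framework may be absent.
CANDIDATES = ("oneflow.nn.optimizer.optimizer", "torch.optim.optimizer")

try:
    required = import_first("required", CANDIDATES)
except ImportError:
    required = None  # neither framework is available in this environment
```

Listing the oneflow path first means oneflow is preferred whenever both frameworks are installed; swap the order to prefer torch.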