Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
http://www.oneflow.org
Apache License 2.0

Aborted (core dumped) in `flow.nn.ConvTranspose1d/ConvTranspose2d/ConvTranspose3d` #10519

Open x0w3n opened 4 months ago

Summary

Passing out-of-range values (for example, negative channel counts, kernel sizes, or strides) to the constructors of ConvTranspose1d/ConvTranspose2d/ConvTranspose3d crashes the process with "Aborted (core dumped)".

Code to reproduce bug

ConvTranspose1d:

import oneflow as flow
# Negative in_channels, out_channels, kernel_size, and stride.
flow.nn.ConvTranspose1d(-10, -10, kernel_size=-1, stride=-1)

output:

terminate called after throwing an instance of 'oneflow::Exception'
  what():  Check failed: (-100 >= 0): elem_cnt must be non-negative, but got -100
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in vm::ThreadCtx::TryReceiveAndRun()
  File "liboneflow.so", line <unknown>, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
  File "liboneflow.so", line <unknown>, in vm::Instruction::Compute()
  File "liboneflow.so", line <unknown>, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
  File "liboneflow.so", line <unknown>, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
  File "liboneflow.so", line <unknown>, in 
  File "oneflow/user/kernels/distributions/uniform_distribution.cpp", line 40, in operator()
    CHECK_GE_OR_THROW(elem_cnt, 0)
Error Type: oneflow.ErrorProto.check_failed_error
Stack trace (most recent call last) in thread 215062:
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1574249, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1573847, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff156f368, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1506818, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff150abe6, in vm::Instruction::Compute()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1510d2a, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff15104af, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1515d90, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff22cf2d0, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1a21633, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff19ffa81, in UniformDistribution<(DeviceType)1, float>::operator()(ep::Stream*, long, float*, std::shared_ptr<Generator> const&) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff19fc11e, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7fffed910190, in 

Aborted (Signal sent by tkill() 214819 0)
Aborted (core dumped)
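
The reported element count of -100 is consistent with the product of the requested weight dimensions. The sketch below assumes the weight is laid out as (in_channels, out_channels // groups, kernel_size) with groups=1, as in PyTorch-style transposed convolutions; that layout is an assumption here and is not confirmed by the trace.

```python
import math


def elem_cnt(shape):
    """Return the product of all dimensions, i.e. the number of elements."""
    return math.prod(shape)


# Assumed weight shape for ConvTranspose1d(-10, -10, kernel_size=-1) with groups=1.
groups = 1
weight_shape_1d = (-10, -10 // groups, -1)

print(elem_cnt(weight_shape_1d))  # -100
```

Under that assumption, the negative count is what reaches the CHECK_GE_OR_THROW(elem_cnt, 0) check in uniform_distribution.cpp during parameter initialization.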

ConvTranspose2d:

import oneflow as flow
# Negative out_channels, kernel_size, stride, padding, and dilation.
flow.nn.ConvTranspose2d(16,-33,  kernel_size=-100,stride=-1, padding=-1, dilation=-1)

output:

Traceback (most recent call last):
  File "/home/oneflow/test.py", line 55, in <module>
    flow.nn.ConvTranspose2d(16,-33,  kernel_size=-100,stride=-1, padding=-1, dilation=-1)
  File "/home/temp/oneflow-1.0.0/python/oneflow/nn/modules/conv.py", line 945, in __init__
    self.reset_parameters()
  File "/home/temp/oneflow-1.0.0/python/oneflow/nn/modules/conv.py", line 948, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "/home/temp/oneflow-1.0.0/python/oneflow/nn/init.py", line 208, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ValueError: math domain error
terminate called after throwing an instance of 'oneflow::RuntimeException'
  what():  Error: CPU can't allocate memory. Tried to allocate 17179869184.0 GB
You can set ONEFLOW_DEBUG or ONEFLOW_PYTHON_STACK_GETTER to 1 to get the Python stack of the error.
Stack trace (most recent call last) in thread 215425:
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1574249, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1573847, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff156f368, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1506818, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff150abe6, in vm::Instruction::Compute()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1510d2a, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1510a1f, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff150d182, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7fffed910082, in 

Aborted (Signal sent by tkill() 215184 0)
Aborted (core dumped)
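
The allocation size in this message, 17179869184 GB, equals 2^64 bytes, which suggests (but does not prove) that a negative byte count was reinterpreted as an unsigned 64-bit integer. The sketch below illustrates that wrap-around. The value of negative_bytes is illustrative only: it assumes a float32 weight of shape (16, -33, -100, -100), which is a guess and not something the log states.

```python
def as_uint64(n):
    """Reinterpret a signed integer as an unsigned 64-bit value (two's complement)."""
    return n % (1 << 64)


GIB = 1 << 30  # bytes per GiB

# Illustrative negative byte count: 16 * -33 * -100 * -100 elements * 4 bytes each.
negative_bytes = 16 * -33 * -100 * -100 * 4
wrapped = as_uint64(negative_bytes)

print(negative_bytes)              # -21120000
print(f"{wrapped / GIB:.1f} GB")   # 17179869184.0 GB
```

Any small negative size wraps to a value just below 2^64, so the printed figure does not depend on the exact shape assumed here.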

ConvTranspose3d:

import oneflow as flow
# Negative out_channels, kernel_size, stride, padding, and dilation.
flow.nn.ConvTranspose3d(16, -33, kernel_size=-100, stride=-1, padding=-1, dilation=-1)

output:

terminate called after throwing an instance of 'oneflow::Exception'
  what():  Check failed: (-33 >= 0): elem_cnt must be non-negative, but got -33
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in vm::ThreadCtx::TryReceiveAndRun()
  File "liboneflow.so", line <unknown>, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
  File "liboneflow.so", line <unknown>, in vm::Instruction::Compute()
  File "liboneflow.so", line <unknown>, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
  File "liboneflow.so", line <unknown>, in 
  File "liboneflow.so", line <unknown>, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
  File "liboneflow.so", line <unknown>, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
  File "liboneflow.so", line <unknown>, in 
  File "oneflow/user/kernels/distributions/uniform_distribution.cpp", line 40, in operator()
    CHECK_GE_OR_THROW(elem_cnt, 0)
Error Type: oneflow.ErrorProto.check_failed_error
Stack trace (most recent call last) in thread 215821:
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1574249, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1573847, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff156f368, in vm::ThreadCtx::TryReceiveAndRun()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1506818, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff150abe6, in vm::Instruction::Compute()
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1510d2a, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff15104af, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1515d90, in vm::OpCallInstructionUtil::Compute(vm::OpCallInstructionPolicy*, vm::Stream*, bool, bool)
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff22cf2d0, in StatefulOpKernel::Compute(eager::CallContext*, ep::Stream*, user_op::OpKernel const*, user_op::OpKernelState*, user_op::OpKernelCache const*) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff1a21633, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff19ffa81, in UniformDistribution<(DeviceType)1, float>::operator()(ep::Stream*, long, float*, std::shared_ptr<Generator> const&) const
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7ffff19fc11e, in 
   Object "/home/temp/oneflow-1.0.0/build/liboneflow.so", at 0x7fffed910190, in 

Aborted (Signal sent by tkill() 215582 0)
Aborted (core dumped)
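
All three reproductions pass values that no valid layer could use, so an up-front argument check would turn the aborts into ordinary Python exceptions. The following is a minimal sketch of such a check in plain Python. check_conv_transpose_args is a hypothetical helper written for illustration, not part of OneFlow's API, and the exact rules (for example, whether zero should be accepted) are assumptions.

```python
def check_conv_transpose_args(in_channels, out_channels, kernel_size,
                              stride=1, padding=0, dilation=1):
    """Hypothetical pre-check: raise ValueError for arguments that cannot
    describe a valid transposed convolution. Not part of OneFlow."""

    def as_tuple(value):
        # Accept either a single int or a per-dimension tuple/list.
        return tuple(value) if isinstance(value, (tuple, list)) else (value,)

    for name, value in (("in_channels", in_channels),
                        ("out_channels", out_channels)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, but got {value}")

    for name, value in (("kernel_size", kernel_size),
                        ("stride", stride),
                        ("dilation", dilation)):
        if any(v <= 0 for v in as_tuple(value)):
            raise ValueError(f"{name} must be positive, but got {value}")

    if any(p < 0 for p in as_tuple(padding)):
        raise ValueError(f"padding must be non-negative, but got {padding}")


# The arguments from the ConvTranspose1d reproduction are rejected early.
try:
    check_conv_transpose_args(-10, -10, kernel_size=-1, stride=-1)
except ValueError as err:
    print(err)  # in_channels must be positive, but got -10
```

Calling a check like this before constructing the layer would keep invalid shapes from reaching parameter initialization.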

System Information