PKUFlyingPig / CMU10-714

Learning material for CMU10-714: Deep Learning System
214 stars 34 forks source link

About conv operation in HW4 #6

Open DeliMm opened 5 months ago

DeliMm commented 5 months ago

Maybe there is a mistake in the conv's gradient function with the following place https://github.com/PKUFlyingPig/CMU10-714/blob/99180c4ad3986f64e09c4dbe15ff1fb2c113ce63/homework/hw4/python/needle/ops.py#L613

I have a test case with :

When I test the above case, there are some errors with

tests/hw4/test_conv.py::test_op_conv[backward-needle.backend_ndarray.ndarray_backend_cuda-Z_shape16-W_shape16-1-0]  
- ValueError: operands could not be broadcast together with shapes (1,16,16,1) (1,17,17,1)

This function may not have considered the situation of a large convolution kernel during implementation, especially when $(H - 2p + k) \pmod s \ne 0$.

May I ask if it‘s possible to check such case and carry out improvement/perfection? Thanks for your reply.

Skylight-Lark commented 2 months ago

Maybe there is a mistake in the conv's gradient function with the following place

https://github.com/PKUFlyingPig/CMU10-714/blob/99180c4ad3986f64e09c4dbe15ff1fb2c113ce63/homework/hw4/python/needle/ops.py#L613

I have a test case with :

  • A's shape: (1, 16, 16, 1)
  • B's shape: (7, 7, 1, 1)
  • stride=3 & padding=2

When I test the above case, there are some errors with

tests/hw4/test_conv.py::test_op_conv[backward-needle.backend_ndarray.ndarray_backend_cuda-Z_shape16-W_shape16-1-0]  
- ValueError: operands could not be broadcast together with shapes (1,16,16,1) (1,17,17,1)

This function may not have considered the situation of a large convolution kernel during implementation, especially when (H−2p+k)(mods)≠0.

May I ask if it‘s possible to check such case and carry out improvement/perfection? Thanks for your reply.

@PKUFlyingPig @DeliMm Hello! I think the caculation of OFM's size is wrong: https://github.com/PKUFlyingPig/CMU10-714/blob/99180c4ad3986f64e09c4dbe15ff1fb2c113ce63/homework/hw4/python/needle/ops.py#L595 As the pytorch doc says: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d The right size is: (H + 2 P - K) // self.stride + 1* So the caculation of conv gradient has also some problems, like X.grad conv padding