Why is it important to detach the tensor and stop gradient propagation in pairwise.py?

Cadene / murel.bootstrap.pytorch

MUREL (CVPR 2019), a multimodal relational reasoning module for VQA

https://arxiv.org/abs/1902.09487

BSD 3-Clause "New" or "Revised" License


Closed by yuweihao 5 years ago

yuweihao commented 5 years ago

Thanks for sharing the nice code!

My question is about the MuRel cell: why is it important to detach the tensor and stop gradient propagation here?

Cadene commented 5 years ago

@yuweihao

We developed on PyTorch 0.3. We tried to port our code to PyTorch 0.4/1.1, but it was 3 times slower because of an issue with the indexing. We didn't have much time, so we deactivated the gradients (detach)... Unfortunately, it was a really bad idea. We just fixed it: https://github.com/Cadene/murel.bootstrap.pytorch/commit/7c9eaebfa6b0fe2565d97dac01001ea9e6ddae7b
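
Editor's note: to illustrate why detaching can be harmful, here is a minimal sketch of how `detach()` changes gradient flow through an indexed tensor. It is not the code from `pairwise.py`; the function `toy_pairwise_grad`, the index values, and the toy product are hypothetical and chosen only to keep the arithmetic easy to check.

```python
import torch

def toy_pairwise_grad(detach: bool) -> torch.Tensor:
    """Return d(out)/dx for a toy 'pairwise' product (hypothetical example)."""
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    idx = torch.tensor([0, 2])   # assumed neighbor indices, for illustration only
    neighbors = x[idx]           # indexing is differentiable by default
    if detach:
        # detach() returns a tensor that shares data with x[idx] but is cut
        # out of the autograd graph, so no gradient flows back through it.
        neighbors = neighbors.detach()
    # out = x[0] * neighbors[0] + x[1] * neighbors[1]
    out = (x[:2] * neighbors).sum()
    (grad,) = torch.autograd.grad(out, x)
    return grad

full = toy_pairwise_grad(detach=False)  # gradient through both factors
cut = toy_pairwise_grad(detach=True)    # gradient through x[:2] only
print(full, cut)
```

With the detached branch, `x[2]` receives a zero gradient and `x[0]` receives only part of its gradient, so any parameters that feed only the indexed path would not be trained. This is the general effect of `detach()`, shown under the assumptions above, and not a measurement of its impact on MuRel itself.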

yuweihao commented 5 years ago

Hi @Cadene

Thank you very much for your reply.