as1392 opened 5 years ago
On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net.
The authors of [7] have shown that I3D models are more accurate than their CNN+LSTM counterparts.
Hi @AlexeyAB, I want to use a non-local block. How can I do that? I think Darknet lacks matrix operators such as transpose, or anything similar to what PyTorch or NumPy provide.
@CuongNguyen218 Do you need just a single transpose layer (exchanging width <-> height), or other layers as well for the non-local block?
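A transpose layer of this kind only permutes indices, so its forward pass is a few nested loops. The sketch below assumes a Darknet-style flat float buffer holding one image laid out as C x H x W (channel, then row, then column); `transpose_hw` is a hypothetical helper name, not an existing Darknet function.

```python
def transpose_hw(data, c, h, w):
    """Swap height and width of one C x H x W image stored as a flat list.

    Input element (k, i, j) lives at (k * h + i) * w + j; in the C x W x H
    output it moves to (k * w + j) * h + i.
    """
    if len(data) != c * h * w:
        raise ValueError("buffer length does not match c * h * w")
    out = [0.0] * (c * h * w)
    for k in range(c):
        for i in range(h):
            for j in range(w):
                out[(k * w + j) * h + i] = data[(k * h + i) * w + j]
    return out


# Usage: one channel, 2 rows x 3 columns -> 3 rows x 2 columns.
img = [1.0, 2.0, 3.0,
       4.0, 5.0, 6.0]
print(transpose_hw(img, c=1, h=2, w=3))  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```

For the backward pass, the same function can be applied to the incoming gradient with `h` and `w` swapped, because transposing twice restores the original layout.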
@AlexeyAB Yes, for now I just need a transpose layer and a dot-product operator.
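With only a transpose, a matrix (dot) product and a softmax, the embedded-Gaussian variant of the non-local block from the non-local paper can be written as y_i = (1/C(x)) * sum_j exp(theta(x_i)^T phi(x_j)) * g(x_j) and z_i = W_z * y_i + x_i, where the normalization C(x) turns the weights into a softmax over j. The following is a minimal, unoptimized plain-Python sketch of that composition, not Darknet code: the function names are hypothetical, theta, phi, g and W_z are treated as 1x1 convolutions (plain weight matrices), the input is assumed to be already flattened to C x N with N = H*W, and batch norm and any subsampling are left out.

```python
import math


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m) via row-by-column dot products."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax_rows(m):
    """Row-wise softmax; subtracting each row's max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block on x of shape C x N (N = H*W positions).

    w_theta, w_phi, w_g are C' x C matrices standing in for 1x1 convolutions;
    w_z is C x C' so the result can be added back onto x.
    """
    theta = matmul(w_theta, x)              # C' x N
    phi = matmul(w_phi, x)                  # C' x N
    g = matmul(w_g, x)                      # C' x N
    scores = matmul(transpose(theta), phi)  # N x N, scores[i][j] = theta_i . phi_j
    attn = softmax_rows(scores)             # each row sums to 1 over all positions j
    y = matmul(g, transpose(attn))          # C' x N, y_i = sum_j attn[i][j] * g_j
    wy = matmul(w_z, y)                     # back to C x N
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(wy, x)]  # residual


# Usage: 2 channels, 3 positions. With W_z = 0 the block reduces to the identity.
x = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(nonlocal_block(x, eye, eye, eye, zero) == x)  # True
```

The N x N attention matrix is what makes the block expensive on large feature maps: its size grows with (H*W)^2.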
https://arxiv.org/pdf/1811.11721.pdf https://github.com/speedinghzl/CCNet CCNet claims to outperform non-local blocks (in mAP, FLOPs, and memory), and it has PyTorch implementations, so CCNet may be more worth implementing.
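The saving CCNet claims comes from restricting attention: instead of attending to all H*W positions, each position attends only to the H + W - 1 positions in its own row and column (its "criss-cross" path), and the module is applied recurrently (two passes by default) so information can still travel between any two positions. The plain-Python sketch below illustrates that idea under stated assumptions: the function names are hypothetical, query/key/value projections are treated as 1x1 convolutions (plain weight matrices), `w_v` must map back to the input channel count for the residual add, both passes reuse the same weights (our reading of the paper, treat it as an assumption), and any learnable output scale is omitted.

```python
import math


def linear(w, v):
    """Apply a 1x1 convolution (weight matrix w) to one feature vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def criss_cross_attention(feat, w_q, w_k, w_v):
    """One criss-cross attention pass over an H x W grid of feature vectors."""
    h, w = len(feat), len(feat[0])
    q = [[linear(w_q, feat[i][j]) for j in range(w)] for i in range(h)]
    k = [[linear(w_k, feat[i][j]) for j in range(w)] for i in range(h)]
    v = [[linear(w_v, feat[i][j]) for j in range(w)] for i in range(h)]
    out = []
    for i in range(h):
        row_out = []
        for j in range(w):
            # Criss-cross path of (i, j): its whole row plus its column
            # without (i, j) itself, i.e. H + W - 1 positions.
            path = [(i, jj) for jj in range(w)] + [(ii, j) for ii in range(h) if ii != i]
            scores = [sum(a * b for a, b in zip(q[i][j], k[pi][pj])) for pi, pj in path]
            peak = max(scores)
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            agg = [0.0] * len(v[i][j])
            for e, (pi, pj) in zip(exps, path):
                for c, val in enumerate(v[pi][pj]):
                    agg[c] += (e / total) * val
            # Residual connection back onto the input feature.
            row_out.append([a + b for a, b in zip(agg, feat[i][j])])
        out.append(row_out)
    return out


def recurrent_cca(feat, w_q, w_k, w_v, loops=2):
    """Apply the same criss-cross pass several times, reusing the weights."""
    for _ in range(loops):
        feat = criss_cross_attention(feat, w_q, w_k, w_v)
    return feat


# Usage: a 2 x 2 grid with one channel; only the bottom-right cell differs.
ident = [[1.0]]
grid = [[[1.0], [2.0]], [[3.0], [4.0]]]
far = [[[1.0], [2.0]], [[3.0], [10.0]]]
one_a = criss_cross_attention(grid, ident, ident, ident)[0][0]
one_b = criss_cross_attention(far, ident, ident, ident)[0][0]
print(one_a == one_b)  # True: (1, 1) is not on the path of (0, 0)
two_a = recurrent_cca(grid, ident, ident, ident)[0][0]
two_b = recurrent_cca(far, ident, ident, ident)[0][0]
print(two_a != two_b)  # True: after two passes (1, 1) reaches (0, 0)
```

Per pass, the attention weights take H*W*(H + W - 1) entries instead of the (H*W)^2 of a full non-local block, which is where the FLOPs and memory savings come from.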
https://arxiv.org/pdf/1711.07971.pdf It improves Mask R-CNN by more than 1% mAP, even with an X-152 backbone, so it seems worth implementing.