Asia10086 opened this issue 2 months ago
Due to my limited ability, I cannot reproduce the results.
+1
Thanks for your interest in our multimodal action recognition work. Of course, I will share the code here soon (within 2 weeks).
Also, here are some hints that may help you. For the multimodal network, I did the following (see the sketches below):

a) Preprocess the video clip and its corresponding skeletons so they are spatially aligned, by cropping the clip's frames to the minimum bounding box that encloses all of the subject's 2D poses.
b) Implement a dataset class for the multimodal network that feeds the dataloader with RGB frames and a skeleton heatmap volume.
c) Implement the multimodal network with a spatial-temporal attention block.
d) Use distributed training in PyTorch (DDP) to train on multiple GPUs or multiple nodes.
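A minimal sketch of step (a), assuming the skeletons are arrays of (x, y) joint coordinates per frame; the function name and the `margin` parameter are illustrative, not from the released code:

```python
import numpy as np

def crop_clip_to_pose_bbox(frames, poses_2d, margin=10):
    """Crop a clip to the minimum bounding box covering all 2D poses.

    frames:   (T, H, W, 3) uint8 array
    poses_2d: (T, num_joints, 2) array of (x, y) joint coordinates
    """
    T, H, W, _ = frames.shape
    xs = poses_2d[..., 0].reshape(-1)
    ys = poses_2d[..., 1].reshape(-1)
    # Minimum bounding box over all joints in all frames, plus a small margin.
    x1 = max(int(xs.min()) - margin, 0)
    y1 = max(int(ys.min()) - margin, 0)
    x2 = min(int(xs.max()) + margin, W)
    y2 = min(int(ys.max()) + margin, H)
    cropped = frames[:, y1:y2, x1:x2, :]
    # Shift the poses into the cropped coordinate frame so RGB and skeletons stay aligned.
    shifted = poses_2d - np.array([x1, y1], dtype=poses_2d.dtype)
    return cropped, shifted
```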
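A sketch of the dataset class in step (b). The Gaussian-heatmap volume here follows the common PoseConv3D-style recipe; the class name, sample format, heatmap size, and sigma are assumptions, not the authors' implementation:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class RGBPoseClipDataset(Dataset):
    def __init__(self, samples, heatmap_size=56, sigma=2.0):
        # samples: list of dicts with keys "frames" (T, H, W, 3), "poses" (T, J, 2),
        # and "label", already cropped/aligned as in step (a).
        self.samples = samples
        self.heatmap_size = heatmap_size
        self.sigma = sigma

    def __len__(self):
        return len(self.samples)

    def _poses_to_heatmaps(self, poses, src_hw):
        # Render one Gaussian per joint per frame into a (J, T, S, S) volume.
        T, J, _ = poses.shape
        S = self.heatmap_size
        ys, xs = np.mgrid[0:S, 0:S]
        heatmaps = np.zeros((J, T, S, S), dtype=np.float32)
        scale = np.array([S / src_hw[1], S / src_hw[0]])  # (x_scale, y_scale)
        for t in range(T):
            for j in range(J):
                cx, cy = poses[t, j] * scale
                heatmaps[j, t] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2)
                                        / (2 * self.sigma ** 2))
        return heatmaps

    def __getitem__(self, idx):
        item = self.samples[idx]
        frames = item["frames"]                                               # (T, H, W, 3)
        rgb = torch.from_numpy(frames).permute(3, 0, 1, 2).float() / 255.0   # (3, T, H, W)
        heatmaps = torch.from_numpy(
            self._poses_to_heatmaps(item["poses"], frames.shape[1:3]))        # (J, T, S, S)
        return rgb, heatmaps, item["label"]
```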
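For step (d), a standard PyTorch DDP training skeleton (one process per GPU, launched with torchrun). `MultimodalNet` here is a trivial two-stream placeholder standing in for the real network with its spatial-temporal attention block (step c), and `samples` is the preprocessed clip list from the dataset sketch above:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

class MultimodalNet(torch.nn.Module):
    """Placeholder two-stream model; the real network fuses the RGB and pose
    streams with a spatial-temporal attention block (step c)."""
    def __init__(self, num_joints=17, num_classes=60):
        super().__init__()
        self.rgb_stream = torch.nn.Conv3d(3, 64, kernel_size=3, padding=1)
        self.pose_stream = torch.nn.Conv3d(num_joints, 64, kernel_size=3, padding=1)
        self.head = torch.nn.Linear(128, num_classes)

    def forward(self, rgb, heatmaps):
        r = self.rgb_stream(rgb).mean(dim=(2, 3, 4))       # global average pool
        p = self.pose_stream(heatmaps).mean(dim=(2, 3, 4))
        return self.head(torch.cat([r, p], dim=1))

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(MultimodalNet().cuda(local_rank), device_ids=[local_rank])

    dataset = RGBPoseClipDataset(samples)                  # placeholder data, see sketch above
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler, num_workers=4)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(50):
        sampler.set_epoch(epoch)                           # reshuffle shards each epoch
        for rgb, heatmaps, labels in loader:
            rgb = rgb.cuda(local_rank)
            heatmaps = heatmaps.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(rgb, heatmaps), labels)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launching with, e.g., `torchrun --nproc_per_node=4 train.py` starts one process per GPU; the same pattern extends to multiple nodes.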