Hi, I noticed that the training inputs for DetectNet include both the InterHand2.6M dataset and the MSCOCO dataset. However, InterHand2.6M consists mainly of hand images, while MSCOCO, after preprocessing, contains images of the entire body. Since these two inputs differ so much, does this mean that DetectNet relies heavily on MSCOCO when it is trained to predict hand bounding boxes?