facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

RAM usage and dataset implementation (30FPS version) #60

Closed pablovela5620 closed 3 years ago

pablovela5620 commented 3 years ago

The current implementation of the InterHand dataset appears to load the entire set of annotations into system memory at once:

import json
import os.path as osp

from pycocotools.coco import COCO

# All three annotation files are parsed eagerly in the dataset constructor,
# so every worker process holds the full annotation set in RAM.
db = COCO(osp.join(self.annot_path, self.annot_subset, 'InterHand2.6M_' + self.mode + '_data.json'))
with open(osp.join(self.annot_path, self.annot_subset, 'InterHand2.6M_' + self.mode + '_camera.json')) as f:
    cameras = json.load(f)
with open(osp.join(self.annot_path, self.annot_subset, 'InterHand2.6M_' + self.mode + '_joint_3d.json')) as f:
    joints = json.load(f)

With the 5FPS version, using 2 GPUs and a single worker, this uses ~100GB of system RAM on my machine; adding the MANO annotations on top of that leads to out-of-memory errors.

My guess is that for the 30FPS version (which seems to be about six times bigger: ~12 million total frames vs. ~2 million) this becomes completely infeasible; scaling the 5FPS numbers linearly would put it at roughly 600GB of RAM.

Has any work been done on streaming the data rather than loading it all at once? If so, some guidance would be great. I have seen some frameworks implement something like this (HuggingFace/WebDataset/Activeloop).
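
For concreteness, here is the kind of pattern I mean (a minimal sketch of one option, not code from this repo): pre-split the monolithic annotation JSON offline into a JSON Lines file with one frame per line, then keep only byte offsets in memory and parse a single record per access. The file name and the conversion step are assumptions on my part.

import json
from torch.utils.data import Dataset

class LazyAnnotDataset(Dataset):
    """Sketch: stream per-frame annotations from a JSON Lines file instead of
    holding the parsed InterHand2.6M_*_data.json in every worker's RAM.
    Assumes a hypothetical offline step already wrote 'annotations.jsonl'
    with one frame's annotation per line."""

    def __init__(self, jsonl_path):
        self.jsonl_path = jsonl_path
        # One pass to record the byte offset of every line; only these
        # integers (a few bytes per frame) stay resident in memory.
        self.offsets = []
        with open(jsonl_path, 'rb') as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek to the record and parse it on demand, so per-worker memory
        # stays O(1) in the number of frames instead of O(dataset size).
        with open(self.jsonl_path, 'rb') as f:
            f.seek(self.offsets[idx])
            return json.loads(f.readline())

WebDataset takes this further by sharding samples into tar files for sequential reads, but even a simple offset index like the above would avoid keeping the whole annotation dict alive in every worker.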

Knowing how you guys dealt with the massive size of this dataset would be greatly appreciated! Thanks so much

mks0601 commented 3 years ago

I have used the InterHand 5FPS dataset on my machine, which has a similar spec to yours. Do you run into the problem when loading the MANO parameters? When did you download the MANO parameters? Their size was significantly reduced in the 2020.11.26 version.

pablovela5620 commented 3 years ago

I haven't tried this specific implementation recently (I was actually using mmpose, which similarly loads the dataset all at once but makes it easy to swap in different backbones/heads/augmentation pipelines/etc.; you can see the issue I raised here). I can try again with the most recent MANO version. Basically, I was able to load and train, but only with the train set in memory.

So you haven't had the opportunity to try the 30FPS version? I'm looking at that larger version specifically for some video-based research, but with my current RAM capacity and implementation it seems impossible, which is why I was curious how you handled it.

mks0601 commented 3 years ago

I see. I haven't tried the 30FPS version yet. One possible way is to reduce the number of views, as this dataset has many of them (about 100).
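
For example (a rough sketch, not code from this repo): filter by the per-image 'camera' field while building the datalist, so frames from dropped views are skipped before anything heavy is loaded. The view IDs below are placeholders, and db is the COCO object from the snippet above.

# Hypothetical subset of camera views to keep; real IDs are the strings
# stored in each image entry's 'camera' field.
keep_views = {'400002', '400004', '400010'}

datalist = []
for aid in db.anns.keys():
    ann = db.anns[aid]
    img = db.loadImgs(ann['image_id'])[0]
    # Skip every frame that does not come from one of the kept views.
    if str(img['camera']) not in keep_views:
        continue
    datalist.append(ann)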

pablovela5620 commented 3 years ago

I'll give this a try. Appreciate the help!