anilbatra2185 opened 2 months ago
Hi @crodriguezo,
Can you share some details about training, such as how long it takes and what hardware/GPU was used?
Currently, on an A100-80GB (24 CPUs), training is very slow with a batch size of 4: one epoch takes 6 hours. My main concern is reading the object features. Any suggestions to speed up training?
Regards
Hi @anilbatra2185,
I appreciate your interest in our work.
When I did the training, I used an M.2 SSD, which made the process much faster. We used an RTX 8000. The disk's read speed is more critical than the GPU.
I ran into the same issue on a different machine and tried other options. I solved it by using an HDF5 file with all the features in one array, with indices referring to offsets in that array rather than to separate files. That could help, but I cannot find the code for that option.
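A minimal sketch of that single-array layout (illustrative only; since I cannot find the original code, the function and dataset names here are made up):

```python
# Illustrative sketch: concatenate all per-frame feature arrays into one
# contiguous dataset per video and keep (start, end) indices per frame,
# so reads hit one big array instead of thousands of small pickle files.
import h5py
import numpy as np

def save_flat_features(path, object_feats):
    """object_feats: video_id -> list of (n_objects_i, 2048) float16 arrays."""
    with h5py.File(path, "w") as f:
        for vid, frame_arrays in object_feats.items():
            flat = np.concatenate(frame_arrays, axis=0)      # (total_objects, 2048)
            counts = np.array([a.shape[0] for a in frame_arrays])
            ends = np.cumsum(counts)
            starts = ends - counts
            grp = f.create_group(vid)
            grp.create_dataset("features", data=flat, dtype=np.float16)
            grp.create_dataset("index", data=np.stack([starts, ends], axis=1))

# Reading frame i back is then a single slice:
#   start, end = f[vid]["index"][i]
#   feats = f[vid]["features"][start:end]
```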
Best
Thanks @crodriguezo for your response.
I tried using HDF5; however, I am unable to save the file due to empty arrays among the object features. Below is my code to save the features in HDF5; any suggestion on how to change it?
```python
# imports assumed at module level
import os
import pickle

import h5py
import numpy as np
from tqdm import tqdm

def load_obj_feat(self):
    self.object_feats = {}
    video_ids = list(set([(ann['video'], ann['subset'], ann['recipe'])
                          for _, ann in self.anns.items()]))
    for video_id, subset, recipe in tqdm(video_ids, total=len(video_ids)):
        selected_frames = self.selected_frames[video_id]
        object_features = []
        human_features = []
        for selected in selected_frames:
            file_path = os.path.join(self.obj_feat_path, subset, recipe, video_id,
                                     "{}_{}.pkl".format("image", str(selected).zfill(5)))
            aux_obj = []
            aux_hum = []
            with open(file_path, "rb") as fo:
                obj_feat = pickle.load(fo, encoding='latin1')
            # split detections into human vs. non-human features
            for indx, obj_type in enumerate(obj_feat['object_class']):
                if self.mapping_obj[str(obj_type)]['human']:
                    aux_hum.append(obj_feat['features'][indx].astype(np.float16))
                else:
                    aux_obj.append(obj_feat['features'][indx].astype(np.float16))
            # pad with a single zero vector when a frame has no detections
            if len(aux_obj) == 0:
                aux_obj = np.zeros((1, 2048), dtype=np.float16)
            if len(aux_hum) == 0:
                aux_hum = np.zeros((1, 2048), dtype=np.float16)
            aux_obj = np.array(aux_obj, dtype=np.float16)
            aux_hum = np.array(aux_hum, dtype=np.float16)
            object_features.append(aux_obj)
            human_features.append(aux_hum)
        self.object_feats[video_id] = (object_features, human_features)
    with h5py.File("dori_faster_rcnn_obj_feats.h5", 'w') as f:
        for vid, (obj_feat, human_feat) in tqdm(self.object_feats.items(),
                                                total=len(self.object_feats)):
            # this call fails: (obj_feat, human_feat) is a tuple of ragged lists,
            # whose arrays differ in shape from frame to frame
            f.create_dataset(f"{vid}", data=(obj_feat, human_feat))
```
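For anyone hitting the same error: `create_dataset` cannot build a single dataset from a tuple of ragged lists. A minimal sketch of one workaround (names are illustrative, not from this thread), giving each frame its own fixed-shape dataset inside a per-video group:

```python
# Illustrative workaround: one dataset per frame sidesteps the ragged-shape
# problem entirely, since each (n_i, 2048) array is stored on its own.
import h5py
import numpy as np

def save_per_frame(path, object_feats):
    """object_feats: video_id -> (list_of_obj_arrays, list_of_hum_arrays)."""
    with h5py.File(path, "w") as f:
        for vid, (obj_feat, human_feat) in object_feats.items():
            grp = f.create_group(vid)
            for i, (o, h) in enumerate(zip(obj_feat, human_feat)):
                grp.create_dataset(f"obj_{i:05d}", data=o, dtype=np.float16)
                grp.create_dataset(f"hum_{i:05d}", data=h, dtype=np.float16)
```

The single-array layout sketched earlier in the thread avoids the per-dataset overhead of this approach and is usually faster to read.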
Currently, I load the object features in memory, and by reducing their precision to float16 the epoch time has come down to 45 minutes. However, I wonder if you have any ablations on the number of object features per frame, e.g. keeping only the top 5 objects per frame. Also, does reducing the number of frames (to save memory) have any effect on performance?
Thanks Anil
Hi @anilbatra2185,
I don't know which dataset you are working on, but the number of objects depends on the type of video. I recall that some frames are just black frames, due to transitions between instructions (YouCookII) or the beginning/ending of an action (ActivityNet). However, I also recall that we set a maximum number of objects per keyframe: "We extract the top 15 objects detected in terms of confidence for each key-frames using Faster-RCNN." (Section 5.1, https://openaccess.thecvf.com/content/WACV2021/papers/Rodriguez-Opazo_DORi_Discovering_Object_Relationships_for_Moment_Localization_of_a_Natural_WACV_2021_paper.pdf)
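A minimal sketch of that top-k truncation (assuming the detector output includes a per-detection confidence array, called `scores` here; that key name is an assumption, not part of the pickle format shown above):

```python
# Sketch: keep only the k most confident detections per key-frame.
# Assumes `scores` is an (n,) confidence array aligned row-by-row with
# `features`; the key name is an assumption about the detector output.
import numpy as np

def top_k_detections(features, scores, k=15):
    """features: (n, 2048) array; scores: (n,) array -> (min(n, k), 2048)."""
    order = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return features[order]
```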
I am re-running the LocFormer (SBFS) experiments with DORi before sharing the code with you. They take 15 minutes per epoch on a Quadro P5000, using independent files (not HDF5) for the objects and a batch size of 6. However, I am using an M.2 SSD. (I will share that code ASAP; I am just cleaning out other ideas that I never finished.)
For training time and inference statistics, see Table 5: https://aclanthology.org/2023.eacl-main.140.pdf.
I am working on YouCook2 at the moment. Appreciate your help and efforts!
I will wait for you to share the SBFS code.
Thanks Anil