Closed: AssafSinger94 closed this issue 1 year ago.
Hi @AssafSinger94,
I also trained this model on A100 GPUs. I think the problem with the pickle files is RAM, not GPU memory. You should be able to replace the following code:
```python
class TapVidDataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        ...
        if self.dataset_type == "kinetics":
            all_paths = glob.glob(os.path.join(data_root, "*_of_0010.pkl"))
            points_dataset = []
            for pickle_path in all_paths:
                with open(pickle_path, "rb") as f:
                    data = pickle.load(f)
                points_dataset = points_dataset + data
            self.points_dataset = points_dataset

    def __getitem__(self, index):
        if self.dataset_type == "davis":
            ...
        else:
            video_name = index
            video = self.points_dataset[video_name]
```
with something like this, so that each pickle file is loaded only when it is actually needed:
```python
class TapVidDataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        ...
        if self.dataset_type == "kinetics":
            self.all_paths = glob.glob(os.path.join(data_root, "*_of_0010.pkl"))
            self.curr_path_idx = -1
            self.global_file_idx = 0

    def load_pickle_file(self):
        with open(self.all_paths[self.curr_path_idx], "rb") as f:
            data = pickle.load(f)
        return data

    def __getitem__(self, index):
        if self.dataset_type == "davis":
            ...
        else:
            if index >= len(self.points_dataset):
                self.global_file_idx += len(self.points_dataset)
                self.curr_path_idx += 1
                self.points_dataset = self.load_pickle_file()
            index -= self.global_file_idx
            video_name = index
            video = self.points_dataset[video_name]
            ...

    def __len__(self):
        ...
```
I haven't tested it, though. Please let me know if this solution works! You'll also need to implement `__len__(self)`, maybe just by hardcoding the dataset length or by loading all the files one by one in the same manner.
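For the "loading all the files one by one" option, the length could be counted once without keeping anything in memory. A minimal sketch (untested; it assumes each shard unpickles to a list of clips, and `count_kinetics_clips` is a hypothetical helper, not part of the repo):

```python
import glob
import os
import pickle

def count_kinetics_clips(data_root):
    # Count clips across all pickle shards without retaining them in RAM:
    # each file is loaded, measured, and immediately discarded.
    total = 0
    for pickle_path in sorted(glob.glob(os.path.join(data_root, "*_of_0010.pkl"))):
        with open(pickle_path, "rb") as f:
            total += len(pickle.load(f))
    return total
```

The result could be computed once and hardcoded into `__len__` afterwards.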
Hi @nikitakaraevv, thank you for your help! It really helped. I made a few small adjustments (code added below), and the evaluation ran properly.
However, my overall average metrics are much lower than those reported in the paper:
```
{'occlusion_accuracy': 0.8116, 'pts_within_1': 0.2009, 'jaccard_1': 0.1205, 'pts_within_2': 0.3077, 'jaccard_2': 0.1987, 'pts_within_4': 0.4183, 'jaccard_4': 0.2826, 'pts_within_8': 0.5321, 'jaccard_8': 0.3684, 'pts_within_16': 0.6535, 'jaccard_16': 0.4621, 'average_jaccard': 0.2865, 'average_pts_within_thresh': 0.4225}
```
Could you please assist me in the matter? Am I missing something?
In addition, inference for each video takes 1-2 minutes on a GPU, and evaluation on the entire dataset takes over 14 hours. Is there any way to speed up the inference? Does it make sense that the inference time is so long? Currently I adjusted the code to run for a specific video_ind, submitted many separate jobs, and averaged their results. Looking at the prediction visualizations shows that the video index separation works well, and the predicted trajectories seem to "make sense".
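Since each job covers a different number of videos, the final merge can weight each job's metrics accordingly. A small sketch of that aggregation step (`merge_job_metrics` is a hypothetical helper, not code from the repo):

```python
def merge_job_metrics(job_results):
    """Average metric dicts from separate evaluation jobs.

    job_results: list of (metrics_dict, num_videos) pairs, one per job.
    Each metric is weighted by the job's video count, so jobs covering
    more videos contribute proportionally more to the overall average.
    """
    total_videos = sum(n for _, n in job_results)
    merged = {}
    for metrics, n in job_results:
        for key, value in metrics.items():
            merged[key] = merged.get(key, 0.0) + value * n / total_videos
    return merged
```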
Thank you for your help! Assaf
Adjusted code:
```python
class TapVidDataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        ...
        if self.dataset_type == "kinetics":
            self.all_paths = glob.glob(os.path.join(data_root, "*_of_0010.pkl"))
            self.curr_path_idx = -1
            self.global_file_idx = 0
            self.points_dataset = []  # initialize to empty list

    def __getitem__(self, index):
        if self.dataset_type == "davis":
            ...
        else:  # kinetics
            pkl_index = index - self.global_file_idx  # index within the current pickle file
            if pkl_index >= len(self.points_dataset):
                self.global_file_idx += len(self.points_dataset)
                self.curr_path_idx += 1
                self.points_dataset = self.load_pickle_file()
                pkl_index = index - self.global_file_idx  # index within the new pickle file
            video_name = pkl_index

    def __len__(self):
        if self.dataset_type == "kinetics":
            return 1144
```
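One caveat worth noting: this bookkeeping only works when `__getitem__` is called with increasing indices, so the DataLoader has to use `shuffle=False` and `num_workers=0` (each worker process would otherwise hold its own copy of `curr_path_idx`). The index translation can be sketched in isolation like this (`ShardedIndex` is a hypothetical standalone helper mirroring the logic above, not part of the original code):

```python
class ShardedIndex:
    """Translate a global dataset index into (shard, local index),
    assuming indices arrive in increasing order (no shuffling)."""

    def __init__(self, shard_lengths):
        self.shard_lengths = shard_lengths
        self.curr_shard = -1       # like curr_path_idx above
        self.global_file_idx = 0   # first global index of the current shard
        self.curr_len = 0          # like len(self.points_dataset)

    def locate(self, index):
        local = index - self.global_file_idx
        if local >= self.curr_len:
            # Advance to the next shard, exactly as in __getitem__ above.
            self.global_file_idx += self.curr_len
            self.curr_shard += 1
            self.curr_len = self.shard_lengths[self.curr_shard]
            local = index - self.global_file_idx
        return self.curr_shard, local
```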
Hi @AssafSinger94, did you evaluate the model on TAP-Vid DAVIS to make sure the numbers match? Also, what are the average metrics after the first sequence and after the first five sequences on Kinetics? Mine after the first sequence are:
```
'occlusion_accuracy': 0.8893333333333333,
'average_jaccard': 0.3283059618231899,
'average_pts_within_thresh': 0.4515646635281086
```
and after the first five sequences:
```
'occlusion_accuracy': 0.8990976751255385,
'average_jaccard': 0.44782753486523796,
'average_pts_within_thresh': 0.5723796905496629
```
Hey, the low metric results were caused by an issue in how I created the dataset (bad frame-rate sampling). After I fixed it, the results made much more sense. Thanks!
Hi, when trying to evaluate the model on TAP-Vid Kinetics with 'first' sampling, my GPU reaches a memory limit and crashes. The error occurs while aggregating the TAP-Vid Kinetics pickle files during instantiation of the `TapVidDataset` object. I was able to evaluate the model on TAP-Vid DAVIS properly. I am running the code on an NVIDIA A100 GPU (the GPU with the most memory I have access to). Could you please assist me in the matter? Are you able to provide the code you used to evaluate the model on Kinetics? What GPU were you using? The TAP-Vid repo provides create_kinetics_dataset, which returns an iterable that yields one video example at a time, but I couldn't adjust the code properly to use this iterable instead.
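For context, one way such an iterable could be consumed in PyTorch is via an `IterableDataset` wrapper, which avoids random access and so never needs all pickle files in memory at once. A hypothetical, untested sketch (assuming create_kinetics_dataset yields one example dict per video; `KineticsIterableDataset` is not code from either repo):

```python
import torch
from torch.utils.data import IterableDataset

class KineticsIterableDataset(IterableDataset):
    """Wrap a factory for the TAP-Vid example iterable so a DataLoader
    can consume it directly, one video example at a time."""

    def __init__(self, example_iter_fn):
        # e.g. example_iter_fn = lambda: create_kinetics_dataset(...)
        self.example_iter_fn = example_iter_fn

    def __iter__(self):
        # Sequential access only: the underlying generator streams
        # examples, so shards never need to be concatenated in RAM.
        yield from self.example_iter_fn()
```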
Thank you for your help! Assaf