ajtao closed this 8 months ago
Your batch is empty. Your detection file is missing a lot of frames, and I assume the error is related to that.
Maybe use the detection files provided in the repo first, to double-check that you can run the code base.
Best, Orcun
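A quick way to see how many frames a detection file actually skips is sketched below. It is only an illustration: it assumes a MOTChallenge-style text file (comma-separated, no header, frame number in the first column), and the helper name `missing_frames` and the sample rows are made up for this example rather than taken from SUSHI.

```python
import csv
import io


def missing_frames(det_file, first_frame=None, last_frame=None):
    """Return the frame numbers in [first_frame, last_frame] with no detections.

    Assumption: MOTChallenge-style rows, comma-separated, no header,
    with the frame number in the first column.
    """
    seen = set()
    for row in csv.reader(det_file):
        if not row:
            continue  # skip blank lines
        seen.add(int(float(row[0])))
    if not seen:
        return []
    lo = min(seen) if first_frame is None else first_frame
    hi = max(seen) if last_frame is None else last_frame
    return [f for f in range(lo, hi + 1) if f not in seen]


# Made-up sample: detections exist on frames 101, 102 and 105 only.
sample = io.StringIO(
    "101,-1,10,20,30,60,0.9,-1,-1,-1\n"
    "102,-1,12,20,30,60,0.8,-1,-1,-1\n"
    "105,-1,18,21,30,60,0.7,-1,-1,-1\n"
)
gaps = missing_frames(sample)
print(gaps)  # [103, 104]
```

Passing `first_frame=1` additionally reports every frame before the first detection, which is relevant when a file covers only a segment of a longer video.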
In this case, I was intending to track only a subset of frames, i.e. those listed in the detection file. I have an application where I'd like to split a long video into segments and run tracking on each segment individually. Each segment has its own detection file, like the one I attached, which refers to a subset of the frames from the original video.
So I thought that if segment 0 covered frames 0-100, segment 1 covered frames 101-200, and so on, that would be reasonable. Or does each segment's numbering need to start at zero?
I've definitely run SUSHI successfully on other data and have confirmed good results with TrackEval, so I'm pretty sure I've got the code set up correctly.
I'd really like to figure out this "list index out of range" problem.
Here's an example of a failure; I'm not sure whether it provides any clues as to what's going on. It results in an empty object being passed to Batch.from_data_list().
`> SUSHI-vid/src/tracker/hicl_tracker.py(200)_hicl_to_curr()
(Pdb) batch
GraphBatch(x=[11, 2048], edge_index=[2, 0], x_reid=[11, 2048], y_id=[11, 1], fwrd_vel=[11, 4], bwrd_vel=[11, 4], x_frame_start=[11, 1], x_frame_end=[11, 1], x_center_start=[11, 4], x_center_end=[11, 4], x_box_start=[11, 4], x_box_end=[11, 4], x_feet_start=[11, 2], x_feet_end=[11, 2], pruning_score=[0], x_fwrd_motion=[11, 64, 4], x_bwrd_motion=[11, 64, 4], x_ignore_traj=[11], batch=[11], ptr=[2])
(Pdb) hicl_graphs
[HierarchicalGraph(curr_depth=[1], maps=[6], x_reid=[589, 2048], x_node=[589, 2048], x_frame=[589, 1], x_bbox=[589, 4], x_feet=[589, 2], x_center=[589, 2], y_id=[589, 1], fps=[1], frames_total=[1], frames_per_level=[9], start_frame=[1], end_frame=[1], x_one_hot_frame=[589, 512], map_from_init=[589])] `
@ocetintas I'm sure this is something pretty basic, but I'm not able to figure out what's going on here.
As you can see, your edge_index has shape [2, 0], which means that there are no edges in your graph. Please double-check the graph construction and edge construction steps for your specific application.
yolox_play6.csv
I really don't know how to debug graph construction. Do you have any clues that you can share with me on how to debug this?
I'm just going to share two more things. I've attached the raw detections for this sequence, and I've also printed the state of the batch within _hicl_to_curr() across multiple calls. The batch has edges until the last iteration shown, where edge_index drops to [2, 0].
`batch GraphBatch(x=[589, 2048], edge_index=[2, 2891], x_reid=[589, 2048], y_id=[589, 1], x_frame_start=[589, 1], x_frame_end=[589, 1], x_center_start=[589, 4], x_center_end=[589, 4], x_box_start=[589, 4], x_box_end=[589, 4], x_feet_start=[589, 2], x_feet_end=[589, 2], pruning_score=[2891], batch=[589], ptr=[2])
batch GraphBatch(x=[303, 2048], edge_index=[2, 1531], x_reid=[303, 2048], y_id=[303, 1], fwrd_vel=[286, 4], bwrd_vel=[286, 4], x_frame_start=[303, 1], x_frame_end=[303, 1], x_center_start=[303, 4], x_center_end=[303, 4], x_box_start=[303, 4], x_box_end=[303, 4], x_feet_start=[303, 2], x_feet_end=[303, 2], pruning_score=[1531], x_fwrd_motion=[286, 2, 4], x_bwrd_motion=[286, 2, 4], x_ignore_traj=[303], batch=[303], ptr=[2])
batch GraphBatch(x=[153, 2048], edge_index=[2, 731], x_reid=[153, 2048], y_id=[153, 1], fwrd_vel=[151, 4], bwrd_vel=[151, 4], x_frame_start=[153, 1], x_frame_end=[153, 1], x_center_start=[153, 4], x_center_end=[153, 4], x_box_start=[153, 4], x_box_end=[153, 4], x_feet_start=[153, 2], x_feet_end=[153, 2], pruning_score=[731], x_fwrd_motion=[151, 4, 4], x_bwrd_motion=[151, 4, 4], x_ignore_traj=[153], batch=[153], ptr=[2])
batch GraphBatch(x=[82, 2048], edge_index=[2, 420], x_reid=[82, 2048], y_id=[82, 1], fwrd_vel=[82, 4], bwrd_vel=[82, 4], x_frame_start=[82, 1], x_frame_end=[82, 1], x_center_start=[82, 4], x_center_end=[82, 4], x_box_start=[82, 4], x_box_end=[82, 4], x_feet_start=[82, 2], x_feet_end=[82, 2], pruning_score=[420], x_fwrd_motion=[82, 8, 4], x_bwrd_motion=[82, 8, 4], x_ignore_traj=[82], batch=[82], ptr=[2])
batch GraphBatch(x=[42, 2048], edge_index=[2, 221], x_reid=[42, 2048], y_id=[42, 1], fwrd_vel=[42, 4], bwrd_vel=[42, 4], x_frame_start=[42, 1], x_frame_end=[42, 1], x_center_start=[42, 4], x_center_end=[42, 4], x_box_start=[42, 4], x_box_end=[42, 4], x_feet_start=[42, 2], x_feet_end=[42, 2], pruning_score=[221], x_fwrd_motion=[42, 16, 4], x_bwrd_motion=[42, 16, 4], x_ignore_traj=[42], batch=[42], ptr=[2])
batch GraphBatch(x=[21, 2048], edge_index=[2, 110], x_reid=[21, 2048], y_id=[21, 1], fwrd_vel=[21, 4], bwrd_vel=[21, 4], x_frame_start=[21, 1], x_frame_end=[21, 1], x_center_start=[21, 4], x_center_end=[21, 4], x_box_start=[21, 4], x_box_end=[21, 4], x_feet_start=[21, 2], x_feet_end=[21, 2], pruning_score=[110], x_fwrd_motion=[21, 32, 4], x_bwrd_motion=[21, 32, 4], x_ignore_traj=[21], batch=[21], ptr=[2])
batch GraphBatch(x=[11, 2048], edge_index=[2, 0], x_reid=[11, 2048], y_id=[11, 1], fwrd_vel=[11, 4], bwrd_vel=[11, 4], x_frame_start=[11, 1], x_frame_end=[11, 1], x_center_start=[11, 4], x_center_end=[11, 4], x_box_start=[11, 4], x_box_end=[11, 4], x_feet_start=[11, 2], x_feet_end=[11, 2], pruning_score=[0], x_fwrd_motion=[11, 64, 4], x_bwrd_motion=[11, 64, 4], x_ignore_traj=[11], batch=[11], ptr=[2]) `
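When there are many such dumps, it can help to pull out just the node and edge counts per call. The snippet below is a small sketch that does this with a regular expression over the printed reprs. The helper name `nodes_and_edges` is hypothetical, and the sample strings are the dumps above shortened to their `x` and `edge_index` fields.

```python
import re

# \b keeps "x=[" from matching the tail of "edge_index=[".
REPR_RE = re.compile(r"\bx=\[(\d+), \d+\].*?\bedge_index=\[2, (\d+)\]")


def nodes_and_edges(batch_repr):
    """Extract (num_nodes, num_edges) from a printed GraphBatch repr."""
    m = REPR_RE.search(batch_repr)
    if m is None:
        raise ValueError("not a GraphBatch repr")
    return int(m.group(1)), int(m.group(2))


# The dumps above, shortened to the two fields of interest.
log = [
    "GraphBatch(x=[589, 2048], edge_index=[2, 2891])",
    "GraphBatch(x=[303, 2048], edge_index=[2, 1531])",
    "GraphBatch(x=[153, 2048], edge_index=[2, 731])",
    "GraphBatch(x=[82, 2048], edge_index=[2, 420])",
    "GraphBatch(x=[42, 2048], edge_index=[2, 221])",
    "GraphBatch(x=[21, 2048], edge_index=[2, 110])",
    "GraphBatch(x=[11, 2048], edge_index=[2, 0])",
]

for call, line in enumerate(log):
    nodes, edges = nodes_and_edges(line)
    flag = "  <-- no candidate edges" if edges == 0 else ""
    print(f"call {call}: {nodes} nodes, {edges} edges{flag}")

# Index of the first call whose graph has no edges (None if all have edges).
first_empty = next(
    (i for i, line in enumerate(log) if nodes_and_edges(line)[1] == 0), None
)
print(first_empty)  # 6
```

Here the node count roughly halves on every call, and the seventh call (index 6) is the first one with no edges left.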
I believe i've figured this out.
In mot17.py, I was filtering the dataframe, so I end up with a subset of the original dataframe. I do this because I'm processing a sports match, and in mot17.py I cut the full match's detection file down to a single rally. However, this filtering meant that my dataframe's index no longer started at 0, as it normally would. The downstream code appears not to handle this.
So the fix for me was to insert this code: det_df = det_df.reset_index(drop=True). This renumbers the index so that it starts at 0 again.
Now I'm no longer getting the crash.
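For reference, here is a minimal, self-contained illustration of the pandas behaviour involved. The column names and values are invented for the example and are not SUSHI's actual detection schema.

```python
import pandas as pd

# Hypothetical stand-in for a full-match detection table.
det_df = pd.DataFrame({
    "frame": [1, 1, 2, 101, 101, 102],
    "bb_left": [10, 50, 11, 12, 52, 13],
})

# Keep only one rally (frames 101-102). Filtering keeps the original row labels.
rally_df = det_df[det_df["frame"] >= 101]
print(list(rally_df.index))  # [3, 4, 5]

# At this point label-based and position-based access disagree:
# rally_df.loc[0] raises KeyError, while rally_df.iloc[0] is the first row.

# reset_index(drop=True) discards the old labels and renumbers from 0.
rally_df = rally_df.reset_index(drop=True)
print(list(rally_df.index))  # [0, 1, 2]
```

Any downstream code that uses index labels as if they were row positions will only work after the reset.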
Unfortunately, the renumbering of the index didn't fix the issue across the board. I continue to get failures with some short tracks, usually ones ~60-70 frames long.
I'm attaching a full repro data sample for a failing case in case you'd be able to run this @ocetintas
data_list = [hicl_graph.construct_curr_graph_nodes(self.config) for hicl_graph in hicl_graphs if torch.unique(hicl_graph.map_from_init).shape[0] > 1]
if data_list:
    batch = Batch.from_data_list(data_list)
    curr_depth = hicl_graphs[0].curr_depth
    if self.config.do_motion and curr_depth > 0:
        motion_pred = self.predict_motion(batch, curr_depth=curr_depth)
        batch.pruning_score = compute_giou_fwrd_bwrd_motion_sim(batch, motion_pred)
        if 'estimate_vel' in motion_pred[0]:
            batch.fwrd_vel, batch.bwrd_vel = motion_pred[0]['estimate_vel'], motion_pred[1]['estimate_vel']
    else:
        motion_pred = None
    # Now unbatch graphs, add their remaining features, and batch them again
    curr_graphs = Batch.to_data_list(batch)
    data_list = [hicl_graph.add_edges_to_curr_graph(self.config, curr_graph) for curr_graph, hicl_graph in zip(curr_graphs, hicl_graphs) if ((curr_graph.edge_index is not None) and (curr_graph.edge_index.numel()))]
    if data_list:
        curr_graph_batch = Batch.from_data_list(data_list)
    else:
        curr_graph_batch = None
else:
    curr_graph_batch = None
    motion_pred = None
return curr_graph_batch, motion_pred
Hi, I assume you have a very specific edge case where none of the graphs in a batch has a tracking candidate. Please replace the _hicl_to_curr function at https://github.com/dvl-tum/SUSHI/blob/main/src/tracker/hicl_tracker.py#L171 with the code given above; this should solve the problem.
Oh, and you also need to add an if condition in hicl_forward that checks whether curr_graph_batch is None. An example of how to do this follows. (The code below might not work for you as-is; I'm only adding it as a reference for where to put the condition: if (curr_batch is not None) and curr_batch.edge_index.numel(): )
def hicl_forward(self, hicl_graphs, logs, oracle, mode, max_depth, project_max_depth, return_edge_specs=False):
    hicl_feats = None
    edge_specs = [{} for i in range(self.config.hicl_depth)]  # {"Preds":, "GT": } for each level
    loss = torch.as_tensor([.0], device=self.gpu_id)  # Initialize the batch loss
    # For each depth
    for curr_depth in range(max_depth):
        t0 = time.time()
        # Put the graph into the correct format
        curr_batch, _ = self._hicl_to_curr(hicl_graphs=hicl_graphs)  # Create curr_graphs from hierarchical graphs
        if (curr_batch is not None) and curr_batch.edge_index.numel():
            batch_idx = curr_batch.batch
            if curr_depth == 0 or not self.config.do_hicl_feats:
                curr_batch.hicl_feats = None
            elif hicl_feats is not None:
                curr_batch.hicl_feats = hicl_feats
            t1 = time.time()
            # Forward pass if there is an edge
            if oracle:
                # Oracle results
                curr_batch.edge_preds = curr_batch.edge_labels
                logs[curr_depth]["Loss"].append(0.)
                # Calculate batch classification metrics and loss
                logs[curr_depth] = self._calculate_true_false_metrics(edge_preds=curr_batch.edge_preds,
                                                                      edge_labels=curr_batch.edge_labels,
                                                                      logs=logs[curr_depth])
            else:
                # Graph based forward pass
                outputs = self.model(curr_batch, curr_depth)  # Forward pass for this specific depth
                # Hacky way to solve GPU usage problem
                if self.config.dummy_gpu_usage:
                    self.dummy_model(self.dummy_tensor)
                # Produce decisions
                curr_batch.edge_preds = torch.sigmoid(outputs['classified_edges'][-1].view(-1).detach())
                # curr_batch.edge_preds = curr_batch.edge_labels
                if mode == 'val' or self.config.force_logs:
                    # Calculate the batch loss
                    logs[curr_depth]["Loss"].append(self._calculate_loss(outputs=outputs, edge_labels=curr_batch.edge_labels, edge_mask=curr_batch.edge_mask).item())
                    # Calculate batch classification metrics and loss
                    logs[curr_depth] = self._calculate_true_false_metrics(edge_preds=curr_batch.edge_preds,
                                                                          edge_labels=curr_batch.edge_labels, logs=logs[curr_depth], decision_threshold=self.config.decision_threshold[curr_depth])
                elif mode == 'train':
                    # Calculate loss and prepare for a forward pass
                    loss_curr_depth = self._calculate_loss(outputs=outputs, edge_labels=curr_batch.edge_labels, edge_mask=curr_batch.edge_mask)
                    loss_curr_depth.backward()
                    loss += loss_curr_depth
                    logs["Loss_per_Depth"][curr_depth].append(loss_curr_depth.detach().item())  # log the curr loss
            if return_edge_specs:
                edge_specs[curr_depth]["preds"] = curr_batch.edge_preds
                edge_specs[curr_depth]["gt"] = curr_batch.edge_labels
                edge_specs[curr_depth]["edge_feats"] = curr_batch.edge_attr
            t2 = time.time()
            graph_data_list = curr_batch.to_data_list()
            if mode != 'train':
                assert len(graph_data_list) == 1, "Track batch size is greater than 1"
            hicl_feats = []
            if curr_depth < project_max_depth:  # Last layer update is not necessary for training
                for ix_graph, graph in enumerate(graph_data_list):
                    if graph.edge_index.numel():
                        # Process the graph before feeding it to the projector
                        self._postprocess_graph(graph=graph, decision_threshold=self.config.decision_threshold[curr_depth])
                        # Project model output with a solver
                        graph = self._project_graph(graph)
                        # Assign ped ids
                        n_components, labels = self._assign_labels(graph)
                        node_mask = batch_idx == ix_graph
                        if self.config.do_hicl_feats and not oracle:
                            hicl_feats.append(self.model.layers[curr_depth].hicl_feats_encoder.pool_node_feats(outputs['node_feats'][node_mask], labels))
                        # Update the hierarchical graphs with new map_from_init and depth
                        hicl_graphs[ix_graph].update_maps_and_depth(labels)
                    else:
                        # Update the hierarchical graphs
                        hicl_graphs[ix_graph].update_maps_and_depth_wo_labels()
            t3 = time.time()
            if self.config.print_runtime_stats:
                print("Setup graphs: ", t1-t0)
                print("Forward pass: ", t2-t1)
                print("Projection: ", t3-t2)
            if len(hicl_feats) > 0:
                hicl_feats = torch.cat(hicl_feats)
            else:
                hicl_feats = None
    return hicl_graphs, loss, logs, edge_specs
Hi @ocetintas I'm getting an error with the first code snippet for _hicl_to_curr() because map_from_init is not an attribute of hicl_graph.
You can try replacing: https://github.com/dvl-tum/SUSHI/blob/main/src/data/graph.py#L94 with
if hasattr(self, 'x_node'):
    self.map_from_init = torch.arange(self.x_node.shape[0]).long().to(self.device())  # From initial nodes to the latest layer
I think that did it, thanks very much for all the help @ocetintas!
I'm getting an error when running SUSHI:
I'm attaching my MOT detection file in case it might be useful for a repro.
yolox.txt