list index out of range

ajtao commented 8 months ago

I'm getting an error when running SUSHI:

Traceback (most recent call last):
  File "/home/atao/devel/unified-dev/SUSHI/scripts/main.py", line 65, in <module>
    epoch_val_logs, epoc_val_logs_per_depth = hicl_tracker.track(dataset=hicl_tracker.test_dataset, output_path=osp.join(hicl_tracker.config.experiment_path, 'test'),
  File "/home/atao/devel/unified-dev/SUSHI/src/tracker/hicl_tracker.py", line 551, in track
    hicl_graphs, _, logs_all = self.hicl_forward(hicl_graphs = hicl_graphs, 
  File "/home/atao/devel/unified-dev/SUSHI/src/tracker/hicl_tracker.py", line 333, in hicl_forward
    curr_batch, _ = self._hicl_to_curr(hicl_graphs=hicl_graphs)  # Create curr_graphs from hierarachical graphs            
  File "/home/atao/devel/unified-dev/SUSHI/src/tracker/hicl_tracker.py", line 195, in _hicl_to_curr
    curr_graph_batch = Batch.from_data_list([hicl_graph.add_edges_to_curr_graph(self.config, curr_graph)
  File "/home/atao/.miniconda/envs/SUSHI/lib/python3.9/site-packages/torch_geometric/data/batch.py", line 97, in from_data_list
    add_batch=not isinstance(data_list[0], Batch),
IndexError: list index out of range

I'm attaching my MOT detection file in case it might be useful for a repro.

yolox.txt

ocetintas commented 8 months ago

Your batch is empty. Your detection file is missing a lot of frames, so I assume the error is related to that.

Maybe use the detection files provided in the repo to double-check that you can run the codebase first.

Best, Orcun
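A quick way to see how many frames a detection file skips is sketched below. It assumes the MOT-style comma-separated layout with the frame number in the first column; the helper name `missing_frames` and the sample rows are made up for illustration and are not part of SUSHI.

```python
import csv
import io


def missing_frames(det_file, first_frame=None, last_frame=None):
    """Return the frame numbers in [first_frame, last_frame] that have no detections.

    det_file is any iterable of text lines (an open file or a StringIO).
    The bounds default to the smallest and largest frame found in the file.
    """
    frames = set()
    for row in csv.reader(det_file):
        if row:  # skip blank lines
            frames.add(int(float(row[0])))  # assumption: column 0 holds the frame number
    if not frames:
        return []
    lo = min(frames) if first_frame is None else first_frame
    hi = max(frames) if last_frame is None else last_frame
    return [f for f in range(lo, hi + 1) if f not in frames]


# Hypothetical sample: detections exist only on frames 101, 102 and 105
sample = io.StringIO(
    "101,-1,10,20,30,40,0.9,-1,-1,-1\n"
    "102,-1,11,21,30,40,0.8,-1,-1,-1\n"
    "105,-1,12,22,30,40,0.7,-1,-1,-1\n"
)
gaps = missing_frames(sample)
print(gaps)  # [103, 104]
```

Passing `first_frame` and `last_frame` explicitly also reports gaps at the start or end of the sequence, which the default bounds cannot see.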

ajtao commented 8 months ago

In this case, I intended to track only a subset of frames, i.e. those listed in the detection file. I have an application where I'd like to split a long video into segments and run tracking on each segment individually. Each segment has its own detection file, like the one I attached, which refers to a subset of the frames from the original video.

So I thought that if segment 0 covered frames 0-100, segment 1 covered frames 101-200, and so on, that would be reasonable. Or does each segment's numbering need to start at zero?

I've definitely run SUSHI successfully on other data and have confirmed good results with TrackEval, so I'm pretty sure I've set up the code correctly.

ajtao commented 4 months ago

I'd really like to figure out this "list index out of range" problem.

Here's an example of a failure; I'm not sure whether it provides any clues about what's going on. It results in an empty list being passed to Batch.from_data_list().

`> SUSHI-vid/src/tracker/hicl_tracker.py(200)_hicl_to_curr()

(Pdb) batch
GraphBatch(x=[11, 2048], edge_index=[2, 0], x_reid=[11, 2048], y_id=[11, 1], fwrd_vel=[11, 4], bwrd_vel=[11, 4], x_frame_start=[11, 1], x_frame_end=[11, 1], x_center_start=[11, 4], x_center_end=[11, 4], x_box_start=[11, 4], x_box_end=[11, 4], x_feet_start=[11, 2], x_feet_end=[11, 2], pruning_score=[0], x_fwrd_motion=[11, 64, 4], x_bwrd_motion=[11, 64, 4], x_ignore_traj=[11], batch=[11], ptr=[2])

(Pdb) hicl_graphs
[HierarchicalGraph(curr_depth=[1], maps=[6], x_reid=[589, 2048], x_node=[589, 2048], x_frame=[589, 1], x_bbox=[589, 4], x_feet=[589, 2], x_center=[589, 2], y_id=[589, 1], fps=[1], frames_total=[1], frames_per_level=[9], start_frame=[1], end_frame=[1], x_one_hot_frame=[589, 512], map_from_init=[589])] `

ajtao commented 4 months ago

@ocetintas I'm sure this is something pretty basic, but I'm not able to figure out what's going on here.

ocetintas commented 4 months ago

As you can see your edge index dimensions are [2, 0]. This means that there is no edge in your graph. Please double check graph construction and edge construction steps for your specific application.

ajtao commented 4 months ago

yolox_play6.csv

I really don't know how to debug graph construction. Do you have any clues you can share with me on how to debug this?

I'm just going to share two more things. I've attached the raw detections for this sequence, and I've also printed the state of the batch inside _hicl_to_curr() across successive calls. The batch has edges until the last iteration shown.

`batch GraphBatch(x=[589, 2048], edge_index=[2, 2891], x_reid=[589, 2048], y_id=[589, 1], x_frame_start=[589, 1], x_frame_end=[589, 1], x_center_start=[589, 4], x_center_end=[589, 4], x_box_start=[589, 4], x_box_end=[589, 4], x_feet_start=[589, 2], x_feet_end=[589, 2], pruning_score=[2891], batch=[589], ptr=[2])

batch GraphBatch(x=[303, 2048], edge_index=[2, 1531], x_reid=[303, 2048], y_id=[303, 1], fwrd_vel=[286, 4], bwrd_vel=[286, 4], x_frame_start=[303, 1], x_frame_end=[303, 1], x_center_start=[303, 4], x_center_end=[303, 4], x_box_start=[303, 4], x_box_end=[303, 4], x_feet_start=[303, 2], x_feet_end=[303, 2], pruning_score=[1531], x_fwrd_motion=[286, 2, 4], x_bwrd_motion=[286, 2, 4], x_ignore_traj=[303], batch=[303], ptr=[2])

batch GraphBatch(x=[153, 2048], edge_index=[2, 731], x_reid=[153, 2048], y_id=[153, 1], fwrd_vel=[151, 4], bwrd_vel=[151, 4], x_frame_start=[153, 1], x_frame_end=[153, 1], x_center_start=[153, 4], x_center_end=[153, 4], x_box_start=[153, 4], x_box_end=[153, 4], x_feet_start=[153, 2], x_feet_end=[153, 2], pruning_score=[731], x_fwrd_motion=[151, 4, 4], x_bwrd_motion=[151, 4, 4], x_ignore_traj=[153], batch=[153], ptr=[2])

batch GraphBatch(x=[82, 2048], edge_index=[2, 420], x_reid=[82, 2048], y_id=[82, 1], fwrd_vel=[82, 4], bwrd_vel=[82, 4], x_frame_start=[82, 1], x_frame_end=[82, 1], x_center_start=[82, 4], x_center_end=[82, 4], x_box_start=[82, 4], x_box_end=[82, 4], x_feet_start=[82, 2], x_feet_end=[82, 2], pruning_score=[420], x_fwrd_motion=[82, 8, 4], x_bwrd_motion=[82, 8, 4], x_ignore_traj=[82], batch=[82], ptr=[2])

batch GraphBatch(x=[42, 2048], edge_index=[2, 221], x_reid=[42, 2048], y_id=[42, 1], fwrd_vel=[42, 4], bwrd_vel=[42, 4], x_frame_start=[42, 1], x_frame_end=[42, 1], x_center_start=[42, 4], x_center_end=[42, 4], x_box_start=[42, 4], x_box_end=[42, 4], x_feet_start=[42, 2], x_feet_end=[42, 2], pruning_score=[221], x_fwrd_motion=[42, 16, 4], x_bwrd_motion=[42, 16, 4], x_ignore_traj=[42], batch=[42], ptr=[2])

batch GraphBatch(x=[21, 2048], edge_index=[2, 110], x_reid=[21, 2048], y_id=[21, 1], fwrd_vel=[21, 4], bwrd_vel=[21, 4], x_frame_start=[21, 1], x_frame_end=[21, 1], x_center_start=[21, 4], x_center_end=[21, 4], x_box_start=[21, 4], x_box_end=[21, 4], x_feet_start=[21, 2], x_feet_end=[21, 2], pruning_score=[110], x_fwrd_motion=[21, 32, 4], x_bwrd_motion=[21, 32, 4], x_ignore_traj=[21], batch=[21], ptr=[2])

batch GraphBatch(x=[11, 2048], edge_index=[2, 0], x_reid=[11, 2048], y_id=[11, 1], fwrd_vel=[11, 4], bwrd_vel=[11, 4], x_frame_start=[11, 1], x_frame_end=[11, 1], x_center_start=[11, 4], x_center_end=[11, 4], x_box_start=[11, 4], x_box_end=[11, 4], x_feet_start=[11, 2], x_feet_end=[11, 2], pruning_score=[0], x_fwrd_motion=[11, 64, 4], x_bwrd_motion=[11, 64, 4], x_ignore_traj=[11], batch=[11], ptr=[2]) `
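Printouts like the ones above can be scanned mechanically for the first call that has no edges. The sketch below only parses the printed text with regular expressions; the helper names `edge_count` and `node_count` are hypothetical, and the two sample strings are shortened versions of the last two printouts.

```python
import re


def edge_count(batch_repr):
    """Extract E from 'edge_index=[2, E]' in a printed batch, or None if absent."""
    match = re.search(r"edge_index=\[2,\s*(\d+)\]", batch_repr)
    return int(match.group(1)) if match else None


def node_count(batch_repr):
    """Extract N from 'x=[N, ...]' in a printed batch, or None if absent."""
    match = re.search(r"\bx=\[(\d+),", batch_repr)
    return int(match.group(1)) if match else None


# Shortened copies of the last two printed batches
logged = [
    "GraphBatch(x=[21, 2048], edge_index=[2, 110], pruning_score=[110])",
    "GraphBatch(x=[11, 2048], edge_index=[2, 0], pruning_score=[0])",
]

for call_idx, line in enumerate(logged):
    print(call_idx, node_count(line), edge_count(line))

# Position (within `logged`) of the first batch with zero edges, or None
first_empty = next(
    (i for i, line in enumerate(logged) if edge_count(line) == 0), None
)
print(first_empty)  # 1
```

Here the index is just the position in the list of printouts, so it only matches the hierarchy depth if every call was logged in order.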

ajtao commented 4 months ago

I believe i've figured this out.

In mot17.py I was filtering the dataframe, so I ended up with a subset of the original dataframe. I do this because I'm processing a sports match, and in mot17.py I isolate a single rally from the full match's detection file. However, this filtering meant that my dataframe's index no longer started at 0, as it normally would. The downstream code appears not to like this for some reason.

So the fix for me was to insert this code: det_df = det_df.reset_index(drop=True). This renumbers the index so that it starts at 0 again.

Now I'm no longer getting the crash.
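For anyone hitting the same thing, here is a minimal illustration of what filtering does to a pandas index and what reset_index(drop=True) changes. The column names and values are made up for the example and are not taken from mot17.py.

```python
import pandas as pd

# Hypothetical detections for a full match: one row per detection
det_df = pd.DataFrame(
    {
        "frame": [1, 2, 3, 4, 5, 6],
        "conf": [0.9, 0.8, 0.7, 0.9, 0.6, 0.8],
    }
)

# Keep a single "rally": the rows keep their ORIGINAL index labels
rally = det_df[det_df["frame"] >= 4]
filtered_index = list(rally.index)
print(filtered_index)  # [3, 4, 5]

# Renumber the index as 0..n-1 and discard the old labels
rally = rally.reset_index(drop=True)
print(list(rally.index))  # [0, 1, 2]
```

Any downstream code that treats the index as a position (for example, using it to index into a tensor of per-detection features) would see different rows before and after the reset.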

ajtao commented 4 months ago

Unfortunately, the renumbering of the index didn't fix the issue across the board. I continue to get failures with some short tracks, usually ones ~60-70 frames long.

I'm attaching a full repro data sample for a failing case in case you'd be able to run this @ocetintas

play.zip

ocetintas commented 3 months ago

        # Build node-only graphs; skip hierarchical graphs whose map_from_init has a single unique value
        data_list = [
            hicl_graph.construct_curr_graph_nodes(self.config)
            for hicl_graph in hicl_graphs
            if torch.unique(hicl_graph.map_from_init).shape[0] > 1
        ]
        if data_list:
            batch = Batch.from_data_list(data_list)
            curr_depth = hicl_graphs[0].curr_depth
            if self.config.do_motion and curr_depth > 0:
                motion_pred = self.predict_motion(batch, curr_depth=curr_depth)
                batch.pruning_score = compute_giou_fwrd_bwrd_motion_sim(batch, motion_pred)

                if 'estimate_vel' in motion_pred[0]:
                    batch.fwrd_vel, batch.bwrd_vel = motion_pred[0]['estimate_vel'], motion_pred[1]['estimate_vel']

            else:
                motion_pred = None

            # Now unbatch graphs, add their remaining features, and batch them again
            curr_graphs = Batch.to_data_list(batch)

            # Keep only graphs whose edge_index exists and is non-empty
            data_list = [
                hicl_graph.add_edges_to_curr_graph(self.config, curr_graph)
                for curr_graph, hicl_graph in zip(curr_graphs, hicl_graphs)
                if curr_graph.edge_index is not None and curr_graph.edge_index.numel()
            ]

            # Only batch when something is left; otherwise signal "nothing to do" with None
            if data_list:
                curr_graph_batch = Batch.from_data_list(data_list)
            else:
                curr_graph_batch = None
        else:
            curr_graph_batch = None
            motion_pred = None

        return curr_graph_batch, motion_pred

ocetintas commented 3 months ago

Hi, I assume you have a very specific edge case where none of the graphs in a batch has a tracking candidate. Please replace the _hicl_to_curr function at https://github.com/dvl-tum/SUSHI/blob/main/src/tracker/hicl_tracker.py#L171 with the code given above; this should solve the problem.
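The idea behind the change, stripped of the SUSHI specifics, is roughly the following. This is only a toy sketch: `naive_batch` stands in for a batching function that looks at the first element of its input, and `batch_or_none` and the dictionary "graphs" are hypothetical names used for illustration.

```python
def naive_batch(data_list):
    """Stand-in for a batching function that inspects the first element."""
    first = data_list[0]  # raises IndexError when data_list is empty
    return {"first": first, "size": len(data_list)}


def batch_or_none(items, keep, make_batch):
    """Filter items, then batch them only if something survived the filter."""
    kept = [item for item in items if keep(item)]
    return make_batch(kept) if kept else None


def has_edges(graph):
    return graph["num_edges"] > 0


# Edge case: no graph in the batch has any candidate edge
graphs = [{"num_edges": 0}, {"num_edges": 0}]

error_message = None
try:
    naive_batch([g for g in graphs if has_edges(g)])
except IndexError as err:
    error_message = str(err)
print(error_message)  # list index out of range

# With the guard, the caller gets None and must check for it
result = batch_or_none(graphs, has_edges, naive_batch)
print(result)  # None
```

Returning None moves the responsibility to the caller, which is why the calling function also needs a check before it touches the batch.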

ocetintas commented 3 months ago

Oh, and you also need to add an if condition in hicl_forward that checks whether the batch returned by _hicl_to_curr (curr_graph_batch there, curr_batch in hicl_forward) is None. An example of how to do this follows. (The code below might not work for you as-is; I'm adding it only as a reference for where to put the condition: if (curr_batch is not None) and curr_batch.edge_index.numel(): )


    def hicl_forward(self, hicl_graphs, logs, oracle, mode, max_depth, project_max_depth, return_edge_specs=False):
        hicl_feats=None
        edge_specs = [{} for i in range(self.config.hicl_depth)]  # {"Preds":, "GT": } for each level
        loss = torch.as_tensor([.0], device=self.gpu_id)  # Initialize the batch loss

        # For each depth
        for curr_depth in range(max_depth):

            t0 = time.time()
            # Put the graph into the correct format
            curr_batch, _ = self._hicl_to_curr(hicl_graphs=hicl_graphs)  # Create curr_graphs from hierarchical graphs

            if (curr_batch is not None) and curr_batch.edge_index.numel(): 

                batch_idx = curr_batch.batch

                if curr_depth == 0 or not self.config.do_hicl_feats:
                    curr_batch.hicl_feats = None

                elif hicl_feats is not None:
                    curr_batch.hicl_feats = hicl_feats

                t1 = time.time()
                # Forward pass if there is an edge
                if oracle:
                    # Oracle results
                    curr_batch.edge_preds = curr_batch.edge_labels
                    logs[curr_depth]["Loss"].append(0.)
                    # Calculate batch classification metrics and loss
                    logs[curr_depth] = self._calculate_true_false_metrics(edge_preds=curr_batch.edge_preds,
                                                                edge_labels=curr_batch.edge_labels, 
                                                                logs=logs[curr_depth])
                else:
                    # Graph based forward pass
                    outputs = self.model(curr_batch, curr_depth)  # Forward pass for this specific depth

                    # Hacky way to solve GPU usage problem
                    if self.config.dummy_gpu_usage:
                        self.dummy_model(self.dummy_tensor)

                    # Produce decisions
                    curr_batch.edge_preds = torch.sigmoid(outputs['classified_edges'][-1].view(-1).detach())
                    # curr_batch.edge_preds = curr_batch.edge_labels

                    if mode == 'val' or self.config.force_logs:
                        # Calculate the batch loss
                        logs[curr_depth]["Loss"].append(self._calculate_loss(outputs=outputs, edge_labels=curr_batch.edge_labels, edge_mask=curr_batch.edge_mask).item())
                        # Calculate batch classification metrics and loss
                        logs[curr_depth] = self._calculate_true_false_metrics(edge_preds=curr_batch.edge_preds,
                                                                    edge_labels=curr_batch.edge_labels, logs=logs[curr_depth], decision_threshold=self.config.decision_threshold[curr_depth])
                    elif mode == 'train':
                        # Calculate the loss and run the backward pass
                        loss_curr_depth = self._calculate_loss(outputs=outputs, edge_labels=curr_batch.edge_labels, edge_mask=curr_batch.edge_mask)
                        loss_curr_depth.backward()                      
                        loss += loss_curr_depth

                        logs["Loss_per_Depth"][curr_depth].append(loss_curr_depth.detach().item())  # log the curr loss

                    if return_edge_specs:
                        edge_specs[curr_depth]["preds"] = curr_batch.edge_preds
                        edge_specs[curr_depth]["gt"] = curr_batch.edge_labels
                        edge_specs[curr_depth]["edge_feats"] = curr_batch.edge_attr
                t2 = time.time()
                graph_data_list = curr_batch.to_data_list()
                if mode != 'train':
                    assert len(graph_data_list) == 1, "Track batch size is greater than 1"

                hicl_feats = []
                if curr_depth < project_max_depth:  # Last layer update is not necessary for training
                    for ix_graph, graph in enumerate(graph_data_list):        
                        if graph.edge_index.numel():

                            # Process the graph before feeding it to the projector
                            self._postprocess_graph(graph=graph, decision_threshold=self.config.decision_threshold[curr_depth])
                            # Project model output with a solver
                            graph = self._project_graph(graph)

                            # Assign ped ids
                            n_components, labels = self._assign_labels(graph)

                            node_mask = batch_idx == ix_graph
                            if self.config.do_hicl_feats and not oracle:
                                hicl_feats.append(self.model.layers[curr_depth].hicl_feats_encoder.pool_node_feats(outputs['node_feats'][node_mask], labels))

                            # Update the hierarchical graphs with new map_from_init and depth
                            hicl_graphs[ix_graph].update_maps_and_depth(labels)

                        else:
                            # Update the hierarchical graphs
                            hicl_graphs[ix_graph].update_maps_and_depth_wo_labels()
                t3 = time.time()
                if self.config.print_runtime_stats:
                    print("Setup graphs: ", t1-t0)
                    print("Forward pass: ", t2-t1)
                    print("Projection: ", t3-t2)

                if len(hicl_feats) > 0:
                    hicl_feats = torch.cat(hicl_feats)

                else:
                    hicl_feats = None

        return hicl_graphs, loss, logs, edge_specs

ajtao commented 3 months ago

Hi @ocetintas I'm getting an error with the first code snippet for _hicl_to_curr() because map_from_init is not an attribute of hicl_graph.

ocetintas commented 3 months ago

You can try replacing: https://github.com/dvl-tum/SUSHI/blob/main/src/data/graph.py#L94 with

        if hasattr(self, 'x_node'):
            self.map_from_init = torch.arange(self.x_node.shape[0]).long().to(self.device())  # From initial nodes to the latest layer

ajtao commented 3 months ago

I think that did it, thanks very much for all the help @ocetintas!

dvl-tum / SUSHI

list index out of range #24