franktpmvu / NeighborTrack

[CVPR 2023 workshop] NeighborTrack: Single Object Tracking by Bipartite Matching With Neighbor Tracklets and Its Applications to Sports

Can the model be used with the original DIMP? #1

Closed: Aeim closed this issue 1 year ago.

Aeim commented 1 year ago

Can I implement the original DIMP with NeighborTrack?

franktpmvu commented 1 year ago

I implemented it on Ocean (a DIMP-like tracker) and it works. In the original DIMP, pos + score are obtained around https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/tracker/dimp/dimp.py#L120 (the scores come from L116).

So you need to use these (pos + scores) to get the neighbors, and then, after NeighborTrack, use the new pos + scores to update DIMP (lines 122 to 147).

The above wraps DIMP in the outermost layer (steps: backbone to NeighborTrack to DIMP).

To wrap NeighborTrack in the outermost layer instead, just take the result of DIMP to update NeighborTrack (steps: backbone to DIMP to NeighborTrack). I use this setting on Ocean.

Aeim commented 1 year ago

Thank you!

Aeim commented 1 year ago

Can I see your implementation of the Ocean model?

franktpmvu commented 1 year ago

The Ocean implementation uses an old version of the NeighborTrack code, and there are some differences in our new version. I have uploaded the old functions. You can see the original Ocean at https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/test_ocean.py#L166

Ocean with NeighborTrack is at https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/test_ocean.py#L280

online_tracker.neighbor_track is at https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/online.py#L202

siam_tracker._neighbor_track is at https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/ocean.py#L1073

These two functions implement the online and offline versions of Ocean.

Aeim commented 1 year ago

Thank you

Aeim commented 1 year ago

So, to use your new version of NeighborTrack, I need to implement 3 functions?

franktpmvu commented 1 year ago

Yes. init and track_neighbor are very simple. DIMP is a post-processing method: if backbone = A, DIMP = B and NeighborTrack = C, you can try ABC or ACB. My NeighborTrack_Ocean uses the ABC style, and I use AB for both the forward and the reverse tracking. Be careful with update_center in DIMP: with the ABC style, you must first use NeighborTrack to change the answer, and only then update the DIMP model weights.

Aeim commented 1 year ago

Now I'm stuck! So I'm trying to follow your guideline:

to get pos + score(L116) So, you need use it (pos + scores) to get neighbors, and then, after NeighborTrack, use new pos + scores to update DIMP (Line 122 to Line 147)

I want to build an ACB process (Backbone -> Neighbor -> DIMP). How can I get the neighbors from pos + score (L116), given that new pos is in original image size but DIMP's score map is only 23x23? Do you have any guideline for me?

franktpmvu commented 1 year ago

As you can see, self.localize_target at L119 takes the 23x23 scores and returns translation_vec, scale_ind, the scores and a flag.

L223 to L235 find the position of the maximum score, which is then used at L120 (sample_pos acts like the grid, and translation_vec is its localization regression). We need all 23x23 scores, so you will write a new version of localize_target() that gets every translation_vec (23x23) and its final score.

First, write a new function such as localize_target_neighbor():

1. Copy all of localize_target.
2. Remove L226 to L229; we don't need to get the max index.
3. Get all 23x23 disp_index values.
4. Compute disp_index - score_center for all 23x23 positions, like L229.
5. Get all 23x23 translation_vec values, and finally the 23x23 new positions in original image size.

Now you have the 23x23 neighbor positions and can go to the next step.
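Steps 3 to 5 can also be done for the whole map at once instead of cell by cell. A minimal NumPy sketch under a few assumptions: a single search scale, positions in (y, x) order to match the (row, col) order of the score map, and img_support_sz, kernel_size and sample_scale taking whatever values DIMP already holds. grid_positions is a hypothetical name, not part of the repo:

```python
import numpy as np

def grid_positions(score_shape, sample_pos, sample_scale, img_support_sz, kernel_size):
    """Map every cell of the score map to a (y, x) position in original-image coordinates.

    Mirrors the per-cell formula used in localize_target (assumption: single scale):
        translation_vec = (disp - score_center) * (img_support_sz / output_sz) * sample_scale
        pos = sample_pos + translation_vec
    """
    score_sz = np.asarray(score_shape, dtype=float)              # e.g. (23, 23)
    score_center = (score_sz - 1) / 2                             # e.g. (11, 11)
    output_sz = score_sz - (np.asarray(kernel_size) + 1) % 2      # same even-kernel correction as DIMP
    rows, cols = np.meshgrid(np.arange(score_shape[0]), np.arange(score_shape[1]), indexing="ij")
    disp = np.stack([rows, cols], axis=-1).astype(float)          # (H, W, 2) grid indices (row, col)
    stride = np.asarray(img_support_sz, dtype=float) / output_sz  # image pixels per score cell
    translation = (disp - score_center) * stride * sample_scale   # (H, W, 2) offsets
    return np.asarray(sample_pos, dtype=float) + translation      # (H, W, 2) positions
```

The result has shape (23, 23, 2); thresholding the score map then decides which of these positions become neighbors.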

Aeim commented 1 year ago

About step 3 (get all 23x23 disp_index values): does this mean I need to get the disp_index of each neighbor above max_score * neighbor threshold? If not, does it mean I need to get every index in the score map? Thank you for your kindness, you have helped me a lot.

franktpmvu commented 1 year ago

Yes, all 23x23 points, but you only need some of them: use something like max_score * neighbor_threshold to select the neighbors' positions.
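A NumPy sketch of that selection, mirroring the ge / masked_select / nonzero steps. The 0.7 ratio is only the value tried in this thread, and select_neighbors is a hypothetical name:

```python
import numpy as np

def select_neighbors(scores, ratio=0.7):
    """Keep score-map cells whose score is at least ratio * max score.

    Returns (indices, values): indices is an (N, 2) array of (row, col) pairs and
    values the matching scores. Both follow row-major order, so they line up the
    same way torch.nonzero and torch.masked_select do for the same mask.
    Assumes the max score is positive; otherwise ratio * max exceeds max and
    nothing is selected.
    """
    scores = np.asarray(scores, dtype=float)
    mask = scores >= ratio * scores.max()
    indices = np.argwhere(mask)   # (N, 2), row-major order
    values = scores[mask]         # (N,), same order as indices
    return indices, values
```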

Aeim commented 1 year ago

I have now implemented the new function, but something may be missing or wrong in it. Can you help me check my code? I also still don't understand the scale_ind variable.

def localize_target_neighbor(self, scores, sample_pos, sample_scales):
        """Run the target localization."""

        scores = scores.squeeze(1)

        preprocess_method = self.params.get('score_preprocess', 'none')
        if preprocess_method == 'none':
            pass
        elif preprocess_method == 'exp':
            scores = scores.exp()
        elif preprocess_method == 'softmax':
            reg_val = getattr(self.net.classifier.filter_optimizer, 'softmax_reg', None)
            scores_view = scores.view(scores.shape[0], -1)
            scores_softmax = activation.softmax_reg(scores_view, dim=-1, reg=reg_val)
            scores = scores_softmax.view(scores.shape)
        else:
            raise Exception('Unknown score_preprocess in params.')

        score_filter_ksz = self.params.get('score_filter_ksz', 1)
        if score_filter_ksz > 1:
            assert score_filter_ksz % 2 == 1
            kernel = scores.new_ones(1,1,score_filter_ksz,score_filter_ksz)
            scores = F.conv2d(scores.view(-1,1,*scores.shape[-2:]), kernel, padding=score_filter_ksz//2).view(scores.shape)

        # if self.params.get('advanced_localization', False):
        #     return self.localize_advanced_neighbor(scores, sample_pos, sample_scales)

        # Get neighbors
        score_sz = torch.Tensor(list(scores.shape[-2:])) # (23,23)
        score_center = (score_sz - 1)/2 # (11,11)
        max_score, max_disp = dcf.max2d(scores)
        _, scale_ind = torch.max(max_score, dim=0) # <-Don't know how to deal with

        c_score_map = scores
        mask = c_score_map.ge(0.7 * max_score)  # neighbor threshold
        values = torch.masked_select(c_score_map, mask)
        indexes = torch.nonzero(mask.squeeze())
        n_score = values

        # Compute translation vector and scale change factor
        output_sz = score_sz - (self.kernel_size + 1) % 2
        translation_vecs = []
        for index in indexes:
            target_disp = index.float().cpu() - score_center
            translation_vec = target_disp * (self.img_support_sz / output_sz) * sample_scales[scale_ind]
            translation_vecs.append(translation_vec)
        # Stack to (N, 2) so each row stays one (y, x) vector; torch.cat would flatten them into one 1-D tensor
        translation_vec_neighbors = torch.stack(translation_vecs, 0)

        return translation_vec_neighbors, scale_ind, scores, None

And this is my track_neighbor, which I edited around L119:

        # Compute classification scores
        scores_raw = self.classify_target(test_x)

        # Localize the target
        translation_vec, scale_ind, s, flag = self.localize_target(scores_raw, sample_pos, sample_scales)
        new_pos = sample_pos[scale_ind,:] + translation_vec
        print(new_pos)
        # NeighborTrack
        translation_vec_neighbors, scale_ind_neighbor, s_neighbor, flag = self.localize_target_neighbor(scores_raw, sample_pos, sample_scales)
        for translation_vec_neighbor in translation_vec_neighbors:
            # index (not slice) the scale row so new_pos_neighbor has shape (2,)
            new_pos_neighbor = sample_pos[scale_ind_neighbor,:] + translation_vec_neighbor
            print(new_pos_neighbor)

Please help.

franktpmvu commented 1 year ago

Can you print max_score and max_disp? I didn't check their types and values, but I think max_score may be something like 23x23 values and max_disp 23x23x2 (x index, y index).

If you can't understand max_score, max_disp = dcf.max2d(scores) and _, scale_ind = torch.max(max_score, dim=0) # <-Don't know how to deal with, maybe you can print them and try to copy their pattern.

As I read it, this code finds the grid point with the maximum score and uses it to define the new bbox position and size.

The code of dcf.max2d is at https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/libs/dcf.py#L156
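In upstream pytracking, dcf.max2d takes the maximum over the last two dimensions and returns the maximum value plus its (row, col) index, one pair per leading index (per scale). A NumPy sketch of that behaviour, written for illustration rather than copied from the repo (ties may be resolved differently from the torch version). For a (num_scales, 23, 23) score map it returns shapes (num_scales,) and (num_scales, 2):

```python
import numpy as np

def max2d(a):
    """Max value and (row, col) argmax over the last two dims, for each leading index."""
    a = np.asarray(a)
    flat = a.reshape(*a.shape[:-2], -1)                 # (..., H*W)
    flat_idx = flat.argmax(axis=-1)                     # (...,) flat index of each max
    max_val = np.take_along_axis(flat, flat_idx[..., None], axis=-1)[..., 0]
    rows, cols = np.unravel_index(flat_idx, a.shape[-2:])
    return max_val, np.stack([rows, cols], axis=-1)     # (...,), (..., 2)
```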

Aeim commented 1 year ago

I printed max_score and max_disp; the result looks like this:

max_score: tensor([0.7967], device='cuda:0') torch.Size([1])
max_disp: tensor([[13, 14]], device='cuda:0') torch.Size([1, 2])

max_disp is a kind of x, y position in the 23x23 map, but I'm not sure how to use scale_ind.

franktpmvu commented 1 year ago

Oh, scale_ind picks the largest of the max scores. If there is only one max score, it is already the global maximum and scale_ind can be ignored. If we have several local maxima (as the name scale_ind suggests, one per scale; some models, like YOLOv3, have more than one scale output layer, e.g. 3 scale layers), scale_ind chooses the global maximum among them.
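A small NumPy sketch of that choice, assuming the score maps are stacked as (num_scales, H, W) like DIMP's scores. With a single scale, scale_ind is always 0, so sample_pos[scale_ind, :] simply picks the only row. pick_scale is a hypothetical name:

```python
import numpy as np

def pick_scale(scores):
    """Given per-scale score maps (S, H, W), return the index of the scale with the highest peak."""
    scores = np.asarray(scores, dtype=float)
    per_scale_max = scores.reshape(scores.shape[0], -1).max(axis=1)  # (S,) one peak per scale
    scale_ind = int(per_scale_max.argmax())                           # global maximum among the peaks
    return scale_ind, per_scale_max
```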

Aeim commented 1 year ago

So I can just ignore it, right? If yes, I have already got n_scores and the new pos of each neighbor, so what should I do next? From your new version's implementation, it seems the return values of track_neighbor need to be xyhw, neighbor_xyhw, score, n_score, but DIMP's new_pos is just an x, y position, not a bbox. So I can't map it to what your code requires.

Sorry to bother you, I'm a newbie. Thank you for your help.

franktpmvu commented 1 year ago

Try using https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/tracker/dimp/dimp.py#L144 to get the bbox; you have all the needed values now. I don't know whether the update code transforms new_pos into another domain or not, so please check whether self.pos ends up equal to the new_pos you pass in. If not, it will be more difficult.
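new_pos by itself is only a centre. A minimal sketch of turning a centre into a box, assuming the convention upstream pytracking trackers use for their output: positions and sizes are (y, x) and (h, w), and the reported box is [x, y, w, h] with its top-left corner at pos - (target_sz - 1) / 2. Reusing the current target_sz for every neighbour is an assumption, since the score map gives no size per cell; pos_to_xywh is a hypothetical helper:

```python
import numpy as np

def pos_to_xywh(pos, target_sz):
    """(y, x) centre and (h, w) size -> [x, y, w, h] box with a top-left corner."""
    pos = np.asarray(pos, dtype=float)
    target_sz = np.asarray(target_sz, dtype=float)
    top_left_yx = pos - (target_sz - 1) / 2                         # (y, x) of the top-left corner
    return np.concatenate([top_left_yx[::-1], target_sz[::-1]])    # reorder to x, y, w, h
```

In upstream pytracking, get_iounet_box returns a box in the cropped-sample frame used by the IoU network rather than in original-image coordinates, so it is worth checking which frame NeighborTrack expects before mixing the two.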

Aeim commented 1 year ago

I checked, and self.pos and new_pos are not equal. How do I deal with this? This is my code in neighbor_track():

        # Localize the target
        translation_vec, scale_ind, s, flag = self.localize_target(scores_raw, sample_pos, sample_scales)
        new_pos = sample_pos[scale_ind,:] + translation_vec
        xywh = self.get_iounet_box(new_pos, self.target_sz, sample_pos[scale_ind,:], sample_scales[scale_ind])
        print(new_pos)
        print(self.pos)
        # NeighborTrack
        translation_vec_neighbors, n_score, s_neighbor, flag = self.localize_target_neighbor(scores_raw, sample_pos, sample_scales)
        new_pos_neighbors = []
        for translation_vec_neighbor in translation_vec_neighbors:
            # index (not slice) the scale row so new_pos_neighbor has shape (2,)
            new_pos_neighbor = sample_pos[scale_ind,:] + translation_vec_neighbor
            xywh_n = self.get_iounet_box(new_pos_neighbor, self.target_sz, sample_pos[scale_ind,:], sample_scales[scale_ind])
            print(xywh_n)
            new_pos_neighbors.append(xywh_n)
        return xywh, 1, new_pos_neighbors, n_score

And in localize_target_neighbor():

def localize_target_neighbor(self, scores, sample_pos, sample_scales):
        """Run the target localization."""

        scores = scores.squeeze(1)

        preprocess_method = self.params.get('score_preprocess', 'none')
        if preprocess_method == 'none':
            pass
        elif preprocess_method == 'exp':
            scores = scores.exp()
        elif preprocess_method == 'softmax':
            reg_val = getattr(self.net.classifier.filter_optimizer, 'softmax_reg', None)
            scores_view = scores.view(scores.shape[0], -1)
            scores_softmax = activation.softmax_reg(scores_view, dim=-1, reg=reg_val)
            scores = scores_softmax.view(scores.shape)
        else:
            raise Exception('Unknown score_preprocess in params.')

        score_filter_ksz = self.params.get('score_filter_ksz', 1)
        if score_filter_ksz > 1:
            assert score_filter_ksz % 2 == 1
            kernel = scores.new_ones(1,1,score_filter_ksz,score_filter_ksz)
            scores = F.conv2d(scores.view(-1,1,*scores.shape[-2:]), kernel, padding=score_filter_ksz//2).view(scores.shape)

        # if self.params.get('advanced_localization', False):
        #     return self.localize_advanced_neighbor(scores, sample_pos, sample_scales)

        # Get neighbors
        score_sz = torch.Tensor(list(scores.shape[-2:])) # (23,23)
        score_center = (score_sz - 1)/2 # (11,11)
        max_score, max_disp = dcf.max2d(scores)
        _, scale_ind = torch.max(max_score, dim=0) # <-Don't know how to deal with

        c_score_map = scores
        mask = c_score_map.ge(0.7 * max_score)  # neighbor threshold
        values = torch.masked_select(c_score_map, mask)
        indexes = torch.nonzero(mask.squeeze())
        n_score = values

        # Compute translation vector and scale change factor
        output_sz = score_sz - (self.kernel_size + 1) % 2
        output = []
        for i, index in enumerate(indexes):
            index = index.clone().unsqueeze(0)
            index = index.float().cpu().view(-1)
            target_disp = index - score_center
            translation_vec = target_disp.squeeze(0) * (self.img_support_sz / output_sz) * sample_scales
            output.append(translation_vec)
        translation_vec_neighbors = torch.stack(output, 0)
        return translation_vec_neighbors, n_score, c_score_map, None

franktpmvu commented 1 year ago

See https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/tracker/dimp/dimp.py#L495

We know new_pos is finally used by update_state(), so look there to see how new_pos is translated into self.pos. This function updates some state and sets self.pos, so you need to follow it step by step, but without updating any self.xxx attribute (because you don't yet know which answer NeighborTrack will choose). You can use local variables to simulate L489, L490 and L495 without really updating anything.
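A sketch of simulating the position update without touching self, assuming the clamp that upstream pytracking's DiMP.update_state applies (check L489 to L495 in this repo before relying on it). simulated_update_pos is a hypothetical name, and the optional scale update in update_state is left out:

```python
import numpy as np

def simulated_update_pos(new_pos, target_sz, image_sz, inside_ratio=0.2):
    """Return the position update_state() would store, without mutating tracker state.

    Assumed clamp (upstream DiMP):
        inside_offset = (inside_ratio - 0.5) * target_sz
        pos = max(min(new_pos, image_sz - inside_offset), inside_offset)
    which keeps at least inside_ratio of the target inside the image.
    """
    new_pos = np.asarray(new_pos, dtype=float)
    target_sz = np.asarray(target_sz, dtype=float)
    image_sz = np.asarray(image_sz, dtype=float)
    inside_offset = (inside_ratio - 0.5) * target_sz
    return np.maximum(np.minimum(new_pos, image_sz - inside_offset), inside_offset)
```

In upstream DiMP, refine_target_box (the IoU-net refinement) may also overwrite self.pos after update_state, depending on the use_iounet_pos_for_learning parameter, which is another reason self.pos and new_pos can differ.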

Aeim commented 1 year ago

OK, I will try to follow those steps. Just one more question: where do I need to put the return for your neighbor_track function? Here, before the update comment?

https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/tracker/dimp/dimp.py#L133

Or after this line? https://github.com/franktpmvu/NeighborTrack/blob/c889695427a2288b42e31cd0f9e0f7e509244729/trackers/ostrack/pytracking/tracker/dimp/dimp.py#L167

franktpmvu commented 1 year ago

Because you use type ACB, NeighborTrack is only used to get new_pos. You need to convert the NeighborTrack answer back into the format of new_pos and replace new_pos with it (L122).
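A sketch of that reverse step, assuming NeighborTrack's chosen answer is an [x, y, w, h] box in original-image coordinates and that new_pos is a (y, x) centre as in upstream pytracking. xywh_to_pos is a hypothetical helper; the returned pos is what would replace new_pos before the update code at L122:

```python
import numpy as np

def xywh_to_pos(box):
    """[x, y, w, h] box (top-left corner) -> (y, x) centre and (h, w) size."""
    x, y, w, h = (float(v) for v in box)
    pos = np.array([y + (h - 1) / 2, x + (w - 1) / 2])   # centre, (y, x) order
    size = np.array([h, w])                               # size, (h, w) order
    return pos, size
```

If the DIMP code expects torch tensors, torch.as_tensor(pos, dtype=torch.float32) converts the result.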

Aeim commented 1 year ago

Thanks. But if I want to switch to ABC, I can use L167, right?

franktpmvu commented 1 year ago

Yes. It's like not being the last person to leave the office, so you don't need to turn off the lights: the update step will be finished by the original code.

Aeim commented 1 year ago

Thank you for your help. Now I can run NeighborTrack with DIMP successfully. I have one question about your Ocean code: where is the code that gets the neighbor positions?

franktpmvu commented 1 year ago

https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/ocean.py#L445

https://github.com/franktpmvu/NeighborTrack/blob/89aa0781c5b59ac570e3c1c47cca5b1dd6a5f945/trackers/example_ocean/ocean.py#L256

You can see pscore>threshold*np.max(pscore) there, and the positions and sizes are obtained by L259 to L274.

Each project has a different way to get position and size; we just follow it.
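For illustration, a NumPy sketch of Ocean-style neighbor extraction: every grid cell whose pscore passes the threshold is turned into a box. It assumes the variable names of upstream Ocean (per-cell box corners pred_x1 to pred_y2 in search-region pixels, scale_z, instance_size, and target_pos as (x, y)), maps boxes back the way upstream Ocean maps its single best box, and skips the size learning-rate smoothing Ocean applies there; check L259 to L274 for what the repo actually does. ocean_style_neighbors is a hypothetical name:

```python
import numpy as np

def ocean_style_neighbors(pscore, pred_x1, pred_y1, pred_x2, pred_y2,
                          target_pos, scale_z, instance_size, threshold=0.7):
    """Candidate boxes for every grid cell with pscore > threshold * max(pscore).

    pscore and pred_* are (H, W) maps; pred_* are box corners in search-region pixels.
    Returns (N, 4) boxes as [cx, cy, w, h] in image coordinates and the (N,) selected scores.
    """
    mask = pscore > threshold * np.max(pscore)
    cx = (pred_x1[mask] + pred_x2[mask]) / 2                 # box centres in the search region
    cy = (pred_y1[mask] + pred_y2[mask]) / 2
    w = (pred_x2[mask] - pred_x1[mask]) / scale_z            # sizes back to image scale
    h = (pred_y2[mask] - pred_y1[mask]) / scale_z
    res_x = target_pos[0] + (cx - instance_size // 2) / scale_z  # offset from the search centre
    res_y = target_pos[1] + (cy - instance_size // 2) / scale_z
    return np.stack([res_x, res_y, w, h], axis=1), pscore[mask]
```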