Extract distinct features for same class objects from generalized classification model to solve Re-Id with SSD

ckhire / video-analytics-labs

The intention of this repo is to keep all the experiments in one place without disturbing real-time usage.


Introduction: This is one more attempt to solve the re-identification (Re-Id) problem together with object detection in the same network, without retraining the backbone architecture. The objective is to obtain better features for detected objects of the same class, so that the features act as a unique signature for each distinct instance. The attempt is to extract these per-instance features from a generalized classification network (ResNet, VGGNet, BottleneckSP, etc.). I will treat this as a multi-task learning problem, with the constraint that the backbone is kept as it is rather than retrained. To extract the features I will formulate a new reverse-front-mapping technique and take the features from one and the same layer of the network as a side input/storage. I will try to test two hypotheses:

a. The features are sufficient to distinguish between objects of the same type.
b. Features of the same object instance at two different times do not differ much. That is, features of the same instance at distinct times are very similar to each other and clearly distinguishable from the features of a different instance of the same type.

Confirming the above hypotheses would in effect solve the problem of having good features for Re-Id.
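The two hypotheses can be written down as a single threshold condition. This is only a sketch of one possible formalization: the symbols $f_i^{(t)}$ (feature vector of instance $i$ at time $t$), $d$ (a distance, here assumed to be cosine distance) and $\tau$ (a threshold) are my notation and not part of the proposal so far.

$$
d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
$$

$$
\text{(a)}\quad d\big(f_i^{(t)}, f_j^{(t')}\big) > \tau \quad \text{for all } i \neq j
$$

$$
\text{(b)}\quad d\big(f_i^{(t)}, f_i^{(t')}\big) \le \tau \quad \text{for all } i \text{ and } t \neq t'
$$

If one value of $\tau$ satisfies both conditions on the test data, the features separate instances of the same class and stay stable over time.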

Major Unknowns:

What kind of data is required to test these hypotheses?
How much time is needed to create such data?
How much data is required to draw strong conclusions?
What techniques should be used to measure feature distinction and feature similarity?
How can it be established that feature distinction and similarity are both resolved with the same weights, with the difference for the same instance staying within a threshold and the difference for distinct instances staying above it? Are traditional techniques such as cosine similarity or Hungarian matching sufficient, or do we need extra layers as a new head together with some loss functions?
Is this possible for every state-of-the-art classification network?
How should such a solution be formulated?
Which framework should be used for this problem?
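As a starting point for the similarity question above, here is a minimal sketch of the "traditional" option: cosine similarity with a fixed threshold and no extra head or loss function. The function names and the default threshold of 0.2 are illustrative assumptions, not values taken from any experiment.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance derived from cosine similarity (0 means same direction)."""
    return 1.0 - cosine_similarity(u, v)


def same_instance(u, v, threshold=0.2):
    """Hypothetical decision rule: same instance if distance is within threshold.

    The threshold value is an assumption and would have to be tuned on data.
    """
    return cosine_distance(u, v) <= threshold
```

Whether a single threshold works at all is exactly what the experiment has to show. If it does not, a learned head would be the next thing to try.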

Approach:

I will be using the PyTorch implementation of yolov5 along with the given backbone.
First I will formalize the base visual-perception formulae that relate bounding boxes to the input image.
Then, through the early layers, I will trace the coordinates of every input bounding box until I reach roughly the 4th or 5th layer, stopping at an arbitrarily chosen one.
To begin with, I will try to represent the features of every bbox as a 64-dimensional vector.
I will be using a simple image with 2-3 persons.
I will take the features from the 4th or 5th layer and use simple cosine similarity to measure the differences among the features.
I will also use one more image that has the same two persons at different places.
Using the same method and the same layer, I will again take the features.
This time I will have to cross-match the features from the current iteration with those from the previous one.
In the cross-match, the difference should be small for bboxes of the same person instance and large enough to separate different instances of the same object class.
After a successful implementation of the above hypotheses I would draw up a better iterative solution.
Cross-checking the hypotheses will definitely require more images; I can then build image pairs covering a varied number of patterns and possible cases.
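The coordinate-tracing and feature-vector steps in the list above could be sketched as follows. This is a simplified illustration and not the final reverse-front-mapping formulation: it assumes the chosen layer's output is related to the input image by a single cumulative stride (padding and receptive-field offsets are ignored), and it averages each channel over the projected box, so a layer with 64 channels would yield a 64-dimensional vector. The names `project_bbox` and `bbox_feature` are hypothetical.

```python
import math


def project_bbox(bbox, stride):
    """Map an (x1, y1, x2, y2) box in input pixels to feature-map cells.

    Assumption: the layer downsamples the input by `stride` in total.
    The projected box always covers at least one cell.
    """
    x1, y1, x2, y2 = bbox
    fx1 = math.floor(x1 / stride)
    fy1 = math.floor(y1 / stride)
    fx2 = max(fx1 + 1, math.ceil(x2 / stride))
    fy2 = max(fy1 + 1, math.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2


def bbox_feature(feature_map, bbox, stride):
    """Average-pool each channel over the projected box.

    feature_map: list of C channels, each an H x W nested list of floats.
    Returns a list of C floats (one value per channel).
    """
    fx1, fy1, fx2, fy2 = project_bbox(bbox, stride)
    vector = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        # Clip the projected box to the feature-map bounds.
        ys = range(max(0, fy1), min(height, fy2))
        xs = range(max(0, fx1), min(width, fx2))
        values = [channel[y][x] for y in ys for x in xs]
        vector.append(sum(values) / len(values) if values else 0.0)
    return vector
```

In the real experiment the feature map would come from the network itself, for example captured with a PyTorch forward hook on the chosen layer and converted to the layout used here.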

The entire approach above does not require retraining or transfer learning. It only requires a proper formulation, then the functional logic to extract such features from particular layers, and then the final hypothesis function.
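One possible shape for the final hypothesis function is a cross-match between the feature vectors of two images. The sketch below is an assumption on my part: it uses cosine distance and a brute-force search over all assignments, which is only practical for the 2-3 persons per image planned here. For larger numbers of objects an assignment solver such as the Hungarian algorithm would do the same job efficiently. The name `cross_match` is hypothetical.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def cross_match(prev_feats, curr_feats):
    """Pair each previous-frame vector with a distinct current-frame vector.

    Tries every assignment and keeps the one with the smallest total
    distance. Returns a list of (prev_index, curr_index, distance).
    """
    if len(prev_feats) > len(curr_feats):
        raise ValueError("expected len(prev_feats) <= len(curr_feats)")
    cost = [[cosine_distance(p, c) for c in curr_feats] for p in prev_feats]
    best_total, best_perm = float("inf"), ()
    for perm in permutations(range(len(curr_feats)), len(prev_feats)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(i, j, cost[i][j]) for i, j in enumerate(best_perm)]
```

The hypotheses would be supported if every matched pair is the same person and its distance stays within the threshold, while the distances of all unmatched pairs stay above it.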

Timeline: 40 hrs with documentation of entire experiment

Extract distinct features for same class objects from generalized classification model to solve Re-Id with SSD #3