ZQPei / deep_sort_pytorch

MOT using deepsort and yolov3 with pytorch
MIT License

How can I train deep sort using my own images? #7

Closed: sandeshshrestha45 closed this issue 5 years ago

sandeshshrestha45 commented 5 years ago

Hello ZQPei,

Is it possible to fine-tune on the Market dataset? Or how can I use my own data to train DeepSort from scratch and generate checkpoints?

ZQPei commented 5 years ago

Yes, it is possible. I have released the training scripts. First, build your own dataset in the same format as Market, with each image resized to 64x128. Then put the images of each ID into their own folder. Good luck!
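The one-folder-per-ID layout described above is the layout torchvision's `ImageFolder` reads, where each subfolder name becomes a class. Below is a minimal sketch for building it. It assumes your cropped person images sit in one flat folder and that the ID can be read from each file name. `organize_by_id` and `market1501_id` are hypothetical helpers, not part of the repo. Resizing to 64x128 is not shown here.

```python
import shutil
from pathlib import Path


def organize_by_id(src_dir, dst_dir, id_from_name):
    """Copy each .jpg in src_dir to dst_dir/<id>/<file>, one folder per identity.

    id_from_name maps a file name to its identity label. It is a hypothetical
    helper that you supply to match your own naming scheme.
    Returns {identity: number_of_images_copied}.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    for img in sorted(src.glob("*.jpg")):
        identity = id_from_name(img.name)
        out_dir = dst / identity
        out_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, out_dir / img.name)
        counts[identity] = counts.get(identity, 0) + 1
    return counts


def market1501_id(file_name):
    """Market-1501 names start with the person ID, e.g. 0002_c1s1_000451_03.jpg."""
    return file_name.split("_")[0]
```

Example call, with hypothetical paths: `organize_by_id("Market-1501/bounding_box_train", "data/train", market1501_id)`.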

sandeshshrestha45 commented 5 years ago

Hello ZQPei, while training on the Market dataset, I get 0 loss and 100% accuracy at every step from the very beginning. I have shared a screenshot below. As you mentioned, I used train.py inside the deep folder of your repository. Could you please tell me where the problem is? [screenshot]

Also, I'm not getting any curve for training. Only the validation curve is being saved. [image]

ZQPei commented 5 years ago

I'm not sure exactly where the problem is. Please give me more details:

  1. What does your dataset folder look like?
  2. Which Python do you use, Python 2 or Python 3?
  3. You can debug it by adding a breakpoint in the train() function and checking whether the inputs are correct.

sandeshshrestha45 commented 5 years ago
  1. My dataset is organized as follows:

         data/
           train/
             1/
               *.jpg
           test/
             1/
               *.jpg

     Inside the data folder there are train and test folders. Each of them contains a single subfolder named 1, which holds .jpg images of size 64x128.

  2. I am using Python 3.7.

The training completes successfully and the checkpoint is saved, but the loss is 0 and the accuracy is 100% at every iteration. When I run test.py, the following error occurs: [screenshot]

ZQPei commented 5 years ago

Your loss is zero because your dataset folders are not configured correctly, and this also causes your test.py error. Re-organize your dataset as shown below.
In train/: [image]
In test/: [image]
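Why a single folder gives exactly zero loss: `ImageFolder` treats each subfolder as one class, so `data/train/1/` produces a one-class problem. The pure-Python sketch below writes out softmax cross-entropy. It is the same formula PyTorch's `nn.CrossEntropyLoss` computes, shown here for illustration and not taken from the repo.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])


# With a single class folder there is a single logit. Its softmax is always 1,
# so the loss is 0 and the argmax is always "correct", whatever the weights are.
single_class_loss = cross_entropy([3.7], 0)

# With two classes and an undecided network, the loss is log(2), not 0.
two_class_loss = cross_entropy([0.0, 0.0], 0)
```

You can guard against this cheaply by checking that `len(train_dataset.classes) > 1` on the `ImageFolder` dataset before training starts.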

sandeshshrestha45 commented 5 years ago

Oh. How many images should I put in each folder? Do I have to make one folder per image, or can I put multiple images in one folder?

For example, inside the folder 0002, do I have to put only one image, or can I put more than one?

ZQPei commented 5 years ago

There is no limit. You can put in as many images as you want, but you had better keep the number of images in each folder balanced.
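To check how balanced your folders are, the sketch below counts the images in each ID folder and reports the ratio between the largest and smallest class. The thread does not specify what ratio counts as balanced enough, so treat that as your own judgment call. Both helper names are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_class(root):
    """Return {class_folder: image_count} for a one-folder-per-class dataset."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by the smallest non-empty one (1.0 = balanced)."""
    sizes = [n for n in counts.values() if n > 0]
    if not sizes:
        raise ValueError("no images found")
    return max(sizes) / min(sizes)
```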

sandeshshrestha45 commented 5 years ago

Thank you so much. I will try that.

I have another question for you. Is your DeepSort PyTorch model exactly the same as the original Deep SORT, or did you make any changes? What changes did you make in your PyTorch implementation of DeepSort?

ZQPei commented 5 years ago

Not exactly the same, because it is not that easy to convert TensorFlow model parameters to a PyTorch model. I changed the original CNN model a little, mainly by adding more channels to the conv layers. You can also customize these hyperparameters in your own models. The other difference is that I trained on Market1501, while the author used the MARS dataset. MARS is much bigger than Market1501, and I suggest you try it later.

If you feel this repository is helpful, please give this repo a star, thank you!

sandeshshrestha45 commented 5 years ago

Sure, thanks a lot for your help.

sandeshshrestha45 commented 5 years ago

Hello ZQPei, thanks for your cooperation and help. I followed your instructions to configure the dataset and successfully trained the model. But when I try to run it using test.py or demo_yolo3_deepsort.py, the following error occurs: [screenshot]

Also, the loss graph looks strange: the validation loss keeps increasing. [image]

What could be the issue?

ZQPei commented 5 years ago

The loss on the training set is decreasing, which means the network is converging. But the loss on the validation set is going the opposite way. This can happen when the model is overfitting your training set, and a model can overfit for many reasons. From your terminal output, I can tell there are only 29 classes in your dataset, so I assume your dataset is really small, isn't it? A dataset that is too small leads to overfitting. I suggest you fine-tune your model starting from my pretrained parameters to solve this problem. You can also look up general techniques for reducing overfitting in CNN models.

When you run demo.py, the error occurs because your model has only 29 outputs in the final fully connected layer. You can fix that by modifying line 10 of feature_extractor.py:

self.net = Net(num_classes=29, reid=True)
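Hard-coding 29 works, but it breaks again whenever the class count changes. The sketch below reads the count from the checkpoint instead. It assumes the checkpoint is a dict holding the state dict under `"net_dict"` and that the last classifier layer is saved as `classifier.4.weight`. Both names are assumptions about the repo, so check them against your own file.

```python
def infer_num_classes(state_dict, key="classifier.4.weight"):
    """Return the number of output classes stored in a checkpoint.

    A Linear layer's weight has shape (out_features, in_features), so the first
    dimension of the final layer's weight is the class count. ASSUMPTION: the
    final layer is saved under `key`; print list(state_dict) to confirm the
    name for your model.
    """
    return state_dict[key].shape[0]


# Hypothetical usage (checkpoint path and the "net_dict" key are assumptions):
# import torch
# state = torch.load("checkpoint/ckpt.t7", map_location="cpu")["net_dict"]
# net = Net(num_classes=infer_num_classes(state), reid=True)
# net.load_state_dict(state)
```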
sandeshshrestha45 commented 5 years ago

Could you please tell me how I can fine-tune using your pretrained parameters?

ZQPei commented 5 years ago

I recommend the official PyTorch transfer learning tutorial on ants and bees. It is not that hard, and you will learn how to fine-tune a model with pretrained parameters. It will be good for you.
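The core idea of the tutorial carries over to this model: reuse every pretrained layer whose shape still fits, and re-initialize the layers that don't. The framework-agnostic sketch below shows that filtering step. It only assumes that the state dicts map names to objects with a `.shape`, as PyTorch tensors do. `compatible_weights` is a hypothetical helper, and the commented usage lines are assumptions, not the repo's code.

```python
def compatible_weights(pretrained, target):
    """Split a pretrained state dict into loadable and skipped entries.

    An entry is kept when `target` (the new model's state dict) has a tensor
    with the same name and shape. Anything else, typically the final
    classifier trained for a different number of IDs, is skipped so that
    layer keeps its fresh initialization.
    """
    kept, skipped = {}, []
    for name, tensor in pretrained.items():
        if name in target and tuple(target[name].shape) == tuple(tensor.shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical usage with PyTorch (checkpoint path and "net_dict" key assumed):
# pretrained = torch.load("checkpoint/ckpt.t7", map_location="cpu")["net_dict"]
# net = Net(num_classes=29)
# kept, skipped = compatible_weights(pretrained, net.state_dict())
# net.load_state_dict(kept, strict=False)  # skipped layers stay randomly initialized
```

After loading, you train as usual, often with a smaller learning rate than when training from scratch.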

sandeshshrestha45 commented 5 years ago

The tutorial you suggested focuses on the ResNet model. I searched for other tutorials, but they fine-tune other popular models like VGG, Inception, and Xception. I couldn't figure out how to fine-tune other models, including yours. Could you please suggest something else?

Also, could you please tell me which loss function you used for training?

ZQPei commented 5 years ago

Sorry for replying so late. Yes, the tutorial basically shows you how to transfer a model pretrained on ImageNet to your own classification task. The Deep SORT CNN model is used to measure the similarity of two persons. How? The CNN model's output without the last fully connected layer gives you a feature vector for a person image. You can then compute the cosine similarity (or another metric) of two persons' features to decide whether they are the same person. That is the main idea of a simple re-ID model. So how do we train a CNN model to produce good person features that capture the differences between individuals? We train it to classify different persons. For example, Market1501 has 1501 classes, one for each individual. This is basically an image classification task. I hope that makes sense.
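The matching step described above can be sketched in plain Python. Deep SORT compares appearance features using the cosine distance, which is 1 minus the cosine similarity. The threshold below is a made-up placeholder, not a value from the repo. If your features are already L2-normalized, the dot product alone equals the cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, larger means less alike."""
    return 1.0 - cosine_similarity(a, b)


def same_person(feat_a, feat_b, threshold=0.2):
    """HYPOTHETICAL threshold: tune it on validation pairs from your own data."""
    return cosine_distance(feat_a, feat_b) <= threshold
```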

I used cross-entropy loss for training. It is a classic loss function for classification tasks. You can also try focal loss or other losses from recent papers.
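For comparison, here is a pure-Python sketch of cross-entropy alongside the standard multi-class focal-loss formula, FL(p_t) = -(1 - p_t)^γ log(p_t). It illustrates the formulas and is not code from the repo. The default γ = 2 is a commonly used value, not one the thread prescribes.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """-(1 - p_t)**gamma * log(p_t) for one sample.

    gamma=0 reduces to cross-entropy. A larger gamma shrinks the loss of
    easy, confidently correct samples, so training focuses on hard ones.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```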

sandeshshrestha45 commented 5 years ago

Thank you very much, ZQPei, for your explanation and help. But I am a little confused about training on Market1501. Do I have to make 1501 separate folders inside the training folder, put the images of each individual in their own folder, and do the same for test? If I do that, will there be enough data to train properly?

I mean for 1501 classes, the structure should be:

And put the images of each person into their own folder? Or am I getting it wrong? If I do that, there will be only around 30-40 images per class. Would that be enough training data?

ZQPei commented 5 years ago

Yes, you are right. You can use image augmentation tricks (random horizontal flip, changing brightness and contrast, resizing then random cropping, etc.) to prevent your model from overfitting. Or you can take a look at the MARS dataset, which is a superset of Market1501.

sandeshshrestha45 commented 5 years ago

And may I ask what the 'top1err' graph in the training chart is about? What does it signify?

ZQPei commented 5 years ago

After your CNN's last fully connected layer, you apply softmax to get the probability of each class and choose the class with the top-1 (maximum) probability as your predicted class. The error rate of this prediction is called the top-1 error.
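The top-1 error described above can be sketched in plain Python for a batch of logits. Softmax is monotonic, so the argmax of the raw logits picks the same class as the argmax of the probabilities, and the softmax can be skipped for this metric. `top1_error` is a hypothetical helper name.

```python
def top1_error(batch_logits, labels):
    """Fraction of samples whose highest-scoring class differs from the label."""
    wrong = 0
    for logits, label in zip(batch_logits, labels):
        # argmax over class indices; softmax would not change which one wins
        predicted = max(range(len(logits)), key=logits.__getitem__)
        wrong += predicted != label
    return wrong / len(labels)
```

Top-1 accuracy is simply `1 - top1_error(...)`.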

sandeshshrestha45 commented 5 years ago

Thanks again for your great help.

sandeshshrestha45 commented 5 years ago

Hello ZQPei, what changes do I have to make in your code if I want to display all the classes with their class names? I have 5 classes, e.g. human, tv on, tv off, fan on and fan off. I need to display their names along with their respective object IDs. In your code, only persons are displayed with bounding boxes, labelled 'object 1', 'object 2' and so on. If I need to display them as 'human 1', 'human 2', 'fan on 1', 'fan on 2', etc., how can I do that?
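The thread does not answer this question. One possible sketch assumes you change the pipeline so that each confirmed track carries the class ID of its detection. The demo discussed here only tracks persons, so adding that is up to you. The idea is to keep a separate counter per class, so that global track IDs become labels like 'human 1' or 'fan on 2'. `PerClassLabeler` and the class order are hypothetical.

```python
class PerClassLabeler:
    """Map (class id, global track id) pairs to per-class labels like 'human 1'."""

    def __init__(self, class_names):
        self.class_names = class_names
        self.next_number = {}  # class id -> next per-class number to hand out
        self.assigned = {}     # (class id, track id) -> per-class number

    def label(self, class_id, track_id):
        key = (class_id, track_id)
        if key not in self.assigned:
            number = self.next_number.get(class_id, 1)
            self.assigned[key] = number
            self.next_number[class_id] = number + 1
        return f"{self.class_names[class_id]} {self.assigned[key]}"


# Hypothetical class order; it must match your detector's class indices.
CLASS_NAMES = ["human", "tv on", "tv off", "fan on", "fan off"]
```

The returned string would replace the 'object N' text wherever the boxes are drawn.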

mljack commented 3 years ago

[image]

To train an image classifier, the training and test sets should use the same set of image labels. The Market1501 dataset isn't organized that way, since it's designed for re-ID problems.

Just reorganize the dataset as ZQPei mentioned~

In train/: [image]
In test/: [image]

Sarouch commented 3 years ago

@mljack Hello, do you mean the same number of images in both train and test? I know that the test set is only 20% of the whole dataset, so I don't understand how the dataset should use the same set of image labels in both the training and test sets. Thank you in advance.

mljack commented 3 years ago

In short, a classifier is first trained to output the object ID from a cropped object image. Then the last fully connected layer is removed and the network is re-targeted to output object appearance features, which are used to compute cosine distances. To train the classifier, the train and test sets should share the same set of object IDs, which serve as the labels.
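Here is the concrete reason the ID sets must match. torchvision's `ImageFolder` numbers classes by their sorted folder names. If train/ and test/ contain different ID folders, the same class index can refer to different people in the two sets. The sketch below checks that both splits contain the same set of ID folders. `check_shared_labels` is a hypothetical helper.

```python
from pathlib import Path


def check_shared_labels(train_root, test_root):
    """Return (missing_in_test, missing_in_train) lists of ID folder names.

    For the classifier-style training discussed here, both lists should be
    empty: train/ and test/ must contain the same set of ID folders.
    """
    train_ids = {p.name for p in Path(train_root).iterdir() if p.is_dir()}
    test_ids = {p.name for p in Path(test_root).iterdir() if p.is_dir()}
    return sorted(train_ids - test_ids), sorted(test_ids - train_ids)
```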

Sarouch commented 3 years ago

@mljack Thank you for your response, but I don't really understand. Do you mean that I put labels or images in the test and train folders? I mean, do I put labels in 0010, 0011, ...? And are the same images/labels copied into both folders (train and test)? In my last successful training of DeepSort, I copied the subfolders from train to test and it worked, but I want to understand it better.

I got these curves when I put the same images (without labels) in the test and train folders. [image] So what I am doing is clearly wrong! Please help. Thank you.

mljack commented 3 years ago

The folder name is the label of the images in that folder. Duplicating images could introduce a risk of overfitting. Split the images of each folder into the train and test sets.
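The sketch below shows one way to do that per-ID split. It assumes all images of an ID start out in one folder per ID. The 80/20 default mirrors the proportion mentioned later in this thread, and copying instead of moving keeps the original folders intact. `split_per_id` is a hypothetical helper, not part of the repo.

```python
import random
import shutil
from pathlib import Path


def split_per_id(src_root, train_root, test_root, test_fraction=0.2, seed=0):
    """Copy each ID folder's images into train_root/<id>/ and test_root/<id>/.

    Every image lands in exactly one split. Each ID with at least two images
    gets at least one image in each split, so train/ and test/ share the same
    ID folders without duplicated images. IDs with fewer than two images are
    skipped and returned.
    """
    train_root, test_root = Path(train_root), Path(test_root)
    rng = random.Random(seed)
    skipped = []
    for id_dir in sorted(p for p in Path(src_root).iterdir() if p.is_dir()):
        images = sorted(id_dir.glob("*.jpg"))
        if len(images) < 2:
            skipped.append(id_dir.name)  # cannot appear in both splits
            continue
        rng.shuffle(images)
        n_test = min(max(1, int(len(images) * test_fraction)), len(images) - 1)
        for i, img in enumerate(images):
            split_root = test_root if i < n_test else train_root
            out_dir = split_root / id_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(img, out_dir / img.name)
    return skipped
```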

Sarouch commented 3 years ago

@mljack Thank you very much. So to train DeepSort I don't need .xml/.txt label files, is that right? Only the pretrained weights are needed? Regards,

hedi1920 commented 3 years ago

Did you succeed in training Deep SORT? If so, what does the dataset structure look like?

Sarouch commented 3 years ago

Hi @hedi1920, I used this format in both the test and train folders, putting only one picture inside each folder (a different one in test and train). But I kept the same numbering as recommended above. [image]

hedi1920 commented 3 years ago

Thank you. I just want to know whether the images are cropped (containing only the object) or not.

Sarouch commented 3 years ago

@hedi1920 I put only the images in the folders, without any other information (my images contain 5 classes) and without bounding boxes. Whether the data should be cropped was also my last question, so I don't know if it is right or not. Maybe @mljack could answer this question. Thanks

hedi1920 commented 3 years ago

The data folder contains 80% train and 20% test. Both train and test contain inputs and labels.

Sarouch commented 3 years ago

@hedi1920 Maybe I didn't understand the answers above, but they say to use the same number in train as in test. That's weird, because your answer seems right to me. Did you try it?

hedi1920 commented 3 years ago

If you use the same set for train and test, the accuracy will be 100% from the first epoch! That's weird. So the sets should be different to avoid overfitting.

Sarouch commented 3 years ago

@hedi1920 I agree, but if you read my previous messages you will see that I don't really understand how it can work without cropped images and with the same amount of train and test data. If you try it with 20% test and it works, could you please leave a message? Thanks

hedi1920 commented 3 years ago

So, did you find good results?

Sarouch commented 3 years ago

@hedi1920 With the same amount of data in both test and train, I obtained this result: [image]

hedi1920 commented 3 years ago

In train.py

net = Net(num_classes=num_classes, reid=True)

Sarouch commented 2 years ago

@hedi1920 Hello, did you find the problem and manage to train DeepSort?

hedi1920 commented 2 years ago

Hi, how do you prepare the labels?
