Question about TSODA

qqaadir commented 4 years ago

@AlvaroCavalcante

I saw your TSODA.ipynb. Wonderful work. I was wondering if you could please help with some issues I am facing. I am getting this error:

Failed to get matching files on /content/tf-models/research/fine_tuned_model/model.ckpt: Not found: /content/tf-models/research/fine_tuned_model; No such file or directory
MODEL TRAINED

I could not find any details about the fine_tuned_model folder. Could you please explain why we need this folder and whether I should create it manually? Is this folder supposed to be empty from the start?

Also, I could not run your original TSODA.ipynb on Google Colab. I am getting the following error:

ValueError: ssd_inception_v2 is not supported. See `model_builder.py` for features extractors compatible with different versions of Tensorflow
MODEL TRAINED

Could you please help with it? Thanks.

AlvaroCavalcante commented 4 years ago

Hi @qqaadir, thank you for your feedback, I'll try to help you! The fine_tuned_model folder is used to store your trained model checkpoint for later inference during the semi-supervised process. This folder is created automatically during training inside research/object_detection and contains the saved_model folder and all the checkpoints.
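The error in the original question suggests that a later step looked for `model.ckpt` files in a folder that training had not created yet. A minimal sketch, using only the Python standard library, of checking for checkpoint files before running inference; the helper name `has_checkpoint` and the "file name contains `ckpt`" rule are assumptions for illustration, not part of TSODA.

```python
from pathlib import Path


def has_checkpoint(folder: str) -> bool:
    """Return True if `folder` exists and holds at least one checkpoint file.

    Hypothetical helper: it only checks for file names containing "ckpt"
    (e.g. model.ckpt-35000.index or ckpt-36.index); adjust the rule to
    whatever your training script actually writes.
    """
    path = Path(folder)
    if not path.is_dir():
        return False  # training never created the folder
    return any(p.is_file() and "ckpt" in p.name for p in path.iterdir())
```

Calling `has_checkpoint("/content/tf-models/research/fine_tuned_model")` (the path from the error above) before the inference step returns `False` while the folder is missing, which points back to the training cell rather than to the inference code.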

You can debug your Google Colab notebook by printing the output of each step to check that everything is working correctly; this folder is created in the "iteration process" section of the notebook.

Check my original notebook and don't forget to look at the logs that were generated when I ran the algorithm; they may give you some tips.

Also check the TF version in your notebook. I used 2.2 and 1.x in the VM. Version 2.3 was released recently, but I haven't tested it, so I recommend using 2.2.
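To make the version requirement explicit, a small guard at the top of the notebook can fail early instead of surfacing later as an error like `ssd_inception_v2 is not supported` (which, later in this thread, went away after switching to 2.2). A minimal sketch, assuming that matching major.minor 2.2 is what matters; the function names are hypothetical.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as "2.2.0" or "2.3.0rc1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_recommended_tf(version: str, recommended: tuple = (2, 2)) -> bool:
    """True when the version matches the recommended major.minor (2.2 by default)."""
    return major_minor(version) == recommended
```

In the notebook you would pass `tf.__version__` after `import tensorflow as tf`. If it returns `False`, reinstall with `!pip install tensorflow==2.2` (the command used later in this thread); Colab typically needs a runtime restart before the new version is picked up.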

If you have any news, let me know. I think TSODA can help you create labels and test your model's performance faster!

qqaadir commented 4 years ago

@AlvaroCavalcante

Thank you for the explanation. I had tried using TF 1.15 instead of 2.2. The first error came from using 1.15 throughout the entire notebook. The second error came from using TF 2.3 together with 1.15, as done in your notebook. I have switched to 2.2 with !pip install tensorflow==2.2, and training now runs fine. How do I find the newly labeled data (images and XML files)? Both the unlabelled_data and labeled_data folders are empty; is this expected? Have all the new and existing files been moved to train_images_all and test_images_all?

AlvaroCavalcante commented 4 years ago

I'm happy that it worked! Check the section "Check some automatically created labels!" in the notebook; your images and XML files are probably in train_images and test_images. The unlabelled_data folder holds all the images that don't have labels yet; once labeled, they are moved to labeled_data and then distributed into your train and test folders!
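To see where images and annotations ended up after an iteration, you can count them per folder. A minimal sketch with the standard library; the folder names come from this thread, while the base directory, the `.jpg`/`.jpeg`/`.png` extensions, and the helper name `count_images_and_xml` are assumptions.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
# Folder names mentioned in this thread; pass your own list if yours differ
FOLDERS = ["unlabelled_data", "labeled_data", "train_images", "test_images"]


def count_images_and_xml(base_dir: str, folders=FOLDERS) -> dict:
    """Return {folder: (n_images, n_xml)} for each folder under base_dir.

    Missing folders are reported as (0, 0) instead of raising an error.
    """
    counts = {}
    for name in folders:
        folder = Path(base_dir) / name
        files = [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
        n_images = sum(p.suffix.lower() in IMAGE_EXTENSIONS for p in files)
        n_xml = sum(p.suffix.lower() == ".xml" for p in files)
        counts[name] = (n_images, n_xml)
    return counts
```

Running this before and after each loop also shows how many images were labeled per iteration; a folder where the image count and the XML count differ likely contains images without annotations.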

qqaadir commented 4 years ago

@AlvaroCavalcante Thank you for the guidance. Could you please keep this issue open until I get some results? I am still training, with no success on my own data yet. I am not getting any errors, but the mAP is very low, so I am experimenting with the parameters.

AlvaroCavalcante commented 4 years ago

Ok! Depending on your problem's complexity, you may need to fine-tune the parameters. Also, check how many labeled images you are passing to TSODA. If your problem is very hard to generalize, you probably need to enlarge your dataset manually before TSODA can label images autonomously!

qqaadir commented 4 years ago

@AlvaroCavalcante

I am able to get some results across six different categories using Faster R-CNN. I set the number of training steps to 35000. But something strange is happening with accuracy: after 35000 steps, accuracy plummets from 0.95 to 0.46. This trend appears across all classes. Is it because of the while loop, which trains for extra steps? Should I set that value to 1 instead of 5, or should I decrease the number of training steps?

AlvaroCavalcante commented 4 years ago

It's hard to say without more information about your parameters, but high variance could indicate that your model is overfitting, which would explain why your accuracy changes so much. I recommend using TensorBoard so you can track your model's loss and accuracy over time. The following code runs TensorBoard on Google Colab:

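# Download and unzip the ngrok client, used to expose TensorBoard through a public URL
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o ngrok-stable-linux-amd64.zip

# Assumes model_dir is the training output directory defined earlier in the notebook
LOG_DIR = model_dir
# Start TensorBoard in the background on port 6006
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

# Open an ngrok tunnel to port 6006 in the background
get_ipython().system_raw('./ngrok http 6006 &')

# Ask ngrok's local API (port 4040) for the tunnel and print its public URL
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"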

You will get a link to your TensorBoard; use it to follow how your model evolves during training!

qqaadir commented 4 years ago

@AlvaroCavalcante

There are no abrupt changes in accuracy during training. It kept increasing until 35K steps and then started to decrease steadily.
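When accuracy rises up to some step and then declines steadily, it can help to pick the checkpoint at the peak of the evaluation curve rather than the last one. A minimal sketch, assuming you have already exported (step, mAP) pairs from your evaluation logs or TensorBoard; `best_step` and `declined_after_peak` are hypothetical helpers.

```python
from typing import Iterable, Tuple


def best_step(history: Iterable[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, metric) pair with the highest metric.

    Ties keep the earliest step, because max() returns the first maximal item.
    """
    history = list(history)
    if not history:
        raise ValueError("history is empty")
    return max(history, key=lambda pair: pair[1])


def declined_after_peak(history: Iterable[Tuple[int, float]], tolerance: float = 0.0) -> bool:
    """True if the last recorded metric is more than `tolerance` below the peak."""
    history = list(history)
    _, peak = best_step(history)
    return history[-1][1] < peak - tolerance
```

With illustrative input such as `[(20000, 0.80), (35000, 0.95), (40000, 0.46)]`, `best_step` returns `(35000, 0.95)` and `declined_after_peak` returns `True`.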

I had been using TensorBoard, but only after training finished. Thanks a lot for this TensorBoard trick.

The detection accuracy drops after the first iteration of the while loop completes. For now, I have set the stop criterion to 2 (i.e., while train_count != 2:) to assess its effect on the final detection accuracy.
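For readers following along, the outer loop discussed here can be sketched as below. This is not the TSODA notebook's code: `train`, `predict`, the detection tuple format, and the rule "keep only boxes scoring at least `min_score`, and label an image if any box survives" are hypothetical stand-ins. Only the `train_count` stop criterion and the idea of moving confidently labeled images into the training set come from this thread.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical detection format: (class_name, score, (xmin, ymin, xmax, ymax))
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def self_training_loop(
    labeled: Dict[str, List[Detection]],
    unlabeled: List[str],
    train: Callable[[Dict[str, List[Detection]]], object],
    predict: Callable[[object, str], List[Detection]],
    max_iterations: int = 5,
    min_score: float = 0.9,
) -> Tuple[Dict[str, List[Detection]], List[str]]:
    """Sketch of a self-training loop; returns (labeled, still_unlabeled)."""
    labeled = dict(labeled)  # copy so the caller's data is left untouched
    pending = list(unlabeled)
    train_count = 0
    # The thread uses `while train_count != N:`; `<` behaves the same for N >= 0
    # but cannot loop forever if N is accidentally negative.
    while train_count < max_iterations:
        model = train(labeled)
        remaining = []
        for image in pending:
            # Keep only confident boxes as pseudo-labels
            confident = [d for d in predict(model, image) if d[1] >= min_score]
            if confident:
                labeled[image] = confident  # image moves to the labeled set
            else:
                remaining.append(image)  # retried in the next iteration
        pending = remaining
        train_count += 1
    return labeled, pending
```

Raising `max_iterations` here corresponds to raising N in `train_count != N`, and raising `min_score` trades fewer new labels per loop for less label noise.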

AlvaroCavalcante commented 4 years ago

So, if I understood correctly, you train your model for 35k steps in the first iteration of the while loop and get high accuracy, and in the next loop your precision begins to fall, right? Then I think this is not related to overfitting but to the semi-supervised approach, which means your model could be:

1. Labeling most of your images incorrectly, which degrades performance in the next training loop.
2. Adding too many images in a single iteration. For example, you begin the first loop with just 100 images and your model gets really good at inferring on them, but in the next loop you add 900 newly labeled images, and what your model learned is not enough to generalize to this new batch of images.

In the first case, I recommend increasing the confidence level, setting it to about 0.9 (90%). That way, only images with a high chance of being correctly labeled will be added to training in the next loop.
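To see how the confidence level trades label quality for quantity, you can count how many images would still receive pseudo-labels at different thresholds before committing to one. A minimal sketch; the input format (image name mapped to the confidence scores of its predicted boxes) and the helper names are hypothetical, not TSODA's own.

```python
from typing import Dict, Iterable, List


def keep_confident(detections: Dict[str, List[float]], threshold: float) -> Dict[str, List[float]]:
    """Keep, per image, only the detection scores at or above `threshold`.

    Images left with no boxes are dropped, so they would stay unlabeled.
    """
    kept = {}
    for image, scores in detections.items():
        confident = [s for s in scores if s >= threshold]
        if confident:
            kept[image] = confident
    return kept


def acceptance_report(detections: Dict[str, List[float]], thresholds: Iterable[float] = (0.6, 0.9)) -> Dict[float, int]:
    """Return {threshold: number of images that would receive pseudo-labels}."""
    return {t: len(keep_confident(detections, t)) for t in thresholds}
```

If the count at 0.9 is very small, the next loop adds few images; if it is close to the count at a low threshold, the model is already confident on most of the unlabeled set.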

In the second case, the confidence level can also help, but if you keep training, your model will probably learn the features again and its accuracy will increase, as happened in the first loop. If that doesn't happen, it could mean that your images are not well labeled or that you need to change the model parameters.

qqaadir commented 4 years ago

Exactly, you understood the problem correctly. These are the average precision values at around 20K steps for the six categories:

average_precision: 0.907587
average_precision: 0.763414
average_precision: 0.726314
average_precision: 0.859113
average_precision: 0.975564
average_precision: 0.985202

which is not bad considering the class imbalance: some labels have fewer examples than others.
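Assuming these six lines are the per-class values that the evaluator averages, the mAP is simply their mean. A quick check in Python using only the values reported above (class names were not given, so the list is unlabeled):

```python
from statistics import mean

# Per-class average precision at ~20K steps, as reported in the comment above
average_precisions = [0.907587, 0.763414, 0.726314, 0.859113, 0.975564, 0.985202]

map_score = mean(average_precisions)
print(f"mAP = {map_score:.4f}")  # prints "mAP = 0.8695"
```

The gap between the weakest and strongest classes is a quick way to see how much the imbalance is pulling the mean down.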

By confidence level, did you mean changing the default iou_threshold: 0.6 to 0.9 in the config file? And, based on your suggestions, should I increase the number in train_count != 5: to let the model train longer?

AlvaroCavalcante commented 4 years ago

Yes! A higher iou_threshold ensures that your labels will have less noise, because your model needs to be more confident to add a bounding box. If your model is suffering from problem number 2 that I described, then increasing the train count will help. To be sure, I recommend checking your labeled images (to ensure that the newly labeled images are good) and also verifying the number of images being labeled in each loop. If you have any doubts about how TSODA works, try rereading my article, because it's important to debug every step I created to check whether your model behaves normally!
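As a side note for anyone tuning the config: in the TF Object Detection API pipeline configs, the `batch_non_max_suppression` block has two separate fields, `score_threshold` (the minimum confidence a box needs to be kept) and `iou_threshold` (how much two boxes may overlap before the lower-scoring one is suppressed). The sketch below is a plain-Python, single-class greedy non-maximum suppression that shows what each knob does; it is an illustration, not the API's implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], score_threshold: float, iou_threshold: float) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # 1) score_threshold drops low-confidence boxes outright
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    # 2) iou_threshold suppresses boxes overlapping an already-kept box too much
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

In this sketch, raising `iou_threshold` from 0.6 to 0.9 keeps more overlapping boxes, while raising `score_threshold` is what filters out low-confidence detections.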

qqaadir commented 4 years ago

@AlvaroCavalcante

The solution to the low accuracy was to increase the training time. Increasing train_count (I set it to 30) is working fine for me. Thanks for the suggestions. The following figure shows an mAP of about 0.81 at around 30K steps.

Before we close this issue, could you please share what you think about these results?

AlvaroCavalcante commented 4 years ago

@qqaadir In general, an mAP of 0.81 is a great result, but it all depends on your problem. For example, is there a paper describing similar work? Comparing with the results obtained in other works is good practice. It also depends on the complexity of your problem: are there multiple objects per image? Is the resolution low? Are the objects small? All these variables are important to consider and help determine whether your mAP is good.

If you are not happy with your results, you can try to enlarge your dataset and change the parameters!

qqaadir commented 4 years ago

Thank you for your suggestions. This issue can be closed. I will message here later if I need any further help.

AlvaroCavalcante commented 4 years ago

Ok, I'm here to help! :)

AlvaroCavalcante / auto_annotate

Question about TSODA #3