AIWintermuteAI / aXeleRate

Keras-based framework for AI on the Edge
MIT License

Object Detection Doesn't Find Any Objects Out of the Box #2

Closed: mawildoer closed this issue 4 years ago

mawildoer commented 4 years ago

Thanks for all your work and tutorials. They're a great help getting started, but one roadblock I keep hitting is that your Object Detection example doesn't detect any objects in the test images when trained out of the box.

After looking through the available output, I noticed the message "Fail to load pre-trained weights-starting training from scratch" below the pull from this repo: https://github.com/fchollet/deep-learning-models/

Is it possible it's failing to fetch the pre-trained network and is just trying to train on only two images?

Happy to put in some of the legwork to help get this one solved.

AIWintermuteAI commented 4 years ago

Yes, I haven't provided any pre-trained models as of now. I probably should make a sample model available for each task (recognition, detection, segmentation). So yes, it does train with only those two images. I can fix that this week or next. Or, if you want to help, you can find/make some datasets, train example models, and upload them somewhere for people to download. Then I can add links to the models and mention you in the README.

Overall, in my experience the MobileNet7_5 feature extractor can successfully learn up to 10 classes for detection and recognition. Recognition can possibly handle more classes, but for my tasks I only care about the top-1 result being correct, and I need at least 80 percent accuracy.

mawildoer commented 4 years ago

Thanks Dmitry,

Are we not able to recycle the pre-trained weights from MobileNet or YOLO? Are we not able to use a partially trained network in the same manner?

Otherwise, yes, I'll throw together a dataset, perhaps COCO or VOC, and see what I can train in Google Colab tonight.

It certainly seems like there's no point in training it from the ground up for every case!

AIWintermuteAI commented 4 years ago

Well, there are two parts to the networks used in aXeleRate: the frontend (SegNet, YOLO, or Classifier) and the backend (the feature extractor). By default, if you choose the MobileNet, VGG16, or ResNet backends, aXeleRate will download pre-trained weights and load them into the feature extractor. But there are no pre-trained weights available for the frontend part, i.e. the detection layer in the case of YOLO. This is why, even though MobileNet uses weights pre-trained on ImageNet and outputs correct features, you get no detections: the detection layer hasn't been trained yet, so it cannot output correct bounding boxes.
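To illustrate the split, here's a rough sketch in plain Keras of what such a two-part detector looks like. This is not aXeleRate's actual code, and the anchor/class counts are made-up placeholders:

```python
from keras.applications import MobileNet
from keras.layers import Conv2D, Reshape
from keras.models import Model

N_CLASSES = 20  # placeholder: depends on your dataset
N_ANCHORS = 5   # placeholder: a typical YOLOv2-style anchor count

# Backend: feature extractor, downloaded with ImageNet weights
# (alpha=0.75 roughly corresponds to "MobileNet7_5")
backbone = MobileNet(input_shape=(224, 224, 3), include_top=False,
                     weights='imagenet', alpha=0.75)

# Frontend: YOLO-style detection layer, randomly initialized -
# this is the part that still has to be trained on your dataset
x = Conv2D(N_ANCHORS * (4 + 1 + N_CLASSES), (1, 1), padding='same')(backbone.output)
output = Reshape((7, 7, N_ANCHORS, 4 + 1 + N_CLASSES))(x)  # 7x7 grid for 224x224 input

model = Model(backbone.input, output)
```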

The last layer's configuration depends on the dataset you use (the number of classes), so there's no point in making pre-trained weights available for it, except for demonstration purposes. Anyway, feel free to experiment :) COCO might be too large for MobileNet; PASCAL-VOC (or a subset of it) should be okay.

careyer commented 4 years ago

Hi there... I tried the Colab notebook as well, but I hit the same problem. The model trains, but the inference run produces no detections :-( It just spits out a folder of untouched original pictures.

What am I doing wrong? Can you please help?

AIWintermuteAI commented 4 years ago

Which notebook are you trying, and with what dataset? There was a bug in the inference script: it would perform detection as usual (as evidenced by the "... boxes detected" output), but save the original image instead of the image with boxes. I fixed it in https://github.com/AIWintermuteAI/aXeleRate/commit/157a5d5d1231cffde07e52848afc6edc298b9bba. Can you try the Object Detection notebook again now? It works normally for me. I also tried a quick training from scratch in the person detection notebook - it works as well.

https://colab.research.google.com/drive/1w9QtBAgpJrbbg1Sua9I2NAJIl34CXQHC

careyer commented 4 years ago

Hi @AIWintermuteAI ,

thank you for your timely answer! I am sorry, I was not being precise enough. I used the "PASCAL-VOC 2012 Object Detection" notebook linked straight from your GitHub project page, and I ran it with the PASCAL-VOC dataset that you provide as an example. It trains fine and also performs the inference, but it was showing only the original pictures, without detection boxes.

I will try again now

UPDATE: Today it works! Seems like your fix indeed fixed it =)

THuffam commented 4 years ago

Hi, I have tried the updated Colab detection notebook and get the following error when I run it on my own images and annotation files:

```
...
Total params: 3,300,614
Trainable params: 3,278,726
Non-trainable params: 21,888

Epoch 1/1

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input> in <module>()
      1 from keras import backend as K
      2 K.clear_session()
----> 3 model_path = setup_training(config_dict=config)

9 frames
/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470     ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                            self._handle, args,
->  1472                                           run_metadata_ptr)
   1473     if run_metadata:
   1474       proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 18620 values, but the requested shape requires a multiple of 128
     [[{{node loss/reshape_1_loss/loss_func/Reshape}}]]
     [[Mean/_1411]]
  (1) Invalid argument: Input to reshape is a tensor with 18620 values, but the requested shape requires a multiple of 128
     [[{{node loss/reshape_1_loss/loss_func/Reshape}}]]
0 successful operations. 0 derived errors ignored.
```

I get the same error when I try both the pre-trained and train-from-scratch configs.

Also, could you please explain the weights? Where did the 2020-04-12_17-09-43.h5 file come from, and how do pre-trained and scratch-trained models differ? Is the use of pre-trained models the same as transfer learning? I am new to ML, but my understanding is that for transfer learning you need the original model, e.g. ImageNet or MobileNet, so I'm not sure what the 2020-04-12_17-09-43.h5 file is for.

Many thanks
Tim

AIWintermuteAI commented 4 years ago

Welcome to the dark forests of ML! To greet you, we can start with a discussion of transfer learning and fine-tuning :) The way it's usually described, transfer learning means using a pre-trained feature extractor model (sometimes referred to as the "bottleneck") to train a new "head" for the model. Fine-tuning refers to re-training the whole model on a changed dataset.
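To make that concrete, here's a minimal sketch of both in plain Keras (this is not aXeleRate's own code; the class count is a placeholder):

```python
from keras.applications import MobileNet
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Pre-trained bottleneck / feature extractor
base = MobileNet(input_shape=(224, 224, 3), include_top=False, weights='imagenet')

# Transfer learning: freeze the bottleneck, train only the new head
for layer in base.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
head = Dense(10, activation='softmax')(x)  # 10 classes is a made-up example
model = Model(base.input, head)

# Fine-tuning: once the head has converged, unfreeze everything and
# re-train the whole model, usually with a much lower learning rate
for layer in base.layers:
    layer.trainable = True
```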

Now, for aXeleRate: the way it's done now, if you choose the MobileNet, VGG16, or ResNet50 architectures, you're doing transfer learning by default - the feature extractor will use ImageNet weights as the default setting. In the future (about a week in the future :) ) that will change, and you will be able to pass custom bottleneck weights in the config file or choose to start training with random weights.

2020-04-12_17-09-43.h5 is a full model trained on the 20-class PASCAL-VOC dataset; it is there to give users a reference for how a trained model performs. So you can perform inference with it, continue training on PASCAL-VOC, or use it on your own dataset (!!!only if you have the same number of classes, 20!!!).

If you have a different number of classes, you can start training the detector layer from scratch (by NOT specifying full model weights) - in that case, aXeleRate automatically loads ImageNet weights into the feature extractor.
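In config terms, the two options would look roughly like this. The key names below are illustrative guesses, not necessarily aXeleRate's exact schema; check the example notebooks for the real one:

```python
# Option A: continue from the full pre-trained model
# (only valid if your dataset has the same 20 classes)
config["weights"]["full"] = "2020-04-12_17-09-43.h5"  # hypothetical key name

# Option B: train the detector layer from scratch - leave the full-model
# weights empty; the feature extractor still gets ImageNet weights loaded
config["weights"]["full"] = ""
```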

How many validation (and training) samples do you have? Normally this error shows up if you have fewer samples than batch_size. For example, having a batch size of 32 and just 10 validation samples will result in this error.
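A quick sketch of that failure mode and the kind of sanity check that would catch it (the variable names are hypothetical, not aXeleRate's internals):

```python
batch_size = 32        # value from the training section of the config
n_valid_samples = 10   # e.g. only 10 annotated validation images

# With fewer samples than batch_size, the generator can only produce a
# short batch, and the YOLO loss reshape fails with an InvalidArgumentError
# like the one above.
if n_valid_samples < batch_size:
    raise ValueError(
        f"Only {n_valid_samples} validation samples for batch_size={batch_size}: "
        "add more samples or reduce batch_size in the config")
```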

AIWintermuteAI commented 4 years ago

@careyer excellent! Thank you for the feedback. This project is a work in progress and I'm coding late at night :) I run the tests_training.py script every time before pushing an update, but sometimes a little bug sneaks through.

careyer commented 4 years ago

@THuffam @AIWintermuteAI: I am also a newbie trying to get a foothold in this topic. I would also much appreciate it if the questions Tim asked could be answered! Thanks!

THuffam commented 4 years ago

Awesome - thanks @AIWintermuteAI. Oh yes, I missed the batch size parameter - I only had 20 training and 6 validation examples. I have over 1000 images that I have yet to annotate, but I wanted to make sure I could get the whole system working before proceeding.

When running it on my local machine with a batch size of 4, it ran fine (although the validation did not yield any positive results - I'm assuming because of the small training set).

I appreciate that it's not your job to educate us on ML, but so we can better understand the aXeleRate architecture and process flow, could you recommend any articles or online courses that might help? I'm about halfway through the fast.ai course and have done various proofs-of-concept using some AWS, GCP and Azure ML offerings (that was about 4 or 5 years ago), but I must admit I'm somewhat lost here - though keen to learn and get it nailed.

Thanks again - hopefully we'll be able to get involved and help out at some point. Tim

careyer commented 4 years ago

@THuffam: 👍 I would also like to put together my own training set... can you please point me in the right direction for how to do all those annotations? I mean, I somehow need a tool to create all those .xml files. Is there a tool readily available for either Windows or Linux that can streamline the workflow? Thanks!

AIWintermuteAI commented 4 years ago

@careyer of course, there are plenty of tools. I used labelImg for its simplicity: https://github.com/tzutalin/labelImg

There's also an article on Hackster where I describe how to simplify (somewhat) the process of creating a dataset with an OpenCV script - that won't work for every scenario, though: https://www.hackster.io/dmitrywat/deep-learning-sumo-robot-e97313

Finally, there are a few annotation tools available that support using deep learning for automatic annotation - the idea is that you use a half-trained model to detect the boxes, correct them, train again, rinse and repeat :)
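For reference, a PASCAL-VOC style .xml annotation of the kind labelImg produces looks roughly like this (the file name, label, and box coordinates below are made up):

```xml
<annotation>
  <folder>images</folder>
  <filename>example_001.jpg</filename>
  <size>
    <width>224</width>
    <height>224</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>48</xmin>
      <ymin>32</ymin>
      <xmax>170</xmax>
      <ymax>210</ymax>
    </bndbox>
  </object>
</annotation>
```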

AIWintermuteAI commented 4 years ago

@THuffam Right. I think I'll add a check to the script to detect whether the user has fewer samples than batch_size - I've seen other people encountering a similar problem.

Well, I'm not an authority on deep learning in any way :) I mean, I don't have a PhD in CS. I did one course on Udacity recently, but just because I wanted to learn more about PyTorch. Everything else I learned from tutorials, articles, and writing code myself. Machine learning is a huge field; there's just too much stuff to learn. I myself have chosen to concentrate on computer vision and (a bit of) reinforcement learning for robotics, so I am ignoring all the other amazing things, like GANs and deep learning for NLP. Just 24 hours in one day.

If you want to learn more about CV, I did write a few articles on Instructables and Hackster that explain YOLO and transfer learning in plain language:
https://www.instructables.com/id/Object-Detection-With-Sipeed-MaiX-BoardsKendryte-K/
https://www.instructables.com/id/Transfer-Learning-With-Sipeed-MaiX-and-Arduino-IDE/
Go to step 6 in both of these articles to see the references - there is definitely more reading to be done if you want to understand how CNNs work :) Other than that, it's really all about hands-on experience and sleepless nights coding and debugging your code.

AIWintermuteAI commented 4 years ago

@careyer @THuffam By the way, since your original problem is solved, I'll close this issue. It is better to keep it "one issue - one topic"; that makes it easier for other people to find information. If you encounter other problems, please don't hesitate to open another issue. For general questions about ML, write me a comment on YouTube - I have a "zero comments not replied" policy there xD