Closed bm777 closed 4 years ago
And it is still learning now:
Hi @bm777, can you please clarify what code you are using (e.g. which notebook in the chapters you are referring to) and what dataset you are using?
For the loss plots - are they validation or training?
Hi @sidgan, thanks in advance.
After labeling the dataset (with LabelImg), I converted the XML files to train.csv and test.csv (plus the label map) and then generated the TFRecords. The dataset I am using is one I created myself; it is available in my google_drive. I am using the code recommended in the last line of README.md for chapter 14 (building a perfect cat locator), which comes from the TensorFlow repo. The loss plot in blue is for validation; the one for training is here in orange:
# The execution:
python model_main.py --logtostderr \
    --model_dir=training \
    --pipeline_config_path=training/faster_rcnn_resnet152_pets.config
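For readers following along, the XML-to-CSV step described above can be sketched roughly as follows. This is a minimal illustration, assuming the annotations are in the Pascal VOC XML layout that LabelImg writes by default. The column order, the helper names `voc_xml_to_rows` and `rows_to_csv`, and the sample annotation (file name and class label) are hypothetical and not taken from the dataset in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column order; adjust it to whatever your TFRecord script expects.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Return one row per <object> found in a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize the rows, with a header line, into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical annotation used only to exercise the functions above.
SAMPLE_XML = """<annotation>
  <filename>bird_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>kingfisher</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>200</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(rows_to_csv(voc_xml_to_rows(SAMPLE_XML)))
```

In practice you would loop over every XML file in the annotation folder and write the collected rows to train.csv or test.csv.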
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Binary to run train and evaluation on object detection model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import flags
import tensorflow.compat.v1 as tf
from object_detection import model_lib
tf.logging.set_verbosity(tf.logging.INFO)
flags.DEFINE_string(
    'model_dir', None, 'Path to output model directory '
    'where event and checkpoint files will be written.')
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
                    'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
                     'If training data should be evaluated for this job. Note '
                     'that one call only use this in eval-only mode, and '
                     '`checkpoint_dir` must be supplied.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
                     'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
                     'one of every n train input examples for evaluation, '
                     'where n is provided. This is only used if '
                     '`eval_training_data` is True.')
flags.DEFINE_string(
    'checkpoint_dir', None, 'Path to directory holding a checkpoint.  If '
    '`checkpoint_dir` is provided, this binary operates in eval-only mode, '
    'writing resulting metrics to `model_dir`.')
flags.DEFINE_boolean(
    'run_once', False, 'If running in eval-only mode, whether to run just '
    'one round of eval vs running continuously (default).'
)
flags.DEFINE_integer(
    'max_eval_retries', 0, 'If running continuous eval, the maximum number of '
    'retries upon encountering tf.errors.InvalidArgumentError. If negative, '
    'will always retry the evaluation.'
)
FLAGS = flags.FLAGS
def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)

  train_and_eval_dict = model_lib.create_estimator_and_inputs(
      run_config=config,
      pipeline_config_path=FLAGS.pipeline_config_path,
      train_steps=FLAGS.num_train_steps,
      sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
      sample_1_of_n_eval_on_train_examples=(
          FLAGS.sample_1_of_n_eval_on_train_examples))
  estimator = train_and_eval_dict['estimator']
  train_input_fn = train_and_eval_dict['train_input_fn']
  eval_input_fns = train_and_eval_dict['eval_input_fns']
  eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
  predict_input_fn = train_and_eval_dict['predict_input_fn']
  train_steps = train_and_eval_dict['train_steps']

  if FLAGS.checkpoint_dir:
    if FLAGS.eval_training_data:
      name = 'training_data'
      input_fn = eval_on_train_input_fn
    else:
      name = 'validation_data'
      # The first eval input will be evaluated.
      input_fn = eval_input_fns[0]
    if FLAGS.run_once:
      estimator.evaluate(input_fn,
                         steps=None,
                         checkpoint_path=tf.train.latest_checkpoint(
                             FLAGS.checkpoint_dir))
    else:
      model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
                                train_steps, name, FLAGS.max_eval_retries)
  else:
    train_spec, eval_specs = model_lib.create_train_and_eval_specs(
        train_input_fn,
        eval_input_fns,
        eval_on_train_input_fn,
        predict_input_fn,
        train_steps,
        eval_on_train_data=False)

    # Currently only a single Eval Spec is allowed.
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
# gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.65)
# sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
# with sess.as_default():
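if __name__ == '__main__':
  # Enable memory growth on the first GPU, if there is one, so TensorFlow
  # does not reserve all of the GPU memory up front.
  gpus = tf.config.experimental.list_physical_devices('GPU')
  if gpus:
    try:
      tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
      # Memory growth must be set before the GPUs have been initialized.
      print(e)
  # Run training outside the try block so that a missing GPU or a
  # configuration error does not silently skip training.
  tf.app.run()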
It looks like the ResNet152 family of models is not available in the TensorFlow 1.x versions, which you seem to be using. In other words, you are using a TensorFlow 2.x ResNet152 model in TensorFlow 1.x, and the two are not interoperable. For reference, TensorFlow updated its Object Detection API to use 2.0 about four weeks ago; you can read more here. Look at the Model Zoo for TensorFlow 1.x and 2.x; it shows all the models that are available for each version.
I'd suggest starting with a faster and more reliable model like the ssdlite_mobilenet_v2_coco model, which is available in the TensorFlow 1.x Object Detection API.
In case you haven't done so already, I'd also recommend first trying the task and code mentioned in Chapter 14, as that has been tried and tested to work as of Oct 2019. Then start adding your dataset and other options as necessary.
Thinking out loud about how a neural network learns: it needs to see enough repeated patterns per class. The collected dataset is a great start, but some images show birds sitting, others flying, others with open wings, so with just 75 images per class there may be too few repetitions of each pattern for the model to learn well. Eventually the detector may learn to pick out the location of a beak or head, etc., so it ideally needs more patterns to return a high-confidence prediction. For classification the dataset would probably suffice, and the model would learn the color of the beak, feathers, etc.
Exciting work! Running a detector on Nat Geo's video of birds would be rad. I look forward to some cool results :)
Thank you very much for your appreciation of the dataset. I saw in the link you gave me that the problem was the non-interoperability of TensorFlow 2.x Model Zoo models with TensorFlow 1.x. Thanks :)
I tested the code of chapter 14; it worked very well and the detection of the cat was about 94% :), I was happy...
Based on your recommendation:
First I will use the ssdlite_mobilenet_v2_coco model and see the result, then try another model like faster_rcnn_inception_v2_coco, which is in the TensorFlow 1.x model zoo.
I will add 15 images per class to reach 90 images per class (with different repeating patterns).
Thanks for the appreciation; I will let you know if it works.
Hi @sidgan
I read some papers on object detection covering one-stage and two-stage detectors, and I saw that they used more than 900k steps to obtain the best results. In my case, I don't know whether I am getting some bad detections because I am still at 500k steps. Or should I continue to augment the dataset with more images (sitting, flying, open wings, open beak), something like more than 200 images per class?
I tested ssdlite_mobilenet_v2_coco and got some results after 400k steps.
I tried faster_rcnn_inception_v2_coco with lr=0.002 for steps 0-300k and lr=0.0001 for steps 300k-500k.
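The two-phase learning rate described here is a piecewise-constant schedule. As a rough illustration only, the sketch below expresses it in plain Python. The function name `learning_rate_at` is hypothetical, and it assumes the switch to the lower rate happens exactly at step 300k; it is not the Object Detection API's own configuration mechanism.

```python
def learning_rate_at(step, boundaries=(300_000,), values=(0.002, 0.0001)):
    """Piecewise-constant schedule: values[i] applies while step < boundaries[i].

    The last entry of `values` applies to every step at or beyond the final
    boundary, so `values` must have exactly one more entry than `boundaries`.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more entry than boundaries")
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]


if __name__ == "__main__":
    for step in (0, 150_000, 300_000, 500_000):
        print(step, learning_rate_at(step))
```

Keeping the schedule in one small function makes it easy to compare runs that differ only in where the rate drops.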
The plots related to the faster_rcnn_inception_v2_coco model:
Here is the plot of the training loss:
Here is the plot of the validation loss:
But after 400k steps, I got some good results (with two detections for a single bird), with the best accuracy like here:
This looks promising. Great work!!
I've often found myself in the situation you are in: trying to figure out how to increase the accuracy of a model and which method to employ, whether to add more data or to increase the training time. The method that I employ, which I've learned from years in the industry, is:
Loop on 2 and 3 until you've attained the desired accuracy.
The website https://paperswithcode.com/sota/object-detection-on-coco is also good for looking at various publications with available code and how they compare against others in benchmarks.
Hi @sidgan, thank you for your appreciation.
I noted your recommendation. I will loop on 2 and 3 until I get the desired accuracy. You are right: according to you, it is not the training time alone that gives the best result, but a combination of things like:
The remaining steps for me after your last response:
#1
on Object Detection on COCO minival ),
I will train with lr=0.0001, architecture_model=faster_rcnn_inception_v2_coco, then afterwards with EfficientDet, and compare the results. Thank you again for your help.
EfficientDet, code
I'm closing this issue now since there has been no activity in the past 2 weeks.
Okay. But I will reopen it after I have finished collecting my bird dataset if I find any issues.
Many thanks again for your help.
Hello everyone, I am training a bird species object detector. I have 7 classes (525 images in total, 75 pictures per class) and am using:
Software and libraries:
Model:
Laptop:
Current level: loss: 1.4247278, step: 75185. The loss is still decreasing, but slowly.
I started the training yesterday, stopped it, and resumed this morning from the saved checkpoint, as you can see. At this stage, I saved and exported the model.*-7108 to test prediction, but I did not get any predictions and no boxes were drawn. Only when I decrease the threshold from 0.6 to 0.1 do I get some false and some true predictions.
My question is: does that mean my model has not learned very well? Or should I continue the training until I reach 200k steps?
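To make the threshold effect concrete, here is a minimal sketch of how a score threshold filters detections. The helper `filter_detections` and the sample boxes, scores, and class names are all hypothetical; they only illustrate why no boxes are drawn at 0.6 when every score is low, while some appear at 0.1.

```python
def filter_detections(boxes, scores, classes, min_score):
    """Keep only the detections whose confidence is at least min_score."""
    return [
        (box, score, cls)
        for box, score, cls in zip(boxes, scores, classes)
        if score >= min_score
    ]


# Made-up detections from an under-trained model: every score is low.
BOXES = [(0.10, 0.20, 0.50, 0.60), (0.30, 0.30, 0.70, 0.80), (0.00, 0.00, 0.20, 0.20)]
SCORES = [0.35, 0.12, 0.04]
CLASSES = ["kingfisher", "kingfisher", "sparrow"]

if __name__ == "__main__":
    for threshold in (0.6, 0.1):
        kept = filter_detections(BOXES, SCORES, CLASSES, threshold)
        print(f"threshold={threshold}: {len(kept)} box(es) kept")
```

Low scores across the board usually indicate that the model is not yet confident, so lowering the threshold shows more boxes but also lets more false positives through.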