experiencor / keras-yolo2

Easy training on custom datasets. Various backends (MobileNet and SqueezeNet) are supported. A YOLO demo that detects raccoons, running entirely in the browser, is accessible at https://git.io/vF7vI (not on Windows).
MIT License
1.73k stars, 787 forks

Seg fault when the training starts #384

Status: Open. melhousni opened this issue 5 years ago

melhousni commented 5 years ago

Hi, as always, a segfault doesn't give much information about where the error originates. I'm attempting to train the network with a MobileNet backend on my own dataset.
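One way to get more out of a segfault is Python's `faulthandler` module, which dumps the Python-level traceback of every thread when the process receives SIGSEGV, so you can see which Python call (for example inside Keras or TensorFlow) was active at the crash. This is a minimal sketch, assuming it is placed at the very top of `train.py`; on Python 3 the same effect can be had with `python3 -X faulthandler train.py -c config.json`. Note that `faulthandler` is only in the standard library from Python 3.3 on; the `u'toy'` in the log below suggests Python 2.7, where the separate `faulthandler` backport package from PyPI would be needed.

```python
import faulthandler
import sys

# Dump the Python traceback of all threads if the interpreter receives
# SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. sys.__stderr__ is used so the
# dump still reaches the real stderr even if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

if __name__ == "__main__":
    # Confirm the handler is active before training would start.
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The traceback printed at the crash is usually more useful than the bare "Segmentation fault (core dumped)" line, although a fault deep inside a CUDA or cuDNN library may still only show the last Python frame that called into it.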

This is my config file:

{
    "model": {
        "backend":              "MobileNet",
        "input_size":           416,
        "anchors":              [5.38,5.63, 5.84,9.08, 6.99,7.80, 7.63,10.30, 8.58,7.33],
        "max_box_per_image":    10,
        "labels":               ["toy"]
    },

    "train": {
        "train_image_folder":   "/home/mahdi/keras-yolo2/pictures/",
        "train_annot_folder":   "/home/mahdi/keras-yolo2/labels/",

        "train_times":          8,
        "pretrained_weights":   "",
        "batch_size":           16,
        "learning_rate":        1e-4,
        "nb_epochs":            50,
        "warmup_epochs":        3,

        "object_scale":         5.0,
        "no_object_scale":      1.0,
        "coord_scale":          1.0,
        "class_scale":          1.0,

        "saved_weights_name":   "",
        "debug":                true
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}
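As a sanity check, the grid size, detection-layer width and epoch count that appear in the log below can be derived from this config. This is a small sketch under a few assumptions: each anchor box predicts 4 coordinates, 1 objectness score and one score per label (the `(None, 13, 13, 5, 6)` reshape in the model summary is consistent with this), the MobileNet backbone downsamples the input by a factor of 32, and training runs for `nb_epochs + warmup_epochs` epochs (consistent with the `Epoch 1/53` line). `CONFIG` is a trimmed copy of the config above and `derived_quantities` is a hypothetical helper, not part of keras-yolo2.

```python
import json

# Trimmed copy of the relevant part of config.json from this issue.
CONFIG = json.loads("""
{
  "model": {"backend": "MobileNet", "input_size": 416,
            "anchors": [5.38,5.63, 5.84,9.08, 6.99,7.80, 7.63,10.30, 8.58,7.33],
            "max_box_per_image": 10, "labels": ["toy"]},
  "train": {"nb_epochs": 50, "warmup_epochs": 3}
}
""")


def derived_quantities(config):
    """Compute shapes and epoch count implied by the config (assumed conventions)."""
    model = config["model"]
    anchors = model["anchors"]
    # Anchors are stored as a flat list of (width, height) pairs.
    assert len(anchors) % 2 == 0, "anchors must be (w, h) pairs"
    nb_box = len(anchors) // 2
    nb_class = len(model["labels"])
    # Assumed total stride of 32 for the MobileNet backbone.
    grid = model["input_size"] // 32
    # 4 box coordinates + 1 objectness score + one score per class.
    per_box = 4 + 1 + nb_class
    train = config["train"]
    return {
        "nb_box": nb_box,
        "grid": (grid, grid),
        "detection_channels": nb_box * per_box,
        "reshape": (grid, grid, nb_box, per_box),
        "total_epochs": train["nb_epochs"] + train["warmup_epochs"],
    }


if __name__ == "__main__":
    for name, value in derived_quantities(CONFIG).items():
        print(name, value)
```

With this config the sketch gives a 13x13 grid, 5 anchors, 30 detection channels and 53 epochs, all of which match the output below, so the config appears to be parsed as intended.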

This is the output of train.py:

mahdi@amax:~/keras-yolo2$ python train.py -c config.json
Using TensorFlow backend.
('Seen labels:\t', {'toy': 867})
('Given labels:\t', [u'toy'])
('Overlap labels:\t', set(['toy']))
2018-11-21 18:30:48.932656: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-11-21 18:30:50.990936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:02:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2018-11-21 18:30:50.991011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-21 18:30:51.414516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-21 18:30:51.414569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-21 18:30:51.414580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-21 18:30:51.414928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10168 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
(13, 13)


Layer (type)                 Output Shape           Param #    Connected to
==================================================================================
input_1 (InputLayer)         (None, 416, 416, 3)    0
model_1 (Model)              (None, 13, 13, 1024)   3228864    input_1[0][0]
DetectionLayer (Conv2D)      (None, 13, 13, 30)     30750      model_1[1][0]
reshape_1 (Reshape)          (None, 13, 13, 5, 6)   0          DetectionLayer[0][0]
input_2 (InputLayer)         (None, 1, 1, 1, 1, 4   0
lambda_1 (Lambda)            (None, 13, 13, 5, 6)   0          reshape_1[0][0]
                                                               input_2[0][0]
==================================================================================
Total params: 3,259,614
Trainable params: 3,237,726
Non-trainable params: 21,888
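The parameter totals in this summary can be reproduced by hand, which suggests the model graph itself is built as expected. This sketch assumes the backbone is the standard Keras MobileNet v1 (width multiplier 1.0, no top, convolutions without bias, a BatchNormalization after every convolution with 4 parameters per channel, 2 of which, the moving mean and variance, are non-trainable) and that `DetectionLayer` is a 1x1 `Conv2D` with bias. `BLOCKS`, `mobilenet_params` and `conv1x1_params` are illustrative names, not keras-yolo2 code.

```python
# (input_channels, output_channels) of each depthwise + pointwise block in
# MobileNet v1 with width multiplier 1.0 (assumed backbone layout).
BLOCKS = ([(32, 64), (64, 128), (128, 128), (128, 256), (256, 256), (256, 512)]
          + [(512, 512)] * 5
          + [(512, 1024), (1024, 1024)])


def mobilenet_params():
    """Return (total, non_trainable) parameter counts of the assumed backbone."""
    # First 3x3 conv: 3 -> 32 channels, no bias, followed by BatchNorm.
    total = 3 * 3 * 3 * 32 + 4 * 32
    non_trainable = 2 * 32  # BatchNorm moving mean + moving variance
    for c_in, c_out in BLOCKS:
        total += 3 * 3 * c_in + 4 * c_in    # depthwise 3x3 conv + its BatchNorm
        total += c_in * c_out + 4 * c_out   # pointwise 1x1 conv + its BatchNorm
        non_trainable += 2 * c_in + 2 * c_out
    return total, non_trainable


def conv1x1_params(c_in, c_out):
    """1x1 kernel weights plus one bias per output channel."""
    return c_in * c_out + c_out


backbone, frozen = mobilenet_params()
head = conv1x1_params(1024, 30)  # DetectionLayer: 1024 -> 5 * (4 + 1 + 1)
total = backbone + head
print("model_1 (Model):", backbone)
print("DetectionLayer:", head)
print("Total:", total, "Trainable:", total - frozen, "Non-trainable:", frozen)
```

Under these assumptions the backbone has 3,228,864 parameters, the detection head 30,750, and the totals come out to 3,259,614 overall, 3,237,726 trainable and 21,888 non-trainable, exactly as printed. This is only a consistency check on the graph; it says nothing about the GPU side of the crash.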


WARNING:tensorflow:From /home/mahdi/keras-yolo2/frontend.py:232: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20. Instructions for updating: Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensure tf.print executes in graph mode:

    sess = tf.Session()
    with sess.as_default():
        tensor = tf.range(10)
        print_op = tf.print(tensor)
        with tf.control_dependencies([print_op]):
          out = tf.add(tensor, tensor)
        sess.run(out)

Additionally, to use tf.print in python 2.7, users must make sure to import the following:

from __future__ import print_function

Epoch 1/53
Loss XY [0.004202493]
Loss WH [4.03024864]
Loss Conf [0.126542702]
Loss Class [0]
Total Loss [14.1609936]
Current Recall [0.999999881]
Average Recall [0.999999881]
Segmentation fault (core dumped)
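The loss values printed for the first batch are finite and add up consistently, which hints that the crash is not caused by a NaN or overflow in the loss. In the sketch below, the values are copied from the log; the total is exactly 10 more than the sum of the four components, so the sketch assumes a constant warm-up offset of 10. That offset is inferred from these numbers, not from documentation. `Loss Class [0]` is what a softmax cross-entropy gives with a single label, since a softmax over one class always outputs 1 (assuming that is the class loss used).

```python
import math

# Loss components printed (via tf.Print, since "debug": true) for the first batch.
components = {
    "xy": 0.004202493,
    "wh": 4.03024864,
    "conf": 0.126542702,
    "class": 0.0,
}
reported_total = 14.1609936

# Assumed constant added to the loss while still in the warm-up batches,
# inferred from the difference between the total and the components.
WARMUP_OFFSET = 10.0

component_sum = sum(components.values())
all_finite = all(math.isfinite(v) for v in components.values())
print("sum of components:", round(component_sum, 7))
print("sum + warm-up offset:", round(component_sum + WARMUP_OFFSET, 7))
print("reported total:", reported_total)
print("all components finite:", all_finite)
```

Since the first step completes with sane numbers and the process dies right after, the fault may lie outside the loss computation (for example in a native library or the data pipeline), though this check alone cannot confirm that.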

Any help would be much appreciated. Thanks!