dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.io/dhSegment
GNU General Public License v3.0

Error on training #35

Closed calebjacksonhoward closed 5 years ago

calebjacksonhoward commented 5 years ago

Hello,

I'm having difficulty with training. I was able to train with the demo data, but when I submit my own data I get the following error:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be broadcastable: logits_size=[451801,7] labels_size=[451248,7]
     [[{{node loss/per_pixel_loss}} = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss/per_pixel_loss/Reshape, loss/per_pixel_loss/Reshape_1)]]
     [[{{node Loss_1/map/while/Switch_1/_4839}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5009_Loss_1/map/while/Switch_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopLoss_1/map/while/TensorArrayReadV3_1/_4715)]]

There are 7 classes in my data, and only 2 in the demo data. Apart from that, I can see no obvious distinctions between the two data sets.

I am still coming up to speed with dhSegment and some of its supporting libraries, so any assistance would be greatly appreciated.
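For anyone hitting the same error: the mismatch `logits_size=[451801,7]` vs `labels_size=[451248,7]` means the network's per-pixel predictions and the label image don't cover the same number of pixels, which almost always means an image/label size mismatch. A minimal sketch of a sanity check (the directory layout and filenames here are hypothetical, not part of dhSegment's API):

```python
# Hypothetical sanity check: verify that every label image has exactly the
# same pixel dimensions as its source image. Even a one-pixel difference
# changes the number of pixels, so the flattened logits and labels tensors
# end up with different lengths and SoftmaxCrossEntropyWithLogits fails
# with "logits and labels must be broadcastable".
import os
from PIL import Image

def find_size_mismatches(images_dir, labels_dir):
    """Return (filename, image_size, label_size) for every pair whose
    pixel dimensions differ. Assumes labels share the image filenames."""
    mismatches = []
    for name in sorted(os.listdir(images_dir)):
        label_path = os.path.join(labels_dir, name)
        if not os.path.exists(label_path):
            continue  # skip images with no matching label file
        with Image.open(os.path.join(images_dir, name)) as img, \
             Image.open(label_path) as lbl:
            if img.size != lbl.size:  # PIL size is (width, height)
                mismatches.append((name, img.size, lbl.size))
    return mismatches
```

Running this over the training set before starting a run would flag exactly the kind of off-by-one-pixel labels described below.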

calebjacksonhoward commented 5 years ago

Found my issue - the label images were off by one pixel in each dimension for a subset of my training data.

(Incidentally, I was building my label images based on the XML data out of the OpenLabeling tool. Some of those XML files were off by a pixel in each dimension in reporting the size of the source image. If you are using OpenLabeling, check the image size.)
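To catch this at the source, one can compare the image size reported inside each annotation XML against the actual image on disk. This sketch assumes the XML uses the common Pascal VOC-style `<size><width>/<height>` fields (as annotation tools like OpenLabeling typically write); adjust the element names if your files differ:

```python
# Hedged sketch: check whether the image size reported in an annotation
# XML matches the real image file. Assumes Pascal VOC-style fields:
#   <annotation><size><width>W</width><height>H</height></size>...
import xml.etree.ElementTree as ET
from PIL import Image

def xml_size_matches_image(xml_path, image_path):
    """Return (matches, reported_size, actual_size), sizes as (w, h)."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    reported = (int(size.find("width").text),
                int(size.find("height").text))
    with Image.open(image_path) as img:
        actual = img.size  # (width, height)
    return reported == actual, reported, actual
```

If `reported` and `actual` differ by a pixel, any label image rendered from that XML will trigger the broadcast error above.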