SalehShmali commented 3 years ago

I get two errors:

1. In Faster R-CNN: AssertionError: Could not compute output Tensor("rpn_reg_loss/truediv:0", shape=(), dtype=float32)

2. In the RPN alone: Incompatible shapes: [8,9216] vs. [8] [[node gradient_tape/reg_loss/mul/BroadcastGradientArgs (defined at e:/faster_rcnn/rpn_trainer.py:60) ]] [Op:__inference_train_function_12539]

davidoygenblik commented 3 years ago

Are you doing this without modifying anything or are you trying to implement your own code?

SalehShmali commented 3 years ago

Without modifying anything.

FurkanOM commented 3 years ago

This is probably a tf version mismatch. Try tf 2.0 rather than any other version.

SalehShmali commented 3 years ago

My tf version is 2.3.1. Is that a mismatch? Also, I am working with a CPU, not a GPU.

davidoygenblik commented 3 years ago

Yes, this doesn't work with anything besides tf 2.0.
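
To fail fast on an unsupported install, a small check along these lines could be run before training. This is only a sketch: the helper name is_supported_tf and the "2.0" default are assumptions for illustration, not part of the repository.

```python
def is_supported_tf(version: str, supported_prefix: str = "2.0") -> bool:
    """Return True if a version string such as "2.0.3" is in the 2.0.x series.

    Hypothetical helper: only the major and minor parts are compared.
    """
    major_minor = ".".join(version.split(".")[:2])
    return major_minor == supported_prefix


# Example with the versions mentioned in this thread.
print(is_supported_tf("2.3.1"))  # False
print(is_supported_tf("2.0.0"))  # True
```

In a training script the argument would be tf.__version__ after importing tensorflow as tf.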

davidoygenblik commented 3 years ago

Also, training on a CPU will take a very long time. Even on my powerful GPU, training on my custom dataset takes roughly a day and a half.

SalehShmali commented 3 years ago

I tried tf 2.0, but then there was a problem with tensorflow datasets:

google.protobuf.json_format.ParseError: Message type "tensorflow_datasets.DatasetInfo" has no field named "downloadSize". Available Fields(except extensions): ['name', 'description', 'version', 'citation', 'sizeInBytes', 'location', 'downloadChecksums', 'schema', 'splits', 'supervisedKeys', 'redistributionInfo']

SalehShmali commented 3 years ago

Which version of tfds is suitable?

colindecourt commented 3 years ago

Hi, I resolved this issue by using a batch size of 1. I'm now trying to make it work with batch size > 1 and will let you know if I figure it out. I'm using tf 2.4.

wr0112358 commented 3 years ago

The batch_size problem with tf 2.4 is the result of a change in the Huber loss function:

https://github.com/tensorflow/tensorflow/blob/v2.4.0/tensorflow/python/keras/losses.py#L1426

https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/python/keras/losses.py#L885

tf 2.0:

    >>> tf.losses.Huber(reduction=tf.losses.Reduction.NONE)(tf.ones(shape=(2,16500,4)), tf.ones(shape=(2,16500,4))).shape
    TensorShape([2, 16500, 4])

tf 2.4:

    >>> tf.losses.Huber(reduction=tf.losses.Reduction.NONE)(tf.ones(shape=(2,16500,4)), tf.ones(shape=(2,16500,4))).shape
    TensorShape([2, 16500])

Fixing the bug for tf 2.4 is easy: Remove the additional reduce_sum in the reg_loss in line 218. https://github.com/FurkanOM/tf-faster-rcnn/blob/master/utils/train_utils.py#L218
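
A loss that tolerates both output shapes might look like the sketch below. It is not the repository's code: the name reg_loss_sketch and the positive-box masking are assumptions about how such a loss is typically written, and the only point is that the coordinate axis is summed only when the Huber loss has not already reduced it. As far as I can tell, newer versions average over the last axis rather than summing, so the loss scale may differ between versions.

```python
import tensorflow as tf


def reg_loss_sketch(y_true, y_pred):
    """Box regression loss averaged over positive boxes (illustrative only).

    y_true, y_pred: float tensors of shape (batch, total_boxes, 4).
    """
    loss_fn = tf.keras.losses.Huber(reduction="none")
    loss_for_all = loss_fn(y_true, y_pred)
    # tf 2.0 yields (batch, boxes, 4); tf 2.4 yields (batch, boxes).
    # Reduce the coordinate axis only if it is still present.
    if loss_for_all.shape.rank == 3:
        loss_for_all = tf.reduce_sum(loss_for_all, axis=-1)
    # Assumption: a box is positive when any of its four targets is non-zero.
    pos_cond = tf.reduce_any(tf.not_equal(y_true, 0.0), axis=-1)
    pos_mask = tf.cast(pos_cond, tf.float32)
    loc_loss = tf.reduce_sum(pos_mask * loss_for_all)
    total_pos = tf.maximum(1.0, tf.reduce_sum(pos_mask))
    return loc_loss / total_pos
```

With this guard, pos_mask and loss_for_all both have shape (batch, boxes), so the multiplication no longer hits the "[8,9216] vs. [8]" broadcast error for batch size > 1.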

ruman1609 commented 2 years ago

Does anyone know how to handle this with TensorFlow 2.8.0? The error has now changed to:

ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 861, in train_step
        self._validate_target_and_loss(y, loss)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 819, in _validate_target_and_loss
        'Target data is missing. Your model was compiled with '

    ValueError: Target data is missing. Your model was compiled with loss=ListWrapper([None, None, None, None, None, None, None, None, None]), and therefore expects target data to be provided in 'fit()'

colindecourt commented 2 years ago

Hi @ruman1609, Try the solution I proposed in #17 by changing the format of inputs and targets. It seems you don't give target data to the fit function. Otherwise, downgrade your tf version.

ruman1609 commented 2 years ago

Hi @colindecourt, I already applied your changes and the fix from @wr0112358, but the error still occurred. The error log I posted above is from TF 2.8.0. That's why I downgraded to TF 2.1.0, which is what I am using now.

FurkanOM / tf-faster-rcnn

Erorr in reg_loss #14