avolkov1 / keras_experiments

Experimental Keras libraries and examples.

ValueError: Invalid argument "name" passed to K.function with Tensorflow backend during model.fit #1

Closed: tRosenflanz closed this issue 7 years ago

tRosenflanz commented 7 years ago

I understand this is experimental code, so some issues are obviously expected. Hopefully you can provide some feedback on this:

I think a monkey patch on K.Function has broken something. I am getting an error when trying to run any model (single- or multi-GPU) after importing some of your code. Calling model.fit(<>) raises: ValueError: Invalid argument "name" passed to K.function with Tensorflow backend

The error is coming from the self._make_test_function() method.

I am trying to investigate if my imports were wrong but for now I am not seeing any issues on my side...

I am using Keras 2.0.5 with Python 2.7.

Edit: I have edited _patch_tf_backend.py and added name as one of the attributes. This allowed the non-distributed model to run, but the distributed one throws a different error (screenshot omitted).

AndrasEros commented 7 years ago

Same issue here; it started after I updated Keras to 2.0.5. It was all working fine on 2.0.1. I use Python 3.5, Keras 2.0.5, and TensorFlow 1.2. I'm hoping for a fix; it was so great when it worked!

tRosenflanz commented 7 years ago

Oh, that helps! This commit https://github.com/fchollet/keras/commit/a736c2632b315d8c6ee54369df89c31695c70591 added name as one of the arguments to K.Function() in one of the latest releases. I will try to go through @avolkov1's code and fix it up to use the name argument in the next few days. If it works, I will create a commit (for the first time in my life).

tRosenflanz commented 7 years ago

So I fixed the errors on my side by adding the argument name=None and self.name = name to the patched Function in keras_exp/_patch_tf_backend.py (screenshot omitted; a sketch follows at the end of this comment).

Optional, as far as I can tell: adding the line kwargs['name'] = None to _init_make_dataparallel in keras_exp/multigpu/_multigpu.py (screenshot omitted).

I can't commit for the next couple of days, but you can apply this fix yourself very easily for now.

EDIT: The optional fix is not necessary and is actually wrong! Please ignore it.
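For readers without the screenshot, here is a rough sketch of that change. The attribute handling is an assumption based on the signature quoted further down this thread, not the verbatim contents of keras_exp/_patch_tf_backend.py:

class Function(object):
    # Sketch of the monkey-patched K.Function with the name fix applied; illustrative only.
    def __init__(self, inputs, outputs, updates=None, enqueue_ops=None,
                 name=None):  # added: accept the name kwarg that Keras 2.0.5 now passes
        self.inputs = inputs
        self.outputs = outputs
        self.updates = updates or []
        self.enqueue_ops = enqueue_ops or []
        self.name = name  # added: store it, mirroring the stock TensorFlow backend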

avolkov1 commented 7 years ago

@Slanothy @AndrasEros Thanks guys. Yeah, that monkey patch was there to optionally enable the enqueue_ops that I needed to run, but it's obviously not a very stable patch. The enqueue option uses TensorFlow's StagingArea API, which I was hoping would improve the performance of transferring data slices to the GPUs. It didn't really make a difference, but I left the code there for others to tinker with. Other than the enqueue option, the monkey patch is not needed and can be commented out in the file keras_exp/multigpu/_multigpu.py, lines 36 and 37.

To fix it more robustly, what I should have done to begin with is add **kwargs to the __init__:

def __init__(self, inputs, outputs, updates=None, enqueue_ops=None, name=None, **kwargs):

The **kwargs are used for the Theano backend's Function; I don't know why the same isn't done for TensorFlow, because then, if new arguments are introduced in the future, at least the function signature doesn't have to change. The optional fix kwargs['name'] = None is not appropriate, because that name argument corresponds to the model name, not the Keras backend Function name. In fact, I don't know what that name parameter is used for in the Function; I didn't see Function instantiated with that name parameter anywhere else in the Keras code base.
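A minimal, self-contained illustration of that point, built on the same sketch as above (the extra keyword argument below is hypothetical, not something Keras actually passes):

class Function(object):
    # Sketch only: **kwargs absorbs keyword arguments a newer Keras might add,
    # so the signature does not have to change with every release.
    def __init__(self, inputs, outputs, updates=None, enqueue_ops=None,
                 name=None, **kwargs):
        self.inputs = inputs
        self.outputs = outputs
        self.updates = updates or []
        self.enqueue_ops = enqueue_ops or []
        self.name = name
        self._extra_kwargs = kwargs  # unknown/future keywords land here instead of raising

# A hypothetical keyword from a future Keras release is simply absorbed:
f = Function(inputs=[], outputs=[], name='test_function', some_future_arg=42)
print(f.name, f._extra_kwargs)  # test_function {'some_future_arg': 42}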

@Slanothy Regarding the mgpu optimizers from keras_exp/multigpu/optimizers.py: are you passing the gdev_list parameter, i.e. something like this:

opt = RMSPropMGPU(lr=0.0001, decay=1e-6, gdev_list=gpus_list)

The error you're citing is "TypeError: 'NoneType' object is not iterable". I don't think the monkey patch or the missing name parameter is related to that error. I should raise an error when these specialized optimizers are instantiated with gdev_list left as None. These specialized optimizers average gradients, i.e. they are a synchronous multi-GPU implementation, but they are not needed and in fact degrade performance significantly. The parameters are updated asynchronously internally by TensorFlow (some internal TensorFlow magic) since the Keras model layers are shared across the GPUs. Maybe in some cases you might want to average the gradients and explicitly synchronize parameters, but it seems to work fine without that. Test your code with standard optimizers first; if that doesn't give you the desired network convergence or results and you think you need to synchronize model parameters explicitly, then see if these specialized optimizers work, keeping in mind that the performance is not as good. I wanted to publish the synchronizing-optimizers code so that others might improve on it.
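A hedged usage sketch for anyone hitting the same TypeError. It assumes RMSPropMGPU is importable from keras_exp/multigpu/optimizers.py as the path above suggests; the device enumeration uses plain TensorFlow rather than any helper keras_exp may provide, and the explicit check mirrors the error the optimizers should ideally raise themselves:

from tensorflow.python.client import device_lib
from keras_exp.multigpu.optimizers import RMSPropMGPU

# Enumerate local GPU devices, e.g. ['/gpu:0', '/gpu:1'] on TF 1.2.
gpus_list = [d.name for d in device_lib.list_local_devices()
             if d.device_type == 'GPU']
if not gpus_list:
    raise ValueError('gdev_list must be a non-empty list of GPU devices '
                     'when using the MGPU optimizers')

# Only worth trying if you really want synchronous gradient averaging;
# the standard Keras optimizers are faster with the shared multi-GPU model.
opt = RMSPropMGPU(lr=0.0001, decay=1e-6, gdev_list=gpus_list)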

tRosenflanz commented 7 years ago

Yeah, you are absolutely right. gdev_list being None was due to me using the optimizer incorrectly and had nothing to do with the monkey patch.

The only thing that needed patching was adding the name argument and its initialization, but since you say it isn't really used, I will follow your suggestion and comment it out.

Thanks for this module, by the way! It seems to be way better thought out than the make parallel function I was using before.

avolkov1 commented 7 years ago

@Slanothy Thanks, I'll fix it in my repo as soon as I get a chance to work on this again (should be within a week).

avolkov1 commented 7 years ago

Fixed with commit e65f4e04c55a2e00ecb517ba3dc7d0b8cebc01fd