Closed neil-yc closed 4 years ago
Hi @Nicholasyc! Both the MMoE and tower layers are single layers in this example. There are 8 experts with 4 hidden units per expert, and the number of tasks here essentially determines how many gate kernels are needed to weight all of the experts. The tower layers are just standard Dense layers in Keras, so you can modify them pretty easily.
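For concreteness, here is a minimal NumPy sketch (not the repo's actual code, and the variable names are my own) of the structure described above: 8 experts sharing one 3-D kernel, 4 hidden units per expert, and one softmax gate kernel per task that weights the experts.

```python
import numpy as np

batch, input_dim = 3, 10
num_experts, expert_units, num_tasks = 8, 4, 2

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, input_dim))

# One shared expert kernel: (input_dim, expert_units, num_experts)
expert_kernel = rng.standard_normal((input_dim, expert_units, num_experts))
# One gate kernel per task: (input_dim, num_experts)
gate_kernels = [rng.standard_normal((input_dim, num_experts))
                for _ in range(num_tasks)]

# Expert outputs: contract over input_dim -> (batch, expert_units, num_experts)
expert_outputs = np.tensordot(x, expert_kernel, axes=1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

task_outputs = []
for g in gate_kernels:
    gate = softmax(x @ g)                         # (batch, num_experts)
    # Weight each expert's output per example, then sum over the expert axis
    weighted = expert_outputs * gate[:, None, :]  # (batch, expert_units, num_experts)
    task_outputs.append(weighted.sum(axis=-1))    # (batch, expert_units)
```

Each entry of `task_outputs` would then feed that task's tower (a Dense layer) in the real model.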
Thanks @alvin319! I also noticed that you use K.tf.tensordot in experts_output but K.dot in gate_output; I wonder, do they do the same thing? https://github.com/drawbridge/keras-mmoe/blob/09bb267cb4f38d624286bf7b68961b3610f0c082/mmoe.py#L180
Here is the documentation for these two functions:
https://www.tensorflow.org/api_docs/python/tf/tensordot https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1306
Tensordot allows performing a dot product along specific axes, which is what I need for the expert kernel.
Got it. In fact, tensordot is equivalent to matmul when axes=1 (as in your code), which is the same as K.dot.
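That equivalence is easy to check numerically; a quick NumPy sketch (the gate-kernel shape here is just an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 10))  # (batch, input_dim)
w = rng.standard_normal((10, 8))  # (input_dim, num_experts) gate kernel

# For 2-D operands, tensordot with axes=1 is ordinary matrix multiplication,
# which is also what a Keras-backend dot computes on 2-D tensors.
a = np.tensordot(x, w, axes=1)
b = x @ w  # matmul
```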
What is num_task here?
https://github.com/drawbridge/keras-mmoe/blob/09bb267cb4f38d624286bf7b68961b3610f0c082/census_income_demo.py#L177 Hi, thanks for your code. I want to confirm a few things: are both the MMoE and tower layers single-layer? And is the size of the MMoE hidden layer 4, and the size of the tower layer 2?