Some questions about mmoe implementation

drawbridge / keras-mmoe

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

MIT License


Some questions about mmoe implementation #3

Closed: neil-yc closed this issue 4 years ago

neil-yc commented 4 years ago

Hi, thanks for your code. Regarding https://github.com/drawbridge/keras-mmoe/blob/09bb267cb4f38d624286bf7b68961b3610f0c082/census_income_demo.py#L177, I want to confirm a few things: are both the MMoE and tower layers single-layer? And is the MMoE hidden layer size 4 and the tower layer size 2?

alvin319 commented 4 years ago

Hi @Nicholasyc! Both the MMoE and tower layers are single layers in this example. There are 8 experts with 4 hidden units per expert, and the number of tasks determines how many gate kernels I need to weight all of the experts. The tower layers are just standard Dense layers in Keras, so you can modify them easily.
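The arrangement described above (8 experts with 4 hidden units each, one gate per task) can be sketched as a single-layer forward pass in NumPy. This is a minimal illustration, not the repository's code: the function name mmoe_forward, the ReLU activation, the omitted biases, num_tasks=2, and the batch/input sizes are assumptions for the example. Each task gets its own softmax gate over the shared experts; the per-task outputs would then feed the task towers, which are omitted here.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(x, expert_kernel, gate_kernels):
    """Hypothetical single-layer MMoE forward pass (biases omitted).

    x:             (batch, input_dim)
    expert_kernel: (input_dim, units, num_experts), all experts in one tensor
    gate_kernels:  list of num_tasks arrays, each (input_dim, num_experts)
    Returns one (batch, units) array per task.
    """
    # Every expert sees the same input: (batch, units, num_experts), then ReLU
    expert_outputs = np.maximum(np.tensordot(x, expert_kernel, axes=1), 0.0)
    task_inputs = []
    for gate_kernel in gate_kernels:
        # One softmax gate per task: (batch, num_experts)
        gate = softmax(x @ gate_kernel, axis=-1)
        # Weight each expert's output by its gate value, then sum over experts
        weighted = expert_outputs * gate[:, np.newaxis, :]
        task_inputs.append(weighted.sum(axis=-1))
    return task_inputs

rng = np.random.default_rng(0)
batch, input_dim, units, num_experts, num_tasks = 5, 10, 4, 8, 2
x = rng.normal(size=(batch, input_dim))
expert_kernel = rng.normal(size=(input_dim, units, num_experts))
gate_kernels = [rng.normal(size=(input_dim, num_experts)) for _ in range(num_tasks)]

outputs = mmoe_forward(x, expert_kernel, gate_kernels)
print(len(outputs), outputs[0].shape)  # 2 (5, 4)
```

Adding a task only adds another gate kernel; the experts stay shared, which is why the number of tasks controls the number of gates.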

neil-yc commented 4 years ago

Thanks @alvin319. I noticed that you use K.tf.tensordot for experts_output but K.dot for gate_output. Do they do the same thing? https://github.com/drawbridge/keras-mmoe/blob/09bb267cb4f38d624286bf7b68961b3610f0c082/mmoe.py#L180

alvin319 commented 4 years ago

Here is the documentation for these two functions:

tf.tensordot: https://www.tensorflow.org/api_docs/python/tf/tensordot
Keras backend dot (tensorflow_backend.py): https://github.com/keras-team/keras/blob/master/keras/backend/tensorflow_backend.py#L1306

tensordot lets you take the dot product along specific axes, which is what I need for the expert kernel.
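To make "dot product along specific axes" concrete, here is a small NumPy sketch. It assumes the expert kernel stores all experts in one 3-D tensor of shape (input_dim, units, num_experts), and it uses np.tensordot as a stand-in for tf.tensordot, which follows the same axes convention (a scalar axes=N sums over the last N axes of the first argument and the first N axes of the second). The sizes are illustrative.

```python
import numpy as np

# Illustrative shapes: 3 inputs with 6 features, 8 experts with 4 units each,
# stored as a single 3-D kernel (an assumption for this sketch).
batch, input_dim, units, num_experts = 3, 6, 4, 8
x = np.ones((batch, input_dim))
expert_kernel = np.ones((input_dim, units, num_experts))

# axes=1: contract the last axis of x with the first axis of the kernel
out_default = np.tensordot(x, expert_kernel, axes=1)
# The same contraction with the axes named explicitly
out_explicit = np.tensordot(x, expert_kernel, axes=([1], [0]))

print(out_default.shape)                          # (3, 4, 8)
print(np.array_equal(out_default, out_explicit))  # True

# Plain matmul treats the 3-D kernel as a stack of (units, num_experts)
# matrices, so it does not perform this contraction (here it raises).
try:
    np.matmul(x, expert_kernel)
except ValueError as err:
    print("matmul failed:", type(err).__name__)
```

The result keeps one (units,) vector per expert for every input, which is the layout the gates later weight and sum.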

neil-yc commented 4 years ago

Got it. In fact, with axes=1 (as in your code), tensordot is equivalent to matmul, which is the same as K.dot.
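The equivalence can be checked numerically. In this minimal sketch, np.tensordot and the @ operator stand in for tf.tensordot and K.dot (an assumption of the example), and the shapes are illustrative. For a 2-D kernel such as a gate kernel the two results are identical; for a 3-D kernel such as the expert kernel, tensordot with axes=1 matches a matrix product once the trailing kernel axes are flattened and the result is reshaped back.

```python
import numpy as np

rng = np.random.default_rng(42)
batch, input_dim, units, num_experts = 5, 10, 4, 8
x = rng.normal(size=(batch, input_dim))

# 2-D kernel (like a gate kernel): tensordot with axes=1 is a plain matrix product
gate_kernel = rng.normal(size=(input_dim, num_experts))
gate_td = np.tensordot(x, gate_kernel, axes=1)
gate_mm = x @ gate_kernel
print(np.allclose(gate_td, gate_mm))  # True

# 3-D kernel (like the expert kernel): flatten to (input_dim, units * num_experts),
# multiply, then restore the (units, num_experts) axes
expert_kernel = rng.normal(size=(input_dim, units, num_experts))
expert_td = np.tensordot(x, expert_kernel, axes=1)
expert_mm = (x @ expert_kernel.reshape(input_dim, -1)).reshape(batch, units, num_experts)
print(np.allclose(expert_td, expert_mm))  # True
```

So the "same as matmul" reading holds directly in the 2-D case, while the 3-D case needs the flatten-and-reshape step.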

jayvyas92 commented 4 years ago

What is num_task here?