ematvey / hierarchical-attention-networks

Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: this project is currently unmaintained; issues will probably not be addressed.
MIT License

Are uw and us global weights? Just to confirm. #18

Open · acadTags opened this issue 6 years ago

acadTags commented 6 years ago

Thank you, ematvey, for this implementation.

I wonder whether uw and us are two vectors shared as global weights, or whether there is a different uw for each sentence and a different us for each document?

From the code, I think these are global vectors; am I right? Please help me confirm this.
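For reference, in the original HAN paper (Yang et al., 2016) the word-level attention is

$$u_{it} = \tanh(W_w h_{it} + b_w), \qquad \alpha_{it} = \frac{\exp(u_{it}^\top u_w)}{\sum_t \exp(u_{it}^\top u_w)}, \qquad s_i = \sum_t \alpha_{it} h_{it},$$

where $u_w$ is a single word-level context vector with no sentence index, and $u_s$ plays the same role at the sentence level, so the paper itself seems to treat both as shared across the whole dataset.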

The docstring in model_components.py says:

Performs task-specific attention reduction, using learned attention context vector (constant within task of interest).

Both uw and us are defined in the function task_specific_attention(), and both are referred to as attention_context_vector. In the computational graph, are they actually different vectors? It would be helpful if you could explain this part a little.

attention_context_vector = tf.get_variable(
    name='attention_context_vector',
    shape=[output_size],
    initializer=initializer,
    dtype=tf.float32)
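For context, the rest of the function is roughly the sketch below (a paraphrase, not the exact source): every timestep is projected, scored against the single learned context vector, and the softmax-weighted sum of the projections is returned.

```python
import tensorflow as tf  # TF 1.x, matching the repo's API

def task_specific_attention(inputs, output_size,
                            initializer=tf.contrib.layers.xavier_initializer(),
                            activation_fn=tf.tanh):
    """Reduce `inputs` ([batch, time, units]) over the time axis using a
    single learned attention context vector."""
    # One trainable context vector; tf.get_variable ties it to the
    # enclosing variable scope, so each scope gets its own copy.
    attention_context_vector = tf.get_variable(
        name='attention_context_vector', shape=[output_size],
        initializer=initializer, dtype=tf.float32)
    # Project every timestep, score it against the context vector,
    # softmax the scores over the time axis, and return the weighted sum.
    input_projection = tf.contrib.layers.fully_connected(
        inputs, output_size, activation_fn=activation_fn)
    scores = tf.reduce_sum(input_projection * attention_context_vector,
                           axis=2, keepdims=True)
    attention_weights = tf.nn.softmax(scores, axis=1)
    return tf.reduce_sum(input_projection * attention_weights, axis=1)
```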

Thank you.

dugarsumit commented 6 years ago

I believe you are right. uw and us are global context vectors that store information about which words or sentences, respectively, are most informative. They are learned during the training process.
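Concretely, task_specific_attention() is applied once at the word level and once at the sentence level inside different variable scopes, so tf.get_variable() creates two separate trainable vectors. A minimal sketch of this (the scope names and shapes here are illustrative, not necessarily the repo's exact ones, and it reuses the task_specific_attention sketch quoted above):

```python
import tensorflow as tf  # TF 1.x

# Illustrative shapes: [batch*sentences, words, units] and [batch, sentences, units].
word_level_inputs = tf.placeholder(tf.float32, [None, 30, 100])
sentence_level_inputs = tf.placeholder(tf.float32, [None, 15, 100])

with tf.variable_scope('word'):
    # Creates word/attention_context_vector -- this plays the role of uw.
    sentence_vectors = task_specific_attention(word_level_inputs, 100)

with tf.variable_scope('sentence'):
    # Creates sentence/attention_context_vector -- this plays the role of us.
    document_vector = task_specific_attention(sentence_level_inputs, 100)

# Two distinct trainable variables, each shared ("global") across every
# sentence / every document in the dataset:
print([v.name for v in tf.trainable_variables()
       if 'attention_context_vector' in v.name])
# -> ['word/attention_context_vector:0', 'sentence/attention_context_vector:0']
```

Since neither vector depends on the batch, the same uw is applied to every sentence and the same us to every document; they are simply updated by gradient descent like any other weight.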