lingbai-kong opened this issue 1 year ago (status: Open)
Is there any indispensable reason to use StackLayers? Why not just keep these two embedding layers separate?
Keeping the two embedding layers separate can solve the problem, but without StackLayers the loss does not decrease (not in this example). It seems the embedding layers are not included in the training process.
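A hedged, TensorFlow-free sketch of one plausible cause for the stalled loss: if the parent layer only exposes weights from sub-layers it explicitly tracks, an embedding attached without registration contributes nothing to the gradient step. All class and method names here are hypothetical, not TensorFlow.NET API.

```python
# Hypothetical illustration: a parent layer that collects trainable
# weights only from explicitly tracked sub-layers. An embedding attached
# without registration is invisible to training, leaving the loss flat.

class FakeEmbedding:
    def __init__(self, n):
        self.weights = [0.0] * n

class ParentLayer:
    def __init__(self):
        self._tracked = []          # sub-layers the trainer can see
        # Registered sub-layer: participates in training.
        self.token_emb = self.track(FakeEmbedding(4))
        # Unregistered sub-layer: silently excluded from training.
        self.pos_emb = FakeEmbedding(3)

    def track(self, sub):
        self._tracked.append(sub)
        return sub

    def trainable_weights(self):
        return [w for sub in self._tracked for w in sub.weights]

layer = ParentLayer()
# Only token_emb's 4 weights are visible; pos_emb's 3 are missing.
print(len(layer.trainable_weights()))
```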
Would you mind providing minimal code for reproduction? I remember someone said this issue was solved in issue #916.
Description
When creating the layer with two different embeddings, the variable names of these embeddings are the same, which confuses the load_weights process and leads to a prediction error after loading:

Unhandled exception. Tensorflow.RuntimeError: Attempting to capture an EagerTensor without building a function.
Reproduction Steps
Run the following code:
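The original reproduction snippet was not captured here. As a stand-in, below is a hypothetical, TensorFlow-free simulation of the failure mode: two sub-embeddings that report the same variable name collide in a name-keyed weight store, so one set of parameters silently overwrites the other. `FakeEmbedding` and `save_by_name` are illustrative names, not TensorFlow.NET API.

```python
# Hypothetical simulation of the bug: both sub-embeddings report the
# identical variable name, so a dict keyed by name keeps only one entry.

class FakeEmbedding:
    def __init__(self, rows, cols):
        # Both instances get the same (non-uniquified) variable name,
        # mirroring the reported behavior.
        self.name = "token_and_position_embedding/embedding/embeddings:0"
        self.weights = [[0.0] * cols for _ in range(rows)]

def save_by_name(layers):
    store = {}
    for layer in layers:
        store[layer.name] = layer.weights   # second write clobbers the first
    return store

token_emb = FakeEmbedding(100, 8)   # token vocabulary
pos_emb = FakeEmbedding(16, 8)      # positions
store = save_by_name([token_emb, pos_emb])

print(len(store))                                 # 1: one entry was lost
print(store[token_emb.name] is pos_emb.weights)   # True: token weights gone
```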
The variable names of both token_emb and pos_emb are token_and_position_embedding/embedding/embeddings:0. Thus, their parameters have the same key name in the saved h5 file, and when loading weights, hdf5_format.load_weights_from_hdf5_group misloads the parameters for pos_emb into token_emb.

Known Workarounds
Redefine pos_emb as follows:
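The workaround snippet is also missing from this capture. A hedged sketch of the general idea, using the same hypothetical simulation as above: give each sub-embedding an explicit, distinct variable name so its weights get a unique key in the saved file. The names and classes below are illustrative assumptions, not the actual TensorFlow.NET fix.

```python
# Hypothetical sketch of the workaround idea: assigning each
# sub-embedding an explicit, unique name means the name-keyed weight
# store holds both sets of parameters instead of overwriting one.

class FakeEmbedding:
    def __init__(self, rows, cols, name):
        self.name = name                      # explicit, unique name
        self.weights = [[0.0] * cols for _ in range(rows)]

def save_by_name(layers):
    return {layer.name: layer.weights for layer in layers}

token_emb = FakeEmbedding(100, 8, "token_and_position_embedding/token_emb/embeddings:0")
# Redefined pos_emb with its own name: no longer collides with token_emb.
pos_emb = FakeEmbedding(16, 8, "token_and_position_embedding/pos_emb/embeddings:0")

store = save_by_name([token_emb, pos_emb])
print(len(store))   # 2: both embeddings survive the save/load round trip
```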
Configuration and Other Information
No response