Closed Kerry-zzx closed 1 year ago
Hi! Thank you for your kind words!
In the code snippet that you highlighted (link to the code) there are no errors. The commented-out lines are leftovers from some tests that we ran. The code comes from the Handwriting Transformer repo (link to the code).
Swapping the tgt with the query_pos is a "trick" to avoid applying the first normalization layer to the input of the decoder layers: in forward_pre, norm1 is applied only to tgt, while query_pos is added after the normalization. (link to the code)
def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    return tensor if pos is None else tensor + pos

def forward_pre(self, tgt, memory,
                tgt_mask: Optional[Tensor] = None,
                memory_mask: Optional[Tensor] = None,
                tgt_key_padding_mask: Optional[Tensor] = None,
                memory_key_padding_mask: Optional[Tensor] = None,
                pos: Optional[Tensor] = None,
                query_pos: Optional[Tensor] = None):
    # Self-attention: only tgt goes through norm1; query_pos is added afterwards.
    tgt2 = self.norm1(tgt)
    q = k = self.with_pos_embed(tgt2, query_pos)
    tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,
                          key_padding_mask=tgt_key_padding_mask)[0]
    tgt = tgt + self.dropout1(tgt2)
    # Cross-attention over the encoder memory.
    tgt2 = self.norm2(tgt)
    tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),
                               key=self.with_pos_embed(memory, pos),
                               value=memory, attn_mask=memory_mask,
                               key_padding_mask=memory_key_padding_mask)[0]
    tgt = tgt + self.dropout2(tgt2)
    # Feed-forward block.
    tgt2 = self.norm3(tgt)
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
    tgt = tgt + self.dropout3(tgt2)
    return tgt
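The effect of the swap on the first two lines of forward_pre can be checked on a single feature vector. The following is a minimal, dependency-free sketch rather than the repository's code: layer_norm and attention_query are hypothetical helpers, the normalization uses the default scale of 1 and shift of 0, and the embedding values are made up.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance.

    Assumes the default affine parameters (scale 1, shift 0).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def attention_query(tgt, query_pos):
    """Mirror of `tgt2 = norm1(tgt); q = tgt2 + query_pos` in forward_pre."""
    tgt2 = layer_norm(tgt)
    return [a + b for a, b in zip(tgt2, query_pos)]

qr_emb = [4.0, -2.0, 10.0, 0.0]   # made-up query embedding for one token
zeros = [0.0] * len(qr_emb)

# Swapped call: zeros as tgt, embedding as query_pos.
# norm1(zeros) is all zeros, so the embedding reaches the attention unchanged.
q_swapped = attention_query(zeros, qr_emb)

# Direct call: embedding as tgt, so it is normalized before the attention.
q_direct = attention_query(qr_emb, zeros)

print(q_swapped)
print(q_direct)
```

With learned affine parameters the normalized zeros equal the shift term instead of exactly zero, but the embedding itself still bypasses the normalization.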
Anyway, I have to thank you for your issue, because it made us realize that we had uploaded a version of the code with the wrong parameters (e.g. the decoder learning rate, --d_lr). This is probably why the network didn't converge. We also changed the random sampling function inside the training loop, from np.random.choice to random.sample, to speed up training.
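As a rough illustration of the sampling change, the sketch below draws a set of indices with random.sample. The helper name, its arguments, and the idea that indices are what gets sampled are assumptions made for this example, not taken from the training loop. Note that random.sample always samples without replacement, whereas np.random.choice samples with replacement unless replace=False is passed, so the two calls are interchangeable only in the latter case.

```python
import random

def sample_batch_indices(dataset_size, batch_size, seed=None):
    """Draw batch_size distinct indices from range(dataset_size).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # random.sample accepts a range directly, so no index array has to be
    # materialized before sampling.
    return rng.sample(range(dataset_size), batch_size)

indices = sample_batch_indices(dataset_size=10_000, batch_size=8, seed=0)
print(indices)
```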
As for the noise: we don't introduce noise during training. You can introduce it during evaluation by passing the --add_noise parameter to the scripts.
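For readers unfamiliar with this kind of switch, here is a minimal sketch of how an evaluation-only noise flag could be wired. Only the flag name --add_noise comes from this thread. That it is a boolean switch, the maybe_add_noise helper, the Gaussian distribution, and the std value are all assumptions for illustration and may differ from what the repository's scripts actually do.

```python
import argparse
import random

def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical evaluation script")
    # Assumption: --add_noise is a boolean switch that is off by default.
    parser.add_argument("--add_noise", action="store_true",
                        help="perturb the inputs at evaluation time")
    return parser

def maybe_add_noise(values, add_noise, std=0.1, seed=None):
    """Return a copy of values, with Gaussian noise added only if add_noise is set."""
    if not add_noise:
        return list(values)
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, std) for v in values]

args_default = build_parser().parse_args([])              # flag omitted
args_noisy = build_parser().parse_args(["--add_noise"])   # flag passed

features = [0.5, -1.0, 2.0]   # made-up values
clean = maybe_add_noise(features, args_default.add_noise)
noisy = maybe_add_noise(features, args_noisy.add_noise, seed=0)
```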
Thank you for taking the time to respond to my question. I appreciate your prompt reply and the clarification you provided. Your insights have been quite helpful in understanding the details of your work.
After reviewing the changes you made to the code, I noticed that the revisions were made in the relevant file, but the train.py file remained unchanged. I understand that updating the train.py file might have been unintentionally overlooked.
To ensure consistency and avoid any potential confusion in the future, it would be beneficial to have the train.py file updated to reflect the modifications made in the code. This will ensure that other researchers or users who refer to your code can have a complete and accurate understanding of the training process.
Once again, I sincerely appreciate your responsiveness and assistance. Your dedication to addressing my queries is commendable.
Sorry, my bad! I changed the wrong file 😅. The train.py file is now updated.
Thank you for your swift response and for addressing the issue promptly. I am pleased to hear that the problem has been resolved. I appreciate your dedication and effort in updating the train.py file to reflect the modifications made in the code. This will greatly contribute to the clarity and consistency of the codebase for future users and researchers.
I recently read your paper titled Handwritten Text Generation from Visual Archetypes. I found your work on Handwritten Text Generation to be highly informative and interesting. I have a few questions regarding a specific section of your code, and I would greatly appreciate your insights.
In the file model.py, specifically in lines 317-318, I came across the following code snippet:
While studying the code, I was curious about the usage of a zero matrix (torch.zeros_like(QR_EMB)) as the input for the decoder. Based on my experiments, I found that training the network using a zero matrix as the input of the decoder leads to unsuccessful training. Therefore, I wanted to confirm whether this is the correct input configuration you used during the training stage or if there might be an alternative approach.
In my understanding, there could be three possible alternatives:
1. Using hs = self.decoder(QR_EMB, memory, query_pos=query_pos) as the input configuration.
2. Using hs = self.decoder(QR_EMB, memory) as the input configuration.
3. Using hs = self.decoder(QR_EMB, memory, QR_EMB) as the input configuration.
Could you please clarify which input configuration you employed during the training stage?
Additionally, I noticed that the default training arguments provided in the code do not mention the addition of noise to the output of the decoder. However, I am intrigued to know if you utilized any noise injection techniques during training. Therefore, I kindly request you to share the training arguments or any relevant information regarding the integration of noise in the decoder output.