Open pszmk opened 2 months ago
The lack of masking unnecessarily puts attention on pad tokens. Although their embedding is a zero vector, it is not common practice (it seems to me) to leave pads unmasked. One odd way of counteracting the softmax weight assigned to the zero cosine similarities would be to change the temperature, but I am fairly confident that masking pads is the usual way to go; it seems natural.
https://github.com/D4L-Pigeons/D4L-Hackaton/blob/c309d2f5d4455e930acd132d398ec808658522b1/src/models/components/condition_embedding.py#L272
The padding structure might be established with `batch[cond_ids_name]`, where `0` denotes PAD.
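
For reference, a minimal sketch of the usual fix under the assumptions above (id `0` is PAD, and `batch[cond_ids_name]` holds the condition ids): build a boolean key padding mask from the ids and pass it into attention. I use `nn.MultiheadAttention` here purely for illustration; the repo's attention module may take the mask differently.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumption from the issue: id 0 denotes PAD


def build_key_padding_mask(cond_ids: torch.Tensor) -> torch.Tensor:
    """True at positions attention should ignore (pad tokens).

    cond_ids: (batch, seq_len) LongTensor of condition ids.
    Returns a (batch, seq_len) bool tensor for `key_padding_mask`.
    """
    return cond_ids == PAD_ID


# usage sketch (shapes and module are illustrative, not the repo's)
batch_size, seq_len, embed_dim = 4, 10, 32
attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

cond_ids = torch.randint(1, 100, (batch_size, seq_len))
cond_ids[:, 7:] = PAD_ID  # simulate right-padded sequences
x = torch.randn(batch_size, seq_len, embed_dim)

mask = build_key_padding_mask(cond_ids)
out, weights = attn(x, x, x, key_padding_mask=mask)
# attention weights over pad positions are now exactly 0,
# instead of the nonzero softmax weight they get from a 0 similarity
```

This avoids the temperature workaround entirely: masked positions are set to `-inf` before the softmax, so pads receive zero weight regardless of temperature.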