ChaozhongLiu closed this issue 4 days ago
Hi, in our implementation, zero does not share the same token as the pad; it is processed the same way as non-zero values. However, since the value is 0, the MLP's weight matrix contributes nothing, leaving only the bias vector as the output for the subsequent computation. Therefore, zero embeddings can be regarded as randomly initialized embeddings.
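A minimal sketch of this behavior (not the repo's actual code; `d_model` and `value_embed` are hypothetical names): when a scalar 0 passes through the first linear layer of a value-embedding MLP, the weight term vanishes and only the randomly initialized bias remains.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 4  # hypothetical embedding dimension

# Randomly initialized weight and bias of the MLP's first linear layer.
W = rng.normal(size=(1, d_model))
b = rng.normal(size=(d_model,))

def value_embed(x):
    """First layer of a hypothetical value-embedding MLP: x @ W + b."""
    return x @ W + b

# Input value 0: the weight contribution is 0 * W = 0, so the output
# is exactly the bias vector -- effectively a random embedding.
zero_emb = value_embed(np.zeros((1, 1)))
assert np.allclose(zero_emb, b)
```

So no special zero token is needed: the bias acts as the "randomly initialized embedding" for zero-valued inputs.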
Thanks! That makes sense.
Hi there,
Thanks for the awesome work! As I'm learning the pre-training details, I have one question that I could not resolve from the code currently available in this repo:
In Supplementary Note 8, "Zero value was directly converted into a randomly initialized embedding", but in the corresponding code, only MASK and PAD are specially treated, not the zero values.
So how was this actually implemented? I suppose zero shared the same token as pad? But I don't know what the `mask_token_id` and `pad_token_id` are. Would be great if you could provide any details! Thanks!