[QUESTION] Why replace F.embedding() with [] indexing in the VocabParallelEmbedding class?

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] Why replace F.embedding() with [] indexing in the VocabParallelEmbedding class? #769

Open starkhu opened 4 months ago

starkhu commented 4 months ago

@jon-barker Hello Jon, I have some questions about the embedding layer. Could you help explain? Why was F.embedding(masked_input, self.weight) replaced with self.weight[masked_input] in the forward() function of the VocabParallelEmbedding class? What is the difference between the two? And why can F.embedding() introduce non-determinism?

Link: https://github.com/NVIDIA/Megatron-LM/blob/core_r0.5.0/megatron/core/tensor_parallel/layers.py#L218
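
The two expressions in the question can be compared directly. Below is a minimal CPU sketch, not the Megatron-LM code: the sizes, the seed, and the helper name lookup_and_grad are hypothetical and chosen only for illustration. It checks that both lookups return the same rows and the same weight gradient. It does not reproduce any non-determinism.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small sizes, chosen only for illustration.
vocab_size, hidden_size = 10, 4
weight = torch.randn(vocab_size, hidden_size, dtype=torch.float64)

# Repeated ids matter: their gradients are summed into the same weight row.
masked_input = torch.tensor([[1, 3, 3], [7, 1, 3]])


def lookup_and_grad(use_indexing):
    """Run one forward/backward pass and return (output, weight gradient)."""
    w = weight.clone().requires_grad_(True)
    if use_indexing:
        out = w[masked_input]                 # advanced indexing
    else:
        out = F.embedding(masked_input, w)    # functional embedding lookup
    out.sum().backward()                      # upstream gradient of all ones
    return out.detach(), w.grad


out_f, grad_f = lookup_and_grad(use_indexing=False)
out_i, grad_i = lookup_and_grad(use_indexing=True)

same_forward = torch.equal(out_f, out_i)
same_grad = torch.allclose(grad_f, grad_i)
print(same_forward, same_grad)
```

Mathematically the two forms are the same gather, so any difference would have to come from how the backward pass sums gradients for repeated token ids. Floating-point addition is not associative, so a backward kernel that accumulates those contributions in a varying order can give slightly different results from run to run. Whether that is the reason for the change in VocabParallelEmbedding is an assumption here; the thread does not confirm it.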

github-actions[bot] commented 2 months ago

Marking as stale. No activity in 60 days.