For `position_ids`, why is there a slicing op that indexes the i-th element, given that the first dimension of `position_ids` should always have size 1?
Hi, I have the same question about 1, and I'm not sure whether the following operation would solve it (as in the `LlamaAttention` class). As for question 2, I think `position_ids` has shape `[bs, seqlen]`, so it needs an index.
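To illustrate the reply's point, here is a minimal pure-Python sketch (lists instead of PyTorch tensors). It assumes that the router picks a different set of token indices for each batch row, and that `position_ids` can have shape either `[1, seqlen]` (shared across the batch) or `[bs, seqlen]`. The function `gather_routed_positions` and its `selected` argument are hypothetical and do not come from the MoD repository. When the first dimension is `bs`, you need a per-row index `i` to pick the right row. When it is 1, indexing row `i > 0` would fail unless the row is broadcast.

```python
from typing import List


def gather_routed_positions(
    position_ids: List[List[int]], selected: List[List[int]]
) -> List[List[int]]:
    """Hypothetical helper: return the position ids of the routed tokens, per batch row.

    position_ids: shape [bs, seqlen], or [1, seqlen] when shared by the whole batch.
    selected: token indices chosen by the router for each row, shape [bs, k].
    """
    bs = len(selected)
    if len(position_ids) not in (1, bs):
        raise ValueError("first dim of position_ids must be 1 or bs")
    out = []
    for i, idx in enumerate(selected):
        # With [1, seqlen] every row reuses row 0; with [bs, seqlen] row i is required.
        row = position_ids[0] if len(position_ids) == 1 else position_ids[i]
        out.append([row[j] for j in idx])
    return out


# Shared positions: both rows read from the single row.
print(gather_routed_positions([[0, 1, 2, 3]], [[0, 2], [1, 3]]))
# Per-row positions (rows can differ, e.g. with padding): row i must be indexed.
print(gather_routed_positions([[0, 1, 2, 3], [5, 6, 7, 8]], [[0, 2], [1, 3]]))
```

If the first dimension really were always 1, the `[i]` index would be unnecessary. It only matters when positions differ between batch rows.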
I have several questions concerning the implementation:
https://github.com/astramind-ai/Mixture-of-depths/blob/aff9e74fc9c5a30d2c59dc36767f1f0fd86255e8/MoD/MoD.py#L73
https://github.com/astramind-ai/Mixture-of-depths/blob/aff9e74fc9c5a30d2c59dc36767f1f0fd86255e8/MoD/MoD.py#L70