PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
MIT License
11.25k stars, 1.01k forks

get_3d_sincos_pos_embed #375

Open Forainest opened 1 month ago

Forainest commented 1 month ago
def  get_3d_sincos_pos_embed(
    embed_dim, grid_size, cls_token=False, extra_tokens=0, interpolation_scale=1.0, base_size=16, 
):

    # if isinstance(grid_size, int):
    #     grid_size = (grid_size, grid_size)
    grid_t = np.arange(grid_size[0], dtype=np.float32) / (grid_size[0] / base_size[0]) / interpolation_scale[0]
    grid_h = np.arange(grid_size[1], dtype=np.float32) / (grid_size[1] / base_size[1]) / interpolation_scale[1]
    grid_w = np.arange(grid_size[2], dtype=np.float32) / (grid_size[2] / base_size[2]) / interpolation_scale[2]
    grid = np.meshgrid(grid_w, grid_h, grid_t)  # here w goes first
    grid = np.stack(grid, axis=0)

    grid = grid.reshape([3, 1, grid_size[2], grid_size[1], grid_size[0]])
    pos_embed = get_3d_sincos_pos_embed_from_grid(embed_dim, grid)
    # import ipdb;ipdb.set_trace()
    if cls_token and extra_tokens > 0:
        pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)
    return pos_embed

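The defaults for base_size (an int) and interpolation_scale (a float) are scalars, but the function body indexes them as if they were sequences (base_size[0], interpolation_scale[0], and so on). Called with the defaults, this fails with a TypeError because an int or float is not subscriptable.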
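One possible way to make the indexing safe is to normalize both arguments to (t, h, w) triples before they are used. The following is only a sketch under that assumption, not the repository's fix: `to_triple` and `axis_positions` are hypothetical helper names, and plain Python lists stand in for the NumPy arrays in the snippet above.

```python
from numbers import Real


def to_triple(value, name="value"):
    """Return `value` as a (t, h, w) tuple; a scalar is repeated three times."""
    if isinstance(value, Real):
        return (value, value, value)
    value = tuple(value)
    if len(value) != 3:
        raise ValueError(f"{name} must be a scalar or have 3 entries, got {len(value)}")
    return value


def axis_positions(size, base, scale):
    """Same arithmetic as the grid_t / grid_h / grid_w lines, for one axis."""
    return [i / (size / base) / scale for i in range(size)]


# With the scalar defaults from the signature, indexing now works.
base_size = to_triple(16, "base_size")
interpolation_scale = to_triple(1.0, "interpolation_scale")
grid_t = axis_positions(4, base_size[0], interpolation_scale[0])
print(grid_t)  # [0.0, 4.0, 8.0, 12.0]
```

Callers that already pass per-axis tuples are unaffected, since a 3-element sequence is returned unchanged as a tuple.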

Forainest commented 1 month ago

The same problem applies to the 2D version.

LinB203 commented 1 month ago

This is deprecated code; we actually use RoPE.

xesdiny commented 1 month ago

Why does the cross-attn module set use_rope=False?

LinB203 commented 3 weeks ago

The text features used in cross-attn already carry position information, so RoPE is not needed there.
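For readers unfamiliar with RoPE, here is a minimal pure-Python sketch of the general technique, not the code used in this repository. `rope_rotate` and `dot` are hypothetical names, and the base of 10000 is the commonly used default, assumed here. It shows that after rotating a query and a key by their positions, their dot product depends only on the position offset. That property is useful when both sides share one position axis, as in self-attention.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (2) at different absolute positions gives the same score.
score_a = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_b = dot(rope_rotate(q, 7), rope_rotate(k, 5))
print(math.isclose(score_a, score_b, rel_tol=1e-9, abs_tol=1e-12))
```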