How to get discrete IDs for the 1024-dimensional features

wcr369 commented 6 months ago

Is there a codebook in the ckpt? I couldn't find one.

LiuShixing commented 6 months ago

Try inspecting it with torch.load. It should be in there; I think it just isn't loaded at inference time.
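
A minimal sketch of that inspection, assuming the checkpoint is a nested dict as saved by fairseq (with the weights typically under a "model" key). The helper names `list_tensors` and `inspect_checkpoint` are made up for illustration; only `torch.load` is a real API here.

```python
from collections.abc import Mapping


def list_tensors(obj, prefix=""):
    """Recursively collect (dotted_key, shape) for every array-like leaf in a nested dict."""
    found = []
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            found.extend(list_tensors(value, name))
    elif hasattr(obj, "shape"):
        # Tensors (and NumPy arrays) expose .shape; config values do not.
        found.append((prefix, tuple(obj.shape)))
    return found


def inspect_checkpoint(path):
    """Load a checkpoint on CPU and return its 2-D entries (codebook-like candidates)."""
    import torch  # imported lazily so list_tensors works without PyTorch installed

    # Assumption: the file also holds non-tensor objects (config etc.), so recent
    # PyTorch versions may need weights_only=False. Only do that for files you trust.
    ckpt = torch.load(path, map_location="cpu")
    return [(name, shape) for name, shape in list_tensors(ckpt) if len(shape) == 2]
```

Calling `inspect_checkpoint("path/to/checkpoint.pt")` (the path is a placeholder) and scanning the result for an entry whose first dimension matches the number of clusters is one way to spot a codebook that inference code never loads.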

wcr369 commented 6 months ago

@LiuShixing I found that model.label_embs_concat.data is a tensor of shape (504, 768), which should be the codebook. How should I map the 1024-dimensional features output by model.extract_features to 768 dimensions?

LiuShixing commented 6 months ago

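I don't remember the details. For discrete IDs of the raw audio: with wav2vec you take the IDs after the codebook, while HuBERT gets them from a k-means model, so if the k-means model hasn't been open-sourced you can't obtain them. If you want to map the transformer output to IDs instead, you have to decide by similarity yourself. As for the 768 vs. 1024 dimensions, I recall there is a projection layer that converts between them during training, and the loss is computed after that.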
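
A rough sketch of the "project, then pick by similarity" idea, not a confirmed recipe from the repo. It assumes the projection is a single linear layer whose weight and bias can be pulled from the checkpoint (its exact key name is not given in this thread, so check the checkpoint keys), and it uses cosine similarity as the similarity measure. `features_to_ids` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def features_to_ids(features, proj_weight, proj_bias, codebook):
    """Map transformer features to the index of the most similar codebook entry.

    features:    (T, D_model) transformer outputs, e.g. D_model = 1024
    proj_weight: (D_code, D_model) weight of the projection layer (assumed linear)
    proj_bias:   (D_code,) bias of that layer
    codebook:    (K, D_code) label embeddings, e.g. (504, 768)
    """
    projected = F.linear(features, proj_weight, proj_bias)  # (T, D_code)
    projected = F.normalize(projected, dim=-1)               # unit-length rows
    codes = F.normalize(codebook, dim=-1)
    similarity = projected @ codes.T                         # cosine similarity, (T, K)
    return similarity.argmax(dim=-1)                         # (T,) integer ids


if __name__ == "__main__":
    # Random stand-ins with the shapes mentioned in this thread.
    feats = torch.randn(5, 1024)
    weight, bias = torch.randn(768, 1024), torch.zeros(768)
    codebook = torch.randn(504, 768)
    print(features_to_ids(feats, weight, bias, codebook))
```

IDs obtained this way are the model's predicted codes for each frame. They are not necessarily identical to the k-means IDs of the raw audio, which would still require the k-means model itself.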

zhanghanweii commented 3 months ago

How did you end up solving this?

TencentGameMate / chinese_speech_pretrain

How to get discrete IDs for the 1024-dimensional features #47