Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0
229 stars, 48 forks

I suppose that the code comment should be removed. #96

Closed: zh25714 closed this issue 3 years ago

zh25714 commented 3 years ago

I suppose that the code comment should be removed.

The line in question is # token_type_ids[i, t_len : t_len + s_len] = 1, at line 424 of kiwi/systems/encoders/xlmroberta.py, in the interleave_input method.

"""
Interleave the source + target embeddings into one tensor.

    This means making the input as [batch, target [SEP] source]. 

"""

captainvera commented 3 years ago

Hello @zh25714 ,

Yeah, you are right, this isn't very clear. In practice this is not a problem, because XLMR does not use token_type_ids. That is why the line is commented out, and why the sentence above it mentions that XLMR does not use NSP (next sentence prediction). You can also confirm here

This could be made clearer and the code removed. Could you make a PR? We would greatly appreciate it :)
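
For readers following along, here is a minimal sketch of the layout the docstring describes, [batch, target [SEP] source], written with plain Python lists rather than tensors. It is not the OpenKiwi implementation: the function name interleave_sketch, the PAD_ID and SEP_ID values, and the toy token ids are all assumptions made for illustration. The point it tries to show is that nothing segment-related needs to be built, which is why the commented-out token_type_ids line could be dropped.

```python
from typing import List, Tuple

# Assumed ids for illustration only; take the real ones from your tokenizer.
PAD_ID = 1
SEP_ID = 2


def interleave_sketch(
    target: List[List[int]], source: List[List[int]]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: build [target SEP source] rows, right-padded."""
    # Concatenate each target sentence, a separator, and its source sentence.
    rows = [t + [SEP_ID] + s for t, s in zip(target, source)]
    max_len = max(len(row) for row in rows)
    # Pad every row to the longest row in the batch.
    input_ids = [row + [PAD_ID] * (max_len - len(row)) for row in rows]
    # 1 marks real tokens, 0 marks padding.
    attention_mask = [
        [1] * len(row) + [0] * (max_len - len(row)) for row in rows
    ]
    # No token_type_ids are produced here: per the discussion above, XLMR
    # does not use them, so marking the source span with 1 has no purpose.
    return input_ids, attention_mask


input_ids, attention_mask = interleave_sketch(
    target=[[11, 12, 13], [21]],
    source=[[31, 32], [41, 42, 43]],
)
```

With a BERT-style encoder that does use segment ids, the source span would instead be marked with 1, which is what the commented-out line appears to have been for.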

zh25714 commented 3 years ago

Thanks for your reply. I am a student studying QE through your kiwi system. That's a good architecture! I will make a PR if it's needed.

------------------ Original message ------------------ From: "Miguel @.>; Sent: Thursday, June 3, 2021, 2:46 AM; To: @.>; Cc: @.>; @.>; Subject: Re: [Unbabel/OpenKiwi] I suppose that the code comment should be remove. (#96)
