CLUEbenchmark / CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
http://www.CLUEbenchmarks.com
3.96k stars 540 forks

pad_token_id mismatch #169

Closed JohnHerry closed 1 year ago

JohnHerry commented 1 year ago

I tried out roberta_chinese_3L312_clue_tiny and found that, once loaded, the tokenizer's pad_token_id is 0 while model.config.pad_token_id is 1. If I want to pad sequences to a common length, should I pad with 0 or with 1?
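For illustration, the length-alignment padding in question can be sketched in plain Python as below. This is a minimal sketch, not the repository's code: `pad_batch` is a hypothetical helper, the token ids are made up, and `pad_id=0` is only a placeholder for whichever value turns out to be correct. It also builds an attention mask, on the assumption that the model is given one so that padded positions are ignored.

```python
from typing import List, Tuple


def pad_batch(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the length of the longest one.

    Returns the padded ids and a matching attention mask
    (1 = real token, 0 = padding).
    """
    max_len = max((len(seq) for seq in batch), default=0)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Hypothetical token ids, for illustration only.
ids, mask = pad_batch([[101, 2769, 102], [101, 102]], pad_id=0)
```

Passing the mask alongside the ids is what tells the model which positions are padding, so the mask should always be built from the true sequence lengths rather than inferred from the pad value.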

YueCongPKU commented 1 year ago

Hello! I have received your email and will take a look as soon as I can. Thank you!

JohnHerry commented 1 year ago

Thanks for the reply. It looks like the answer is most likely 0, since [pad] is on the first line of the vocab file.
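The vocab-file check described above can be reproduced with a short script. This is a sketch under assumptions: `token_index` is a hypothetical helper, the vocab file is assumed to hold one token per line with the zero-based line number serving as the token id, and the three-line file written here is a stand-in rather than the model's real vocabulary. The comparison is case-insensitive because the thread does not say how the pad token is capitalized in the file.

```python
import os
import tempfile
from typing import Optional


def token_index(vocab_path: str, token: str) -> Optional[int]:
    """Return the zero-based line number of `token` in a one-token-per-line vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            if line.strip().lower() == token.lower():
                return idx
    return None  # token not present in the file


# Stand-in vocab file, for illustration only (not the real model vocabulary).
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("[PAD]\n[UNK]\n[CLS]\n")
    pad_index = token_index(vocab_path, "[pad]")
    missing_index = token_index(vocab_path, "[MASK]")
```

If the pad token sits on the first line, the lookup returns 0, which matches the tokenizer's pad_token_id reported in the question.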