Open whaleloops opened 1 year ago
I noticed that kMaskWordTokenId (mask2 in the paper) is set to 3, as defined here: https://github.com/google-research/pegasus/blob/main/pegasus/ops/pretrain_parsing_ops.cc#L69
However, in the sentencepiece vocab at "gs://t5-data/vocabs/cc_all.32000/sentencepiece.model", the token 'a' also has id 3, so the two ids appear to collide.
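To make the concern concrete, here is a minimal sketch of the overlap check I have in mind. The helper name find_id_collisions and the toy vocab are hypothetical and only for illustration. The toy vocab is not the real cc_all.32000 vocab, and the only values taken from above are kMaskWordTokenId = 3 and 'a' = 3.

```python
def find_id_collisions(reserved_ids, vocab):
    """Return (reserved_name, id, piece) for each reserved id that is also a vocab id.

    reserved_ids: dict mapping a reserved-token name to its integer id.
    vocab: dict mapping a vocab piece (string) to its integer id.
    """
    # Invert the vocab so we can look pieces up by id.
    id_to_piece = {idx: piece for piece, idx in vocab.items()}
    return [
        (name, idx, id_to_piece[idx])
        for name, idx in sorted(reserved_ids.items())
        if idx in id_to_piece
    ]


# Value reported above for the mask token id.
reserved_ids = {"kMaskWordTokenId": 3}

# Hypothetical toy vocab; only the entry 'a' -> 3 reflects what is reported above.
toy_vocab = {"<pad>": 0, "</s>": 1, "<unk>": 2, "a": 3, "b": 4}

collisions = find_id_collisions(reserved_ids, toy_vocab)
for name, idx, piece in collisions:
    print(f"{name} = {idx} collides with vocab piece {piece!r}")
```

With the real model, the piece-to-id mapping would come from the sentencepiece model file instead of the toy dict. This check only compares raw ids and does not account for any id remapping that might happen elsewhere in the pipeline.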
@EKebriaei