ghost opened this issue 3 years ago (status: Open)
Did you use byte_fallback? Are you installing t5 from GitHub head or from pip? The extra_id change hasn't been pushed to the pip package yet.
On Tue, Jan 5, 2021, 7:21 AM antoniomastro1996 notifications@github.com wrote:
Hi guys,
I trained a new SentencePiece model from scratch on my pretraining dataset, but I still get unk tokens. Do you know why? I remember it was working smoothly last summer! Specifically, the sentinels come out as: ⁇ extra_id_0> ⁇ @ ⁇ extra_id_1> Furthermore, the same ⁇ appears even for the curly brace '{'
Thanks in advance
View it on GitHub: https://github.com/google-research/text-to-text-transfer-transformer/issues/634
@adarob Hi Rob, thanks a lot for the reply. I installed T5 from pip. I'm adapting your Jupyter notebook, and no, I didn't use byte_fallback; I just created a new SentencePiece model with the standard parameters.