google-research / language

Shared repository for open-sourced projects from the Google AI Language team.
https://ai.google/research/teams/language/
Apache License 2.0

CANINE Pretraining Code (pt.2) #167

Open stefan-it opened 1 year ago

stefan-it commented 1 year ago

Hi @jhclark-google and @dhgarrette,

I would like to know if there's any chance of getting the pretraining code for CANINE.

It's been a long time since the README was last updated, and I'm very interested in pretraining my own models on other datasets.

Many thanks in advance!

mwesthelle commented 4 weeks ago

I am also interested in this.

ganeshkrishnan1 commented 3 weeks ago

Any updates on this? I would love to take a look at this, since existing WordPiece/SentencePiece tokenization doesn't fit our data.