THUDM / icetk

A unified tokenization tool for Images, Chinese and English.

Tokenizer can't be hashed when using datasets.map function #8

Open dumpmemory opened 1 year ago

dumpmemory commented 1 year ago

The tokenizer can't be hashed when using the datasets.map function with num_proc > 1.

https://github.com/THUDM/ChatGLM-6B/issues/286
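A quick way to confirm this kind of failure outside of datasets is to try serializing the object directly. The sketch below is only an illustration: it uses the standard pickle module, and `FakeTokenizer` and `pickling_error` are hypothetical names. The stand-in holds a `threading.Lock`, which cannot be pickled, in place of whatever the real tokenizer holds internally.

```python
import pickle
import threading


def pickling_error(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # e.g. TypeError or pickle.PicklingError
        return exc
    return None


class FakeTokenizer:
    """Hypothetical stand-in, NOT the icetk tokenizer."""

    def __init__(self):
        # A lock is used here only because it is a well-known unpicklable object.
        self._backend = threading.Lock()


if __name__ == "__main__":
    print(pickling_error({"ok": 1}))        # a plain dict pickles fine
    print(pickling_error(FakeTokenizer()))  # reports the offending member
```

Running the same check against the real tokenizer object should show which attribute blocks serialization, assuming the failure does come from pickling.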

danyang-rainbow commented 1 year ago

Same problem. Can anyone help solve this?

Sleepychord commented 1 year ago

https://stackoverflow.com/questions/55344376/how-to-import-protobuf-module It seems that protobuf is not picklable. I will look into it in the next few days.
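Until a fix lands, a common general-purpose workaround for objects with an unpicklable member is to drop that member when pickling and rebuild it lazily afterwards. The following is a minimal sketch under stated assumptions, not icetk's actual fix: `UnpicklableBackend`, `PicklableTokenizer`, and the `vocab_path` argument are all hypothetical, and the backend's `encode` is a toy placeholder.

```python
import pickle
import threading


class UnpicklableBackend:
    """Hypothetical stand-in for a backend object that cannot be pickled."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path
        self._handle = threading.Lock()  # makes instances unpicklable

    def encode(self, text):
        # Toy placeholder: returns word lengths instead of real token ids.
        return [len(word) for word in text.split()]


class PicklableTokenizer:
    """Wrapper that pickles only the constructor argument, not the backend."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path
        self._backend = None  # built on first use

    @property
    def backend(self):
        if self._backend is None:
            self._backend = UnpicklableBackend(self.vocab_path)
        return self._backend

    def encode(self, text):
        return self.backend.encode(text)

    def __getstate__(self):
        # Copy the state and drop the unpicklable member before serializing.
        state = self.__dict__.copy()
        state["_backend"] = None
        return state

    def __setstate__(self, state):
        # The backend is rebuilt lazily in whichever process unpickles this.
        self.__dict__.update(state)


if __name__ == "__main__":
    tok = PicklableTokenizer("vocab.model")  # hypothetical path
    tok.encode("hello world")                # forces the backend to load
    clone = pickle.loads(pickle.dumps(tok))  # succeeds despite the loaded backend
    print(clone.encode("hello world"))
```

The design choice here is that each worker process pays the cost of reloading the backend once, in exchange for the wrapper being safe to serialize. Whether this is sufficient for datasets.map depends on how that library serializes the mapped function and its arguments, so treat it as a starting point.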