google-research / multilingual-t5

Vocab error unsupported operand type(s) on latest t5 package #48

Closed acul3 closed 3 years ago

acul3 commented 3 years ago

I get a vocab error when fine-tuning with MtfModel:

/usr/local/lib/python3.6/dist-packages/t5/data/vocabularies.py in vocab_size(self)
     79   def vocab_size(self) -> int:
     80     """Vocabulary size, including extra ids."""
---> 81     return self._base_vocab_size + self.extra_ids
     82 
     83   @abc.abstractproperty

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

If I downgrade to t5==0.7.1, the error goes away.

Here is my MtfModel config:

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology=TPU_TOPOLOGY,
    model_parallelism=8,
    batch_size=16,
    sequence_length={"inputs": 512, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=None,
    iterations_per_loop=100,
)

model.finetune(
    mixture_or_task_name="xquad_zeroshot",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=10000
)

stefanondisponibile commented 3 years ago

I just ran into the same problem. It looks like a T5 (not mT5) issue, so I opened a PR there.

If you're in a rush, a temporary workaround is to create the SentencePiece vocabulary with an explicit extra_ids value, like this:

DEFAULT_VOCAB = t5.data.SentencePieceVocabulary(DEFAULT_SPM_PATH, extra_ids=0)
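For anyone wondering why passing extra_ids=0 helps: the traceback shows an int being added to None, so extra_ids is apparently None when vocab_size is computed. Below is a minimal sketch of that failure mode. SketchVocabulary and describe_vocab_size are hypothetical stand-ins written for illustration, not the real t5 classes, and 32000 is an arbitrary size.

```python
class SketchVocabulary:
    """Hypothetical stand-in for a vocabulary class (NOT the real t5 class)."""

    def __init__(self, base_vocab_size, extra_ids=None):
        # Assumption for illustration: extra_ids defaults to None.
        self._base_vocab_size = base_vocab_size
        self.extra_ids = extra_ids

    @property
    def vocab_size(self):
        # Same expression as line 81 in the traceback above.
        return self._base_vocab_size + self.extra_ids


def describe_vocab_size(vocab):
    """Return the vocab size, or the TypeError message if the addition fails."""
    try:
        return vocab.vocab_size
    except TypeError as err:
        return f"TypeError: {err}"


# extra_ids left as None: the addition fails, as in the traceback.
print(describe_vocab_size(SketchVocabulary(32000)))

# extra_ids given explicitly as an int: the addition succeeds.
print(describe_vocab_size(SketchVocabulary(32000, extra_ids=0)))
```

Passing an integer explicitly sidesteps the None, which is all the workaround does. Whether 0 is the right number of extra ids for your checkpoint is a separate question, so treat it as a stopgap until the fix lands in T5.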