agiresearch / OpenP5

OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems
Apache License 2.0

New tokens are not sorted, causing new token embeddings to mismatch when loading the model #4

Closed lzl65825 closed 1 year ago

lzl65825 commented 1 year ago

In main.py, line 190:

    if args.item_indexing == 'collaborative':
        for ds in train_loader.dataset.datasets:
            tokenizer.add_tokens(ds.new_token)

the new tokens are not sorted. This causes a random ordering of token IDs for the newly added tokens. Thus, when the model is loaded at line 200:

    if args.load:
        if local_rank == 0:
            logging.info(f"Load model from {args.model_path}")
        model = utils.load_model(model, args.model_path, args, loc=device)
        model.to(device)

the new token embeddings may not match those assigned during training.
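To illustrate the mismatch (a minimal sketch, not the repo's actual code): the toy `add_tokens` below mimics how a tokenizer appends unseen tokens with increasing IDs, and the two calls stand in for two runs that encounter the same `ds.new_token` entries in different orders.

```python
def add_tokens(vocab, tokens):
    # Toy stand-in for tokenizer.add_tokens: unseen tokens get the next free ID.
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

base = {"<pad>": 0, "<eos>": 1}

# Same set of new tokens, encountered in a different order on two runs.
train_vocab = add_tokens(dict(base), ["<item_7>", "<item_3>", "<item_12>"])
load_vocab  = add_tokens(dict(base), ["<item_12>", "<item_7>", "<item_3>"])

# The IDs disagree, so a checkpoint's embedding rows no longer line up
# with the tokens the reloaded tokenizer maps to those rows.
assert train_vocab["<item_7>"] != load_vocab["<item_7>"]
```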

lzl65825 commented 1 year ago

This probably does not affect the results in this repo, but users need to be very careful when relying on it: either keep the added tokens in exactly the same order, or sort the new tokens before adding them.
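The sorting workaround can be sketched as follows (again with a toy `add_tokens`, assuming `ds.new_token` is a list of token strings): sorting before the add makes ID assignment deterministic, so the saved embeddings stay aligned across runs.

```python
def add_tokens(vocab, tokens):
    # Toy stand-in for tokenizer.add_tokens (appends unseen tokens in order).
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

base = {"<pad>": 0, "<eos>": 1}

# Two runs see the same new tokens in different orders; sorting first
# yields identical token IDs on both runs.
run_a = add_tokens(dict(base), sorted(["<item_7>", "<item_3>", "<item_12>"]))
run_b = add_tokens(dict(base), sorted(["<item_12>", "<item_7>", "<item_3>"]))
assert run_a == run_b

# In main.py the corresponding one-line change would be:
#     tokenizer.add_tokens(sorted(ds.new_token))
```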