In main.py, line 190:
if args.item_indexing == 'collaborative':
    for ds in train_loader.dataset.datasets:
        tokenizer.add_tokens(ds.new_token)
the new tokens are not sorted, so the token IDs assigned to the newly added tokens can end up in a different order from run to run. Thus, when loading the model at line 200:
if args.load:
    if local_rank == 0:
        logging.info(f"Load model from {args.model_path}")
    model = utils.load_model(model, args.model_path, args, loc=device)
    model.to(device)
the embeddings of the new tokens may no longer match the tokens they were trained for.
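For illustration, here is a minimal sketch of how the ordering problem shows up, assuming a HuggingFace T5 tokenizer and made-up item tokens (not the repo's actual data):

from transformers import AutoTokenizer

# Run 1: the new item tokens happen to arrive in one order
tok_a = AutoTokenizer.from_pretrained("t5-small")
tok_a.add_tokens(["item_17", "item_3", "item_42"])

# Run 2: the same tokens arrive in a different order
# (e.g. a different dataset iteration order)
tok_b = AutoTokenizer.from_pretrained("t5-small")
tok_b.add_tokens(["item_42", "item_17", "item_3"])

# The same token now gets a different ID, so an embedding matrix saved
# in run 1 is misaligned with the tokenizer built in run 2.
print(tok_a.convert_tokens_to_ids("item_3"))   # e.g. 32101
print(tok_b.convert_tokens_to_ids("item_3"))   # e.g. 32102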
This probably does not affect the results reported in this repo, but users need to be careful when relying on it: either keep the order of added tokens identical across runs, or sort the new tokens before adding them.
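A minimal sketch of the second option (sorting before adding), assuming ds.new_token is a list of token strings, would be:

if args.item_indexing == 'collaborative':
    new_tokens = set()
    for ds in train_loader.dataset.datasets:
        new_tokens.update(ds.new_token)
    # Sorting makes the token -> ID assignment deterministic across runs,
    # so a checkpointed embedding matrix stays aligned when reloaded.
    tokenizer.add_tokens(sorted(new_tokens))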