UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.44k stars, 2.5k forks

[Fix] Resolve loading private Transformer model in version 3.3.0 #3058

Closed: pesuchin closed this pull request 2 weeks ago

pesuchin commented 2 weeks ago

Resolves: https://github.com/UKPLab/sentence-transformers/issues/3053

Details

Check

I have confirmed that I can load a private Transformer model with an adapter using the following code.

code:

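from sentence_transformers.models.Transformer import Transformer

# Keyword arguments shared by the model, tokenizer, and config loaders.
# "<AUTH TOKEN>" is a placeholder for a Hugging Face access token.
args = {
    "token": "<AUTH TOKEN>",
    "trust_remote_code": False,
    "revision": None,
    "local_files_only": False,
}

# "<PRIVATE MODEL PATH>" is a placeholder for the private model repository.
transformer = Transformer(
    model_name_or_path="<PRIVATE MODEL PATH>",
    cache_dir=None,
    backend="torch",
    max_seq_length=512,
    do_lower_case=True,
    model_args=args,
    tokenizer_args=args,
    config_args=args,
)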
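Rather than pasting the token into the script, the same `args` dict could be assembled from the environment. This is a minimal sketch, assuming the token is stored in an `HF_TOKEN` environment variable; the helper name `private_model_args` is hypothetical and not part of sentence-transformers:

```python
import os
from typing import Any, Optional


def private_model_args(token: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: build the kwargs passed as model_args/tokenizer_args/config_args."""
    return {
        # Assumption: fall back to the HF_TOKEN environment variable when no token is given
        "token": token if token is not None else os.environ.get("HF_TOKEN"),
        "trust_remote_code": False,
        "revision": None,
        "local_files_only": False,
    }


args = private_model_args()
```

The resulting `args` could then be passed as `model_args`, `tokenizer_args`, and `config_args` exactly as in the reproduction above.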

logs:

model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2.27G/2.27G [01:17<00:00, 24.1MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 1.17k/1.17k [00:00<00:00, 1.03MB/s]
sentencepiece.bpe.model: 100%|████████████████████████████████████████████████████████████████████████████████| 5.07M/5.07M [00:00<00:00, 9.80MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 17.1M/17.1M [00:00<00:00, 24.9MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████| 964/964 [00:00<00:00, 1.87MB/s]
tomaarsen commented 2 weeks ago

Hello!

I'm not going to add a test for this for now: it would be a bit too messy to maintain a private model that everyone can somehow access, but this does seem to resolve the problem. Also, I could have sworn that dict.get() raised a KeyError by default when the key was missing, but I suppose not, haha.

I updated the default for local_files_only to False, as that is also the default in find_adapter_config_file; the others can stay None.
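A minimal sketch of the defaulting described above, assuming the adapter lookup reads its Hub options from `config_args`. The helper `adapter_lookup_kwargs` is hypothetical and is not the code merged in this PR; it only illustrates `local_files_only` falling back to `False` while `token` and `revision` fall back to `None`:

```python
from typing import Any


def adapter_lookup_kwargs(config_args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: pick the Hub options an adapter-config lookup needs."""
    return {
        # dict.get returns None for a missing key, leaving the Hub client to apply its own defaults
        "token": config_args.get("token"),
        "revision": config_args.get("revision"),
        # Explicit False default, matching find_adapter_config_file's own default
        "local_files_only": config_args.get("local_files_only", False),
    }


# The result could then be forwarded, e.g.
# transformers.utils.peft_utils.find_adapter_config_file(model_name_or_path, cache_dir=cache_dir, **kwargs)
kwargs = adapter_lookup_kwargs({"token": "<AUTH TOKEN>"})
```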

Thanks for tackling this so quickly @pesuchin, I'd like to quickly merge this & bring it out in a patch release.

cc @J-Curwell, @HenningDinero thanks for reporting this!

HenningDinero commented 2 weeks ago

It is indeed difficult to write a test against a private repo that everyone can access 😅 But yeah, dict.get does not raise an error (use dict[] instead if you want a KeyError ;-))
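The difference between the two lookups can be checked directly: `dict.get` returns `None` (or a supplied default) for a missing key, while subscripting raises `KeyError`. The dictionary below is purely illustrative:

```python
config_args = {"trust_remote_code": False}

# dict.get never raises: a missing key yields None, or the supplied default
token = config_args.get("token")
local_only = config_args.get("local_files_only", False)

# Subscripting a missing key raises KeyError
try:
    config_args["token"]
    raised = False
except KeyError:
    raised = True

print(token, local_only, raised)  # None False True
```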

J-Curwell commented 2 weeks ago

Thank you for resolving this so quickly @pesuchin @tomaarsen! 🥇