flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

TARS zero-shot classification tutorial: RepositoryNotFoundError #3236

Open vemchance opened 1 year ago

vemchance commented 1 year ago

I've been following the tutorial for few-shot and zero-shot classification with TARS (found here: https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL.md).

However, when I get to use case 3 (train a TARS model), I get an HTTP error followed by a RepositoryNotFoundError. Training completes and saves several files, but I think it stops somewhere during evaluation. Even though final-model.pt is saved, I cannot load it: loading raises the same error. A similar issue has been reported here before, and I tried the steps suggested there (e.g., checking the directory), with no change. The exact tutorial code produces the error for me without any changes to the directories, and so do my attempts to modify the directory and the other suggestions I've seen in two similar issues.
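To make "saves several files" concrete, here is a minimal sketch that lists the checkpoint files written under the training output folder. It assumes the tutorial's `base_path` of `./resources/taggers/trec` (the value that appears in the traceback below); the helper name `list_checkpoints` is made up for illustration and is not part of flair.

```python
from pathlib import Path


def list_checkpoints(base_path):
    """Return (file name, size in bytes) for every .pt file directly under base_path."""
    folder = Path(base_path)
    if not folder.is_dir():
        # Nothing was written here (or the path is wrong).
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.glob("*.pt"))


# Assumption: the tutorial's output folder; adjust if base_path was changed.
for name, size in list_checkpoints("./resources/taggers/trec"):
    print(f"{name}: {size} bytes")
```

If both `best-model.pt` and `final-model.pt` show up with a non-zero size, the files themselves were written and the failure happens while loading them.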

I feel like I'm missing something extremely simple, but can't quite figure out where I'm going wrong. I think it's related to bert-base-uncased.

Full traceback below. Any help is appreciated!


HTTPError Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_errors.py:259, in hf_raise_for_status(response, endpoint_name)
    258 try:
--> 259     response.raise_for_status()
    260 except HTTPError as e:

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\utils\hub.py:409, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
    407 try:
    408     # Load from URL or cache if already cached
--> 409     resolved_file = hf_hub_download(
    410         path_or_repo_id,
    411         filename,
    412         subfolder=None if len(subfolder) == 0 else subfolder,
    413         revision=revision,
    414         cache_dir=cache_dir,
    415         user_agent=user_agent,
    416         force_download=force_download,
    417         proxies=proxies,
    418         resume_download=resume_download,
    419         use_auth_token=use_auth_token,
    420         local_files_only=local_files_only,
    421     )
    423 except RepositoryNotFoundError:

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_validators.py:120, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    118 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 120 return fn(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:1160, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout)
   1159 try:
-> 1160     metadata = get_hf_file_metadata(
   1161         url=url,
   1162         token=token,
   1163         proxies=proxies,
   1164         timeout=etag_timeout,
   1165     )
   1166 except EntryNotFoundError as http_error:
   1167     # Cache the non-existence of the file and raise

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_validators.py:120, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    118 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 120 return fn(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:1501, in get_hf_file_metadata(url, token, proxies, timeout)
   1492 r = _request_wrapper(
   1493     method="HEAD",
   1494     url=url,
   (...)
   1499     timeout=timeout,
   1500 )
-> 1501 hf_raise_for_status(r)
   1503 # Return

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_errors.py:291, in hf_raise_for_status(response, endpoint_name)
    283     message = (
    284         f"{response.status_code} Client Error."
    285         + "\n\n"
   (...)
    289         " make sure you are authenticated."
    290     )
--> 291     raise RepositoryNotFoundError(message, response) from e
    293 elif response.status_code == 400:

RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-645b51f2-1a84ec225d903fc732d0544a)

Repository Not Found for url: https://huggingface.co/None/resolve/main/tokenizer_config.json. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated. Invalid username or password.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
Input In [6], in <cell line: 43>()
     40 trainer = ModelTrainer(tars, corpus)
     42 # 8. start the training
---> 43 trainer.train(base_path='./resources/taggers/trec', # path to store the model artifacts
     44               learning_rate=0.02, # use very small learning rate
     45               mini_batch_size=16,
     46               mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
     47               max_epochs=1,
     48               embeddings_storage_mode='none')

File ~\AppData\Roaming\Python\Python39\site-packages\flair\trainers\trainer.py:893, in ModelTrainer.train(self, base_path, learning_rate, mini_batch_size, eval_batch_size, mini_batch_chunk_size, max_epochs, train_with_dev, train_with_test, monitor_train, monitor_test, main_evaluation_metric, scheduler, anneal_factor, patience, min_learning_rate, initial_extra_patience, optimizer, cycle_momentum, warmup_fraction, embeddings_storage_mode, checkpoint, save_final_model, anneal_with_restarts, anneal_with_prestarts, anneal_against_dev_loss, batch_growth_annealing, shuffle, param_selection_mode, write_weights, num_workers, sampler, use_amp, amp_opt_level, eval_on_train_fraction, eval_on_train_shuffle, save_model_each_k_epochs, tensorboard_comment, use_swa, use_final_model_for_eval, gold_label_dictionary_for_eval, exclude_labels, create_file_logs, create_loss_file, epoch, use_tensorboard, tensorboard_log_dir, metrics_for_tensorboard, optimizer_state_dict, scheduler_state_dict, save_optimizer_state, reduce_transformer_vocab, shuffle_first_epoch, **kwargs)
    891 # test best model if test data is present
    892 if self.corpus.test and not train_with_test:
--> 893     final_score = self.final_test(
    894         base_path=base_path,
    895         eval_mini_batch_size=eval_batch_size,
    896         num_workers=num_workers,
    897         main_evaluation_metric=main_evaluation_metric,
    898         gold_label_dictionary_for_eval=gold_label_dictionary_for_eval,
    899         exclude_labels=exclude_labels,
    900     )
    901 else:
    902     final_score = 0

File ~\AppData\Roaming\Python\Python39\site-packages\flair\trainers\trainer.py:1015, in ModelTrainer.final_test(self, base_path, eval_mini_batch_size, main_evaluation_metric, num_workers, gold_label_dictionary_for_eval, exclude_labels)
   1012 self.model.eval()
   1014 if (base_path / "best-model.pt").exists():
-> 1015     self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
   1016 else:
   1017     log.info("Testing using last state of model ...")

File ~\AppData\Roaming\Python\Python39\site-packages\flair\models\tars_model.py:928, in TARSClassifier.load(cls, model_path)
    924 @classmethod
    925 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "TARSClassifier":
    926     from typing import cast
--> 928     return cast("TARSClassifier", super().load(model_path=model_path))

File ~\AppData\Roaming\Python\Python39\site-packages\flair\models\tars_model.py:323, in FewshotClassifier.load(cls, model_path)
    319 @classmethod
    320 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "FewshotClassifier":
    321     from typing import cast
--> 323     return cast("FewshotClassifier", super().load(model_path=model_path))

File ~\AppData\Roaming\Python\Python39\site-packages\flair\nn\model.py:559, in Classifier.load(cls, model_path)
    555 @classmethod
    556 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "Classifier":
    557     from typing import cast
--> 559     return cast("Classifier", super().load(model_path=model_path))

File ~\AppData\Roaming\Python\Python39\site-packages\flair\nn\model.py:191, in Model.load(cls, model_path)
    189 if not isinstance(model_path, dict):
    190     model_file = cls._fetch_model(str(model_path))
--> 191     state = load_torch_state(model_file)
    192 else:
    193     state = model_path

File ~\AppData\Roaming\Python\Python39\site-packages\flair\file_utils.py:359, in load_torch_state(model_file)
    355 # load_big_file is a workaround by https://github.com/highway11git
    356 # to load models on some Mac/Windows setups
    357 # see https://github.com/zalandoresearch/flair/issues/351
    358 f = load_big_file(model_file)
--> 359 return torch.load(f, map_location="cpu")

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py:809, in load(f, map_location, pickle_module, weights_only, **pickle_load_args)
    807 except RuntimeError as e:
    808     raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
--> 809 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    810 if weights_only:
    811     try:

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py:1172, in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
   1170 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
   1171 unpickler.persistent_load = persistent_load
-> 1172 result = unpickler.load()
   1174 torch._utils._validate_loaded_sparse_tensors()
   1176 return result

File ~\AppData\Roaming\Python\Python39\site-packages\flair\embeddings\transformer.py:1162, in TransformerEmbeddings.__setstate__(self, state)
   1159 config_class = CONFIG_MAPPING[model_type]
   1160 config = config_class.from_dict(config_state_dict)
-> 1162 embedding = self.create_from_state(saved_config=config, **state)
   1164 # copy values from new embedding
   1165 for key in embedding.__dict__.keys():

File ~\AppData\Roaming\Python\Python39\site-packages\flair\embeddings\document.py:61, in TransformerDocumentEmbeddings.create_from_state(cls, **state)
     57 @classmethod
     58 def create_from_state(cls, **state):
     59     # this parameter is fixed
     60     del state["is_document_embedding"]
---> 61     return cls(**state)

File ~\AppData\Roaming\Python\Python39\site-packages\flair\embeddings\document.py:47, in TransformerDocumentEmbeddings.__init__(self, model, layers, layer_mean, is_token_embedding, **kwargs)
     30 def __init__(
     31     self,
     32     model: str = "bert-base-uncased", # set parameters with different default values
   (...)
     36     **kwargs,
     37 ):
     38     """
     39     Bidirectional transformer embeddings of words from various transformer architectures.
     40     :param model: name of transformer model (see https://huggingface.co/transformers/pretrained_models.html for
   (...)
     45     :param fine_tune: If True, allows transformers to be fine-tuned during training
     46     """
---> 47     TransformerEmbeddings.__init__(
     48         self,
     49         model=model,
     50         layers=layers,
     51         layer_mean=layer_mean,
     52         is_token_embedding=is_token_embedding,
     53         is_document_embedding=True,
     54         **kwargs,
     55     )

File ~\AppData\Roaming\Python\Python39\site-packages\flair\embeddings\transformer.py:966, in TransformerEmbeddings.__init__(self, model, fine_tune, layers, layer_mean, subtoken_pooling, cls_pooling, is_token_embedding, is_document_embedding, allow_long_sentences, use_context, respect_document_boundaries, context_dropout, saved_config, tokenizer_data, feature_extractor_data, name, force_max_length, needs_manual_ocr, use_context_separator, **kwargs)
    962 self.feature_extractor: Optional[FeatureExtractionMixin]
    964 if tokenizer_data is None:
    965     # load tokenizer and transformer model
--> 966     self.tokenizer = AutoTokenizer.from_pretrained(model, add_prefix_space=True, **kwargs)
    967     try:
    968         self.feature_extractor = AutoFeatureExtractor.from_pretrained(model, apply_ocr=False)

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py:642, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    639     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    641 # Next, let's try to use the tokenizer_config file to get the tokenizer class.
--> 642 tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
    643 if "_commit_hash" in tokenizer_config:
    644     kwargs["_commit_hash"] = tokenizer_config["_commit_hash"]

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py:486, in get_tokenizer_config(pretrained_model_name_or_path, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, **kwargs)
    424 """
    425 Loads the tokenizer configuration from a pretrained model tokenizer configuration.
    426
   (...)
    483 tokenizer_config = get_tokenizer_config("tokenizer-test")
    484 ```"""
    485 commit_hash = kwargs.get("_commit_hash", None)
--> 486 resolved_config_file = cached_file(
    487     pretrained_model_name_or_path,
    488     TOKENIZER_CONFIG_FILE,
    489     cache_dir=cache_dir,
    490     force_download=force_download,
    491     resume_download=resume_download,
    492     proxies=proxies,
    493     use_auth_token=use_auth_token,
    494     revision=revision,
    495     local_files_only=local_files_only,
    496     subfolder=subfolder,
    497     _raise_exceptions_for_missing_entries=False,
    498     _raise_exceptions_for_connection_errors=False,
    499     _commit_hash=commit_hash,
    500 )
    501 if resolved_config_file is None:
    502     logger.info("Could not locate the tokenizer configuration file, will try to use the model config instead.")

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\utils\hub.py:424, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
    409     resolved_file = hf_hub_download(
    410         path_or_repo_id,
    411         filename,
   (...)
    420         local_files_only=local_files_only,
    421     )
    423 except RepositoryNotFoundError:
--> 424     raise EnvironmentError(
    425         f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
    426         "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to "
    427         "pass a token having permission to this repo with use_auth_token or log in with "
    428         "huggingface-cli login and pass use_auth_token=True."
    429     )
    430 except RevisionNotFoundError:
    431     raise EnvironmentError(
    432         f"{revision} is not a valid git identifier (branch name, tag name or commit id) that exists "
    433         "for this model name. Check the model page at "
    434         f"'https://huggingface.co/{path_or_repo_id}' for available revisions."
    435     )

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
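One detail worth pulling out of the traceback: the failing request goes to `https://huggingface.co/None/resolve/main/tokenizer_config.json`, so the position where a model name such as `bert-base-uncased` would normally sit holds the literal string `None`. That suggests the model name reaching `AutoTokenizer.from_pretrained` was `None` when the saved model was unpickled, not that `bert-base-uncased` itself is unavailable. The sketch below uses only the standard library to extract the repository id from such a URL. The helper name `repo_id_from_resolve_url` is made up for illustration, and the `<repo_id>/resolve/<revision>/<filename>` layout is an assumption inferred from the URLs in this traceback.

```python
from urllib.parse import urlparse


def repo_id_from_resolve_url(url):
    """Return the path segments that precede 'resolve' in a Hub download URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if "resolve" not in parts:
        raise ValueError(f"not a resolve-style URL: {url}")
    # Everything before 'resolve' is treated as the repository id (assumed layout).
    return "/".join(parts[: parts.index("resolve")])


failing = "https://huggingface.co/None/resolve/main/tokenizer_config.json"
print(repo_id_from_resolve_url(failing))  # prints: None
```

Here the printed value is the four-character string `"None"`, which matches the final `OSError: None is not a local folder ...` message above.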

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.