flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Bug]: Challenges in running TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL #3305

Closed None-Such closed 1 year ago

None-Such commented 1 year ago

Describe the bug

TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL, at Use Case 3 (Train a TARS model), step 8, raises the following error:

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

How should one add their Hugging Face token here? Or is this a misleading error?
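(For context, if this actually were an authentication problem, supplying a token would look roughly like the sketch below; the repo id is hypothetical. Note, though, that the failing URL in the stack trace points at https://huggingface.co/None/..., i.e. the model id itself is None.)

from transformers import AutoTokenizer

# Option 1: log in once in a terminal with `huggingface-cli login`.
# Option 2: pass the token explicitly (use_auth_token is still accepted in transformers 4.29):
tokenizer = AutoTokenizer.from_pretrained("my-org/private-model", use_auth_token=True)  # hypothetical private repo id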

To Reproduce

Use Flair v0.12.2

Open **TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL**: https://github.com/flairNLP/flair/blob/v0.12.2/resources/docs/TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL.md

Run **Use Case 3: Train a TARS model**

Step 8: start the training (the relevant cell is sketched below)
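For reference, the training cell from that step (as captured later in the stack trace) looks essentially like this; `tars` and `corpus` come from the earlier tutorial steps:

from flair.trainers import ModelTrainer

# 7. initialize the trainer with the TARS model and the TREC_6 corpus
trainer = ModelTrainer(tars, corpus)

# 8. start the training
trainer.train(base_path='resources/taggers/trec',  # path to store the model artifacts
              learning_rate=0.02,       # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4,  # optionally set this if transformer is too much for your machine
              max_epochs=1,             # a single epoch already reproduces the error
              )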

Training runs . . .

2023-08-27 21:26:31,228 EPOCH 1 done: loss 0.0921 - lr 0.020000
100%|██████████| 35/35 [00:38<00:00,  1.10s/it]
2023-08-27 21:27:09,773 Evaluating as a multi-label problem: True
2023-08-27 21:27:09,796 DEV : loss 0.035441480576992035 - f1-score (micro avg)  0.981
2023-08-27 21:27:09,809 BAD EPOCHS (no improvement): 0
2023-08-27 21:27:09,811 saving best model

but then **throws an error**:

Expected behavior

The model should train and save without raising an error.

Logs and Stack traces

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Screenshots

No response

Additional Context

No response

Environment

Versions:

Flair: 0.12.2
Pytorch: 2.0.1
Transformers: 4.29.2
GPU: True

helpmefindaname commented 1 year ago

Hi @None-Such sorry for the late response.

Given that the error states it is looking for the model None, I don't think this is a permission problem. Can you please share the full stack trace, so we can analyze where exactly this happens?

None-Such commented 1 year ago

Hi @helpmefindaname ,

The full stack trace is below (after the processing output that preceded it):

2023-09-19 15:52:32,783 https://cogcomp.seas.upenn.edu/Data/QA/QC/train_5500.label not found in cache, downloading to C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp_rxgzuzx
100%|██████████| 328k/328k [00:00<00:00, 1.08MB/s]
2023-09-19 15:52:33,344 copying C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp_rxgzuzx to cache at C:\Users\Administrator\.flair\datasets\trec_6\original\train_5500.label
2023-09-19 15:52:33,346 removing temp file C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp_rxgzuzx
2023-09-19 15:52:33,595 https://cogcomp.seas.upenn.edu/Data/QA/QC/TREC_10.label not found in cache, downloading to C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp89_v6xsk
100%|██████████| 22.8k/22.8k [00:00<00:00, 303kB/s]
2023-09-19 15:52:33,924 copying C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp89_v6xsk to cache at C:\Users\Administrator\.flair\datasets\trec_6\original\TREC_10.label
2023-09-19 15:52:33,925 removing temp file C:\Users\ADMINI~1\AppData\Local\Temp\2\tmp89_v6xsk
2023-09-19 15:52:33,945 Reading data from C:\Users\Administrator\.flair\datasets\trec_6
2023-09-19 15:52:33,946 Train: C:\Users\Administrator\.flair\datasets\trec_6\train.txt
2023-09-19 15:52:33,947 Dev: None
2023-09-19 15:52:33,947 Test: C:\Users\Administrator\.flair\datasets\trec_6\test.txt
2023-09-19 15:52:34,708 Initialized corpus C:\Users\Administrator\.flair\datasets\trec_6 (label type name is 'question_class')
2023-09-19 15:52:34,709 Computing label dictionary. Progress: 4907it [00:00, 30866.83it/s]
2023-09-19 15:52:34,871 Dictionary created for label 'question_class' with 7 values: question about entity (seen 1146 times), question about person (seen 1091 times), question about description (seen 1033 times), question about number (seen 815 times), question about location (seen 746 times), question about abbreviation (seen 76 times)
2023-09-19 15:52:37,650 TARS initialized without a task. You need to call .add_and_switch_to_new_task() before training this model
2023-09-19 15:52:37,870 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,870 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,870 Model: "TARSClassifier( (tars_model): TextClassifier( (embeddings): TransformerDocumentEmbeddings( (model): BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(30522, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0-11): 12 x BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (pooler): BertPooler( (dense): Linear(in_features=768, out_features=768, bias=True) (activation): Tanh() ) ) ) (decoder): Linear(in_features=768, out_features=2, bias=True) (dropout): Dropout(p=0.0, inplace=False) (locked_dropout): LockedDropout(p=0.0) (word_dropout): WordDropout(p=0.0) (loss_function): CrossEntropyLoss() ) )"
2023-09-19 15:52:37,887 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,889 Corpus: "Corpus: 4907 train + 545 dev + 500 test sentences"
2023-09-19 15:52:37,889 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,891 Parameters:
2023-09-19 15:52:37,892  - learning_rate: "0.020000"
2023-09-19 15:52:37,892  - mini_batch_size: "16"
2023-09-19 15:52:37,894  - patience: "3"
2023-09-19 15:52:37,895  - anneal_factor: "0.5"
2023-09-19 15:52:37,896  - max_epochs: "1"
2023-09-19 15:52:37,897  - shuffle: "True"
2023-09-19 15:52:37,898  - train_with_dev: "False"
2023-09-19 15:52:37,899  - batch_growth_annealing: "False"
2023-09-19 15:52:37,900 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,901 Model training base path: "resources\taggers\trec"
2023-09-19 15:52:37,902 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,903 Device: cpu
2023-09-19 15:52:37,904 ----------------------------------------------------------------------------------------------------
2023-09-19 15:52:37,907 Embeddings storage mode: cpu
2023-09-19 15:52:37,909 ----------------------------------------------------------------------------------------------------
2023-09-19 15:54:31,480 epoch 1 - iter 30/307 - loss 0.10430004 - time (sec): 113.57 - samples/sec: 12.68 - lr: 0.020000
2023-09-19 15:56:28,745 epoch 1 - iter 60/307 - loss 0.09754359 - time (sec): 230.84 - samples/sec: 12.48 - lr: 0.020000
2023-09-19 15:58:19,664 epoch 1 - iter 90/307 - loss 0.10188867 - time (sec): 341.75 - samples/sec: 12.64 - lr: 0.020000
2023-09-19 16:00:12,818 epoch 1 - iter 120/307 - loss 0.12133670 - time (sec): 454.91 - samples/sec: 12.66 - lr: 0.020000
2023-09-19 16:02:05,733 epoch 1 - iter 150/307 - loss 0.10956032 - time (sec): 567.82 - samples/sec: 12.68 - lr: 0.020000
2023-09-19 16:03:57,379 epoch 1 - iter 180/307 - loss 0.10636925 - time (sec): 679.47 - samples/sec: 12.72 - lr: 0.020000
2023-09-19 16:05:50,215 epoch 1 - iter 210/307 - loss 0.10998974 - time (sec): 792.31 - samples/sec: 12.72 - lr: 0.020000
2023-09-19 16:07:41,824 epoch 1 - iter 240/307 - loss 0.10729890 - time (sec): 903.91 - samples/sec: 12.74 - lr: 0.020000
2023-09-19 16:09:30,334 epoch 1 - iter 270/307 - loss 0.10637763 - time (sec): 1012.42 - samples/sec: 12.80 - lr: 0.020000
2023-09-19 16:11:24,736 epoch 1 - iter 300/307 - loss 0.10409902 - time (sec): 1126.83 - samples/sec: 12.78 - lr: 0.020000
2023-09-19 16:11:48,328 ----------------------------------------------------------------------------------------------------
2023-09-19 16:11:48,328 EPOCH 1 done: loss 0.1028 - lr 0.020000
100%|██████████| 35/35 [03:40<00:00,  6.29s/it]
2023-09-19 16:15:28,472 Evaluating as a multi-label problem: True
2023-09-19 16:15:28,519 DEV : loss 0.0701458603143692 - f1-score (micro avg)  0.963
2023-09-19 16:15:28,542 BAD EPOCHS (no improvement): 0
2023-09-19 16:15:28,543 saving best model

2023-09-19 16:15:30,575 ----------------------------------------------------------------------------------------------------


HTTPError Traceback (most recent call last) File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\utils\_errors.py:259, in hf_raise_for_status(response, endpoint_name) 258 try: --> 259 response.raise_for_status() 260 except HTTPError as e:

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\requests\models.py:1021, in Response.raise_for_status(self) 1020 if http_error_msg: -> 1021 raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError Traceback (most recent call last) File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\transformers\utils\hub.py:417, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash) 415 try: 416 # Load from URL or cache if already cached --> 417 resolved_file = hf_hub_download( 418 path_or_repo_id, 419 filename, 420 subfolder=None if len(subfolder) == 0 else subfolder, 421 repo_type=repo_type, 422 revision=revision, 423 cache_dir=cache_dir, 424 user_agent=user_agent, 425 force_download=force_download, 426 proxies=proxies, 427 resume_download=resume_download, 428 use_auth_token=use_auth_token, 429 local_files_only=local_files_only, 430 ) 432 except RepositoryNotFoundError:

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\utils\_validators.py:120, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs) 118 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs) --> 120 return fn(*args, **kwargs)

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\file_download.py:1195, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout) 1194 try: -> 1195 metadata = get_hf_file_metadata( 1196 url=url, 1197 token=token, 1198 proxies=proxies, 1199 timeout=etag_timeout, 1200 ) 1201 except EntryNotFoundError as http_error: 1202 # Cache the non-existence of the file and raise

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\utils\_validators.py:120, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs) 118 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs) --> 120 return fn(*args, **kwargs)

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\file_download.py:1541, in get_hf_file_metadata(url, token, proxies, timeout) 1532 r = _request_wrapper( 1533 method="HEAD", 1534 url=url, (...) 1539 timeout=timeout, 1540 ) -> 1541 hf_raise_for_status(r) 1543 # Return

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\huggingface_hub\utils\_errors.py:291, in hf_raise_for_status(response, endpoint_name) 283 message = ( 284 f"{response.status_code} Client Error." 285 + "\n\n" (...) 289 " make sure you are authenticated." 290 ) --> 291 raise RepositoryNotFoundError(message, response) from e 293 elif response.status_code == 400:

RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6509c923-47a65d870f7fa63e1302ab90)

Repository Not Found for url: https://huggingface.co/None/resolve/main/tokenizer_config.json. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated. Invalid username or password.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last) Cell In[3], line 40 37 trainer = ModelTrainer(tars, corpus) 39 # 8. start the training ---> 40 trainer.train(base_path='resources/taggers/trec', # path to store the model artifacts 41 learning_rate=0.02, # use very small learning rate 42 mini_batch_size=16, 43 mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine 44 max_epochs=1, # terminate after 10 epochs 45 )

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\trainers\trainer.py:893, in ModelTrainer.train(self, base_path, learning_rate, mini_batch_size, eval_batch_size, mini_batch_chunk_size, max_epochs, train_with_dev, train_with_test, monitor_train, monitor_test, main_evaluation_metric, scheduler, anneal_factor, patience, min_learning_rate, initial_extra_patience, optimizer, cycle_momentum, warmup_fraction, embeddings_storage_mode, checkpoint, save_final_model, anneal_with_restarts, anneal_with_prestarts, anneal_against_dev_loss, batch_growth_annealing, shuffle, param_selection_mode, write_weights, num_workers, sampler, use_amp, amp_opt_level, eval_on_train_fraction, eval_on_train_shuffle, save_model_each_k_epochs, tensorboard_comment, use_swa, use_final_model_for_eval, gold_label_dictionary_for_eval, exclude_labels, create_file_logs, create_loss_file, epoch, use_tensorboard, tensorboard_log_dir, metrics_for_tensorboard, optimizer_state_dict, scheduler_state_dict, save_optimizer_state, reduce_transformer_vocab, shuffle_first_epoch, **kwargs) 891 # test best model if test data is present 892 if self.corpus.test and not train_with_test: --> 893 final_score = self.final_test( 894 base_path=base_path, 895 eval_mini_batch_size=eval_batch_size, 896 num_workers=num_workers, 897 main_evaluation_metric=main_evaluation_metric, 898 gold_label_dictionary_for_eval=gold_label_dictionary_for_eval, 899 exclude_labels=exclude_labels, 900 ) 901 else: 902 final_score = 0

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\trainers\trainer.py:1015, in ModelTrainer.final_test(self, base_path, eval_mini_batch_size, main_evaluation_metric, num_workers, gold_label_dictionary_for_eval, exclude_labels) 1012 self.model.eval() 1014 if (base_path / "best-model.pt").exists(): -> 1015 self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict()) 1016 else: 1017 log.info("Testing using last state of model ...")

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\models\tars_model.py:928, in TARSClassifier.load(cls, model_path) 924 @classmethod 925 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "TARSClassifier": 926 from typing import cast --> 928 return cast("TARSClassifier", super().load(model_path=model_path))

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\models\tars_model.py:323, in FewshotClassifier.load(cls, model_path) 319 @classmethod 320 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "FewshotClassifier": 321 from typing import cast --> 323 return cast("FewshotClassifier", super().load(model_path=model_path))

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\nn\model.py:559, in Classifier.load(cls, model_path) 555 @classmethod 556 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "Classifier": 557 from typing import cast --> 559 return cast("Classifier", super().load(model_path=model_path))

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\nn\model.py:191, in Model.load(cls, model_path) 189 if not isinstance(model_path, dict): 190 model_file = cls._fetch_model(str(model_path)) --> 191 state = load_torch_state(model_file) 192 else: 193 state = model_path

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\file_utils.py:359, in load_torch_state(model_file) 355 # load_big_file is a workaround by https://github.com/highway11git 356 # to load models on some Mac/Windows setups 357 # see https://github.com/zalandoresearch/flair/issues/351 358 f = load_big_file(model_file) --> 359 return torch.load(f, map_location="cpu")

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\torch\serialization.py:809, in load(f, map_location, pickle_module, weights_only, pickle_load_args) 807 except RuntimeError as e: 808 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None --> 809 return _load(opened_zipfile, map_location, pickle_module, pickle_load_args) 810 if weights_only: 811 try:

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\torch\serialization.py:1172, in _load(zip_file, map_location, pickle_module, pickle_file, pickle_load_args) 1170 unpickler = UnpicklerWrapper(data_file, pickle_load_args) 1171 unpickler.persistent_load = persistent_load -> 1172 result = unpickler.load() 1174 torch._utils._validate_loaded_sparse_tensors() 1176 return result

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\embeddings\transformer.py:1162, in TransformerEmbeddings.__setstate__(self, state) 1159 config_class = CONFIG_MAPPING[model_type] 1160 config = config_class.from_dict(config_state_dict) --> 1162 embedding = self.create_from_state(saved_config=config, **state) 1164 # copy values from new embedding 1165 for key in embedding.__dict__.keys():

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\embeddings\document.py:61, in TransformerDocumentEmbeddings.create_from_state(cls, state) 57 @classmethod 58 def create_from_state(cls, state): 59 # this parameter is fixed 60 del state["is_document_embedding"] ---> 61 return cls(**state)

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\embeddings\document.py:47, in TransformerDocumentEmbeddings.__init__(self, model, layers, layer_mean, is_token_embedding, **kwargs) 30 def __init__( 31 self, 32 model: str = "bert-base-uncased", # set parameters with different default values (...) 36 **kwargs, 37 ): 38 """ 39 Bidirectional transformer embeddings of words from various transformer architectures. 40 :param model: name of transformer model (see https://huggingface.co/transformers/pretrained_models.html for (...) 45 :param fine_tune: If True, allows transformers to be fine-tuned during training 46 """ ---> 47 TransformerEmbeddings.__init__( 48 self, 49 model=model, 50 layers=layers, 51 layer_mean=layer_mean, 52 is_token_embedding=is_token_embedding, 53 is_document_embedding=True, 54 **kwargs, 55 )

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\embeddings\transformer.py:966, in TransformerEmbeddings.__init__(self, model, fine_tune, layers, layer_mean, subtoken_pooling, cls_pooling, is_token_embedding, is_document_embedding, allow_long_sentences, use_context, respect_document_boundaries, context_dropout, saved_config, tokenizer_data, feature_extractor_data, name, force_max_length, needs_manual_ocr, use_context_separator, **kwargs) 962 self.feature_extractor: Optional[FeatureExtractionMixin] 964 if tokenizer_data is None: 965 # load tokenizer and transformer model --> 966 self.tokenizer = AutoTokenizer.from_pretrained(model, add_prefix_space=True, **kwargs) 967 try: 968 self.feature_extractor = AutoFeatureExtractor.from_pretrained(model, apply_ocr=False)

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\transformers\models\auto\tokenization_auto.py:643, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs) 640 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs) 642 # Next, let's try to use the tokenizer_config file to get the tokenizer class. --> 643 tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs) 644 if "_commit_hash" in tokenizer_config: 645 kwargs["_commit_hash"] = tokenizer_config["_commit_hash"]

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\transformers\models\auto\tokenization_auto.py:487, in get_tokenizer_config(pretrained_model_name_or_path, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, **kwargs) 425 """ 426 Loads the tokenizer configuration from a pretrained model tokenizer configuration. 427 (...) 484 tokenizer_config = get_tokenizer_config("tokenizer-test") 485 ```""" 486 commit_hash = kwargs.get("_commit_hash", None) --> 487 resolved_config_file = cached_file( 488 pretrained_model_name_or_path, 489 TOKENIZER_CONFIG_FILE, 490 cache_dir=cache_dir, 491 force_download=force_download, 492 resume_download=resume_download, 493 proxies=proxies, 494 use_auth_token=use_auth_token, 495 revision=revision, 496 local_files_only=local_files_only, 497 subfolder=subfolder, 498 _raise_exceptions_for_missing_entries=False, 499 _raise_exceptions_for_connection_errors=False, 500 _commit_hash=commit_hash, 501 ) 502 if resolved_config_file is None: 503 logger.info("Could not locate the tokenizer configuration file, will try to use the model config instead.")

File C:\ProgramData\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\transformers\utils\hub.py:433, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash) 417 resolved_file = hf_hub_download( 418 path_or_repo_id, 419 filename, (...) 429 local_files_only=local_files_only, 430 ) 432 except RepositoryNotFoundError: --> 433 raise EnvironmentError( 434 f"{path_or_repo_id} is not a local folder and is not a valid model identifier " 435 "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to " 436 "pass a token having permission to this repo with use_auth_token or log in with " 437 "huggingface-cli login and pass use_auth_token=True." 438 ) 439 except RevisionNotFoundError: 440 raise EnvironmentError( 441 f"{revision} is not a valid git identifier (branch name, tag name or commit id) that exists " 442 "for this model name. Check the model page at " 443 f"'https://huggingface.co/{path_or_repo_id}' for available revisions." 444 )

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

None-Such commented 1 year ago

@helpmefindaname

Thanks for taking a look at this =)

Please let me know if you need anything else,

Regards,

Christopher

helpmefindaname commented 1 year ago

Hi @None-Such

I'm sorry for the late response. Your issue is related to https://github.com/flairNLP/flair/issues/3167. It should be fixable by running

model.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
model.tars_embeddings.base_model_name = "bert-base-uncased"
model.tars_embeddings.name = "transformer-bert-base-uncased"

after loading the model.
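For example, applied to the tutorial's starting point (a sketch: 'tars-base' is the model the tutorial loads, and patching right after loading means the best-model checkpoint is later saved with a valid transformer name):

from flair.models import TARSClassifier

# load the pre-trained TARS model as in the tutorial
tars = TARSClassifier.load('tars-base')

# workaround: restore the transformer name that was saved as None (see #3167)
tars.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
tars.tars_embeddings.base_model_name = "bert-base-uncased"
tars.tars_embeddings.name = "transformer-bert-base-uncased"

# ... then continue with add_and_switch_to_new_task() and training as before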

None-Such commented 1 year ago

@helpmefindaname

Thanks so much for your reply. That indeed got me past the previous error. =)

But unfortunately, I am now stuck on a different error in the next step of the same Tutorial:

How to train with multiple datasets . . .

# 6. start the training
trainer.train(base_path='resources/taggers/go_emotions', # path to store the model artifacts
              learning_rate=0.02, # use very small learning rate
              mini_batch_size=16,
              mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
              max_epochs=10, # terminate after 10 epochs
              )

. . .

2023-10-03 13:35:58,830 epoch 2 - iter 2168/2714 - loss 0.17908706 - time (sec): 424.66 - samples/sec: 288.34 - lr: 0.020000
2023-10-03 13:36:52,011 epoch 2 - iter 2439/2714 - loss 0.17956718 - time (sec): 477.84 - samples/sec: 288.34 - lr: 0.020000
2023-10-03 13:37:46,191 epoch 2 - iter 2710/2714 - loss 0.17983137 - time (sec): 532.02 - samples/sec: 287.85 - lr: 0.020000
2023-10-03 13:37:46,815 ----------------------------------------------------------------------------------------------------
2023-10-03 13:37:46,816 EPOCH 2 done: loss 0.1798 - lr 0.020000
100%|██████████| 340/340 [31:49<00:00,  5.62s/it]
2023-10-03 14:09:36,627 Evaluating as a multi-label problem: True

2023-10-03 14:09:37,190 DEV : loss 0.21114441752433777 - f1-score (micro avg)  0.9731
2023-10-03 14:09:38,479 BAD EPOCHS (no improvement): 0
2023-10-03 14:09:38,480 saving best model
2023-10-03 14:09:40,984 ----------------------------------------------------------------------------------------------------

. . .

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 26
     23 trainer = ModelTrainer(tars, new_corpus)
     25 # 6. start the training
---> 26 trainer.train(base_path='resources/taggers/go_emotions', # path to store the model artifacts
     27               learning_rate=0.02, # use very small learning rate
     28               mini_batch_size=16,
     29               mini_batch_chunk_size=4, # optionally set this if transformer is too much for your machine
     30               max_epochs=10, # terminate after 10 epochs
     31               )

File ~\miniconda3\envs\flair-v12.2-env-v3\lib\site-packages\flair\trainers\trainer.py:557, in ModelTrainer.train(self, base_path, learning_rate, mini_batch_size, eval_batch_size, mini_batch_chunk_size, max_epochs, train_with_dev, train_with_test, monitor_train, monitor_test, main_evaluation_metric, scheduler, anneal_factor, patience, min_learning_rate, initial_extra_patience, optimizer, cycle_momentum, warmup_fraction, embeddings_storage_mode, checkpoint, save_final_model, anneal_with_restarts, anneal_with_prestarts, anneal_against_dev_loss, batch_growth_annealing, shuffle, param_selection_mode, write_weights, num_workers, sampler, use_amp, amp_opt_level, eval_on_train_fraction, eval_on_train_shuffle, save_model_each_k_epochs, tensorboard_comment, use_swa, use_final_model_for_eval, gold_label_dictionary_for_eval, exclude_labels, create_file_logs, create_loss_file, epoch, use_tensorboard, tensorboard_log_dir, metrics_for_tensorboard, optimizer_state_dict, scheduler_state_dict, save_optimizer_state, reduce_transformer_vocab, shuffle_first_epoch, **kwargs)
    555 else:
    556     loss.backward()
--> 557 train_loss += loss.item()
    559 # identify dynamic embeddings (always deleted) on first sentence
    561 if dynamic_embeddings is None:

RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

helpmefindaname commented 1 year ago

@None-Such as the error message says: "For debugging consider passing CUDA_LAUNCH_BLOCKING=1". Please rerun it with that environment variable set to get the real error.
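For example (a sketch: the variable must be set before the first CUDA call, so put it at the very top of the notebook, or set it in the shell before launching Python):

import os

# force synchronous CUDA kernel launches so the stack trace points at the real failure
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must run before any CUDA work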

None-Such commented 1 year ago

Hi @helpmefindaname ,

Thanks for your patience with my superior ability to ignore the obvious ;)

Adding the environment variable has the effect of letting the step in question run all the way through . . . without visible error!

So now I am able to run the entire Tutorial =)

Running with a slightly modified environment

However, in the interest of full disclosure, I was running with a slightly modified environment. The current default dependencies were not working for me at all. I found I had to fall back to the dependencies listed below.

Works

Does not Work

I will take another pass at trying to get the defaults to work and report back.

None-Such commented 1 year ago

@helpmefindaname

Upon re-visiting the environment definition, I cannot get the tutorial to work with anything newer than:

Otherwise, I am good to go.

Thanks again for your help.

P.S. I updated the issue title so it may be more likely to be of use to others.