When running the BERT pretraining tutorial (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html#hf-bert-pretraining-tutorial) you may see the following error:

Traceback (most recent call last):
  File "dp_bert_large_hf_pretrain_hdf5.py", line 625, in <module>
    _mp_fn(0, args)
  File "dp_bert_large_hf_pretrain_hdf5.py", line 584, in _mp_fn
    train_bert_hdf5(flags)
  File "dp_bert_large_hf_pretrain_hdf5.py", line 269, in train_bert_hdf5
    model = get_model(flags)
  File "dp_bert_large_hf_pretrain_hdf5.py", line 224, in get_model
    base_model = BertForPreTraining.from_pretrained('bert-large-uncased')
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2301, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 402, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
RuntimeError: unable to open file </home/ubuntu/hf_cache/compute1-dy-kaena-training-0-1/hub/models--bert-large-uncased/snapshots/6da4b6a26a1877e173fca3225479512db81a5e5b/model.safetensors> in read-only mode: No such file or directory (2)

The work-around is to pin the huggingface-hub version to 0.22: