csebuetnlp / banglabert

This repository contains the official release of the "BanglaBERT" model, along with the downstream finetuning code and datasets introduced in the paper "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted at Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
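For reference, the released checkpoint is hosted on the Hugging Face Hub; a minimal loading sketch with the transformers library (an illustration, not taken from the repository README):

from transformers import AutoTokenizer, AutoModelForPreTraining

# Load the public checkpoint; per its config, the weights resolve to ElectraForPreTraining
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert")

inputs = tokenizer("বাংলা ভাষা", return_tensors="pt")
outputs = model(**inputs)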

Error while finetuning in google colab using GPU #3

Closed · fahshed closed this issue 2 years ago

fahshed commented 2 years ago

Hi,

I want to finetune BanglaBERT for sequence classification.

This error occurred while running the following command (the sequence classification example from the GitHub README):

python ./sequence_classification/sequence_classification.py \
    --overwrite_output_dir \
    --model_name_or_path "csebuetnlp/banglabert" \
    --dataset_dir "./sequence_classification/sample_inputs/single_sequence/jsonl" \
    --output_dir "./sequence_classification/outputs/" \
    --learning_rate=2e-5 \
    --warmup_ratio 0.1 \
    --gradient_accumulation_steps 2 \
    --weight_decay 0.1 \
    --lr_scheduler_type "linear" \
    --per_device_train_batch_size=16 \
    --per_device_eval_batch_size=16 \
    --max_seq_length 512 \
    --logging_strategy "epoch" \
    --save_strategy "epoch" \
    --evaluation_strategy "epoch" \
    --num_train_epochs=3 \
    --do_train --do_eval
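For context, the script follows the standard Hugging Face Trainer finetuning recipe; in spirit it does roughly the following (a simplified sketch: the jsonl file names and the 2-label setup are assumptions, not the actual repository code):

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
model = AutoModelForSequenceClassification.from_pretrained("csebuetnlp/banglabert", num_labels=2)

# jsonl records with "sentence1" and "label" fields, as in the sample inputs
raw = load_dataset("json", data_files={"train": "train.jsonl", "validation": "validation.jsonl"})
encoded = raw.map(lambda ex: tokenizer(ex["sentence1"], truncation=True, max_length=512), batched=True)

args = TrainingArguments(
    output_dir="./sequence_classification/outputs/",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)
trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=encoded["train"], eval_dataset=encoded["validation"])
trainer.train()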

Error Traceback:

05/08/2022 09:24:21 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
05/08/2022 09:24:21 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=2,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./sequence_classification/outputs/runs/May08_09-24-21_0da7ed02e26d,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.EPOCH,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
output_dir=./sequence_classification/outputs/,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=16,
per_device_train_batch_size=16,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=outputs,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./sequence_classification/outputs/,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.EPOCH,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.1,
)
05/08/2022 09:24:21 - WARNING - datasets.builder - Using custom data configuration default-1e09c73b0f004fd6
05/08/2022 09:24:21 - INFO - datasets.builder - Overwrite dataset info from restored data version.
05/08/2022 09:24:21 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0
05/08/2022 09:24:21 - WARNING - datasets.builder - Reusing dataset json (/root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0)
05/08/2022 09:24:21 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0
100% 3/3 [00:00<00:00, 886.31it/s]
[INFO|configuration_utils.py:561] 2022-05-08 09:24:22,163 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:22,164 >> Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_dropout": 0.1,
  "summary_type": "first",
  "summary_use_proj": true,
  "transformers_version": "4.11.0.dev0",
  "type_vocab_size": 2,
  "vocab_size": 32000
}

[INFO|configuration_utils.py:561] 2022-05-08 09:24:23,954 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:23,955 >> Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_dropout": 0.1,
  "summary_type": "first",
  "summary_use_proj": true,
  "transformers_version": "4.11.0.dev0",
  "type_vocab_size": 2,
  "vocab_size": 32000
}

[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/65e95b847336b6bf69b37fdb8682a97e822799adcd9745dcf9bf44cfe4db1b9a.8f92ca2cf7e2eaa550b10c40331ae9bf0f2e40abe3b549f66a3d7f13bfc6de47
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/7820dfc553e8dfb8a1e82042b7d0d691c7a7cd1e30ed2974218f696e81c5f3b1.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/76fa87a0ec9c34c9b15732bf7e06bced447feff46287b8e7d246a55d301784d7.b4f59cefeba4296760d2cf1037142788b96f2be40230bf6393d2fba714562485
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/tokenizer.json from cache at None
[INFO|configuration_utils.py:561] 2022-05-08 09:24:30,126 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:30,126 >> Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_dropout": 0.1,
  "summary_type": "first",
  "summary_use_proj": true,
  "transformers_version": "4.11.0.dev0",
  "type_vocab_size": 2,
  "vocab_size": 32000
}

[INFO|modeling_utils.py:1279] 2022-05-08 09:24:31,075 >> loading weights file https://huggingface.co/csebuetnlp/banglabert/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/913ea71768a80ccdde3a9ab9a88cf2a93f37a52008896997655d0f63b0d0743a.8aaedac281b72dbb5296319c53be5a4e4a52339eded3f68d49201e140e221615
[WARNING|modeling_utils.py:1516] 2022-05-08 09:24:32,600 >> Some weights of the model checkpoint at csebuetnlp/banglabert were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense.weight', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense_prediction.bias']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1527] 2022-05-08 09:24:32,600 >> Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at csebuetnlp/banglabert and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-c8c752bb15628b86.arrow
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-1d7e8a13339dd538.arrow
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-5b734993f8fa5b18.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-ae957e77cc0e01d1.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-ad37b78f61cc4fc6.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-efbe758578e42091.arrow
05/08/2022 09:24:33 - INFO - __main__ - Sample 0 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 4992, 10267, 784, 27147, 415, 830, 7761, 1333, 16, 983, 12484, 825, 5083, 2893, 426, 2636, 16493, 415, 815, 2068, 795, 205, 3], 'label': 0, 'sentence1': 'যেই মাদারির পোলারা এই কাজটি করেছে, সেই সালারা অবৈধ জারপ সন্তান ছারা আর কিছুই না।', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:33 - INFO - __main__ - Sample 3 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 10634, 5452, 817, 972, 6037, 3], 'label': 0, 'sentence1': 'মুসা কপা\u200cলে কি আ\u200cছে জা\u200cনিনা', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:33 - INFO - __main__ - Sample 1 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 2157, 18812, 16332, 12062, 16135, 1292, 3], 'label': 0, 'sentence1': 'ভারতের কুখ্যাত ষড়যন্ত্রের মুখোশ উন্মোচন হলো', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:35 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy
05/08/2022 09:24:35 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b
05/08/2022 09:24:35 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py to /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b/accuracy.py
05/08/2022 09:24:35 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/dataset_infos.json
05/08/2022 09:24:35 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b/accuracy.json
05/08/2022 09:24:36 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision
05/08/2022 09:24:36 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690
05/08/2022 09:24:36 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py to /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690/precision.py
05/08/2022 09:24:36 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/dataset_infos.json
05/08/2022 09:24:36 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690/precision.json
05/08/2022 09:24:38 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall
05/08/2022 09:24:38 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8
05/08/2022 09:24:38 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py to /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8/recall.py
05/08/2022 09:24:38 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/dataset_infos.json
05/08/2022 09:24:38 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8/recall.json
05/08/2022 09:24:39 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1
05/08/2022 09:24:39 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9
05/08/2022 09:24:39 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py to /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9/f1.py
05/08/2022 09:24:39 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/dataset_infos.json
05/08/2022 09:24:39 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9/f1.json
[INFO|trainer.py:521] 2022-05-08 09:24:43,888 >> The following columns in the training set  don't have a corresponding argument in `ElectraForSequenceClassification.forward` and have been ignored: sentence1.
[INFO|trainer.py:1168] 2022-05-08 09:24:43,900 >> ***** Running training *****
[INFO|trainer.py:1169] 2022-05-08 09:24:43,900 >>   Num examples = 4
[INFO|trainer.py:1170] 2022-05-08 09:24:43,900 >>   Num Epochs = 3
[INFO|trainer.py:1171] 2022-05-08 09:24:43,900 >>   Instantaneous batch size per device = 16
[INFO|trainer.py:1172] 2022-05-08 09:24:43,900 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1173] 2022-05-08 09:24:43,900 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:1174] 2022-05-08 09:24:43,900 >>   Total optimization steps = 3
  0% 0/3 [00:00<?, ?it/s]Traceback (most recent call last):
  File "./sequence_classification/sequence_classification.py", line 479, in <module>
    main()
  File "./sequence_classification/sequence_classification.py", line 426, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1284, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1789, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1821, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 973, in forward
    return_dict,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 879, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 206, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2183, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
  0% 0/3 [00:00<?, ?it/s]

A probable solution from the PyTorch discussion forum, which I couldn't figure out how to apply here: https://discuss.pytorch.org/t/code-that-loads-sgd-fails-to-load-adam-state-to-gpu/61783/3?u=shaibagon
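For reference, the workaround in that thread boils down to moving any CPU-resident optimizer state onto the model's device; a minimal self-contained sketch (the toy model and optimizer stand in for the real finetuning objects):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model/optimizer so the sketch runs on its own; in practice these are the
# finetuning model and an optimizer whose state was loaded from a checkpoint (on CPU).
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

model.to(device)
for state in optimizer.state.values():
    for key, value in state.items():
        if isinstance(value, torch.Tensor):
            state[key] = value.to(device)  # move optimizer tensors to the model's device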

Thanks.

abhik1505040 commented 2 years ago

Hi, this seems to be an issue with how you are activating the GPU session in Google Colab. The error is not reproducible on my end (reference Colab notebook). Try doing the following:

This should resolve the issue.
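For anyone hitting the same error, a quick generic check (not part of the original reply) that the Colab GPU runtime is actually active before launching the script:

import torch

print("CUDA available:", torch.cuda.is_available())  # should be True on a GPU runtime
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))   # e.g. Tesla T4 on Colab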

fahshed commented 2 years ago

Thanks, brother. The solution worked.