ZixuanKe / PyContinual

PyContinual (An Easy and Extendible Framework for Continual Learning)

Import error after executing B-CL script #2

Closed mistel1225 closed 3 years ago

mistel1225 commented 3 years ago

Hi, I encountered an import error when running the following script: ./commands/til_classification/asc/run_train_bert_adapter_capsule_mask_ncl.sh. The error message is as follows:

Inits...
Traceback (most recent call last):
  File "run.py", line 81, in <module>
    net=import_modules.network.Net(taskcla,args=args)
  File "/tmp2/mistel/test/PyContinual/src/networks/classification/bert_adapter_capsule_mask.py", line 19, in __init__
    self.bert = MyBertModel.from_pretrained(args.bert_model,config=config,args=args)
  File "/home/mistel/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1385, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 664, in __init__
    self.encoder = MyBertEncoder(config,args)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 521, in __init__
    self.layer = nn.ModuleList([MyBertLayer(config,args) for _ in range(config.num_hidden_layers)])
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 521, in <listcomp>
    self.layer = nn.ModuleList([MyBertLayer(config,args) for _ in range(config.num_hidden_layers)])
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 363, in __init__
    self.attention = MyBertAttention(config,args)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 142, in __init__
    self.output = MyBertSelfOutput(config,args)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 86, in __init__
    from networks.classification.adapters import BertAdapterCapsuleMask
ModuleNotFoundError: No module named 'networks.classification.adapters'

After I changed the import in my_transformers.py from from networks.classification.adapters import BertAdapterCapsuleMask to from networks.base.adapters import BertAdapterCapsuleMask, the following message occurred:

Inits... apply to attention BertAdapter BertAdapterMask apply_one_layer_shared CapsuleLayer
Traceback (most recent call last):
  File "run.py", line 81, in <module>
    net=import_modules.network.Net(taskcla,args=args)
  File "/tmp2/mistel/test/PyContinual/src/networks/classification/bert_adapter_capsule_mask.py", line 19, in __init__
    self.bert = MyBertModel.from_pretrained(args.bert_model,config=config,args=args)
  File "/home/mistel/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1385, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 664, in __init__
    self.encoder = MyBertEncoder(config,args)
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 521, in __init__
    self.layer = nn.ModuleList([MyBertLayer(config,args) for _ in range(config.num_hidden_layers)])
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 521, in <listcomp>
    self.layer = nn.ModuleList([MyBertLayer(config,args) for _ in range(config.num_hidden_layers)])
  File "/tmp2/mistel/test/PyContinual/src/./networks/base/my_transformers.py", line 101, in __init__
    self.adapter_capsule_mask = BertAdapterCapsuleMask(args)
  File "/tmp2/mistel/test/PyContinual/src/networks/base/adapters.py", line 628, in __init__
    self.capsule_net = CapsNet(config)
  File "/tmp2/mistel/test/PyContinual/src/networks/base/adapters.py", line 667, in __init__
    self.tsv_capsules = CapsuleLayer(config,'tsv')
  File "/tmp2/mistel/test/PyContinual/src/networks/base/adapters.py", line 698, in __init__
    self.tsv = torch.tril(torch.ones(config.ntasks,config.ntasks)).data.cuda()# for backward
RuntimeError: CUDA error: out of memory

The second error seems to appear after I changed CUDA_VISIBLE_DEVICES=1 to CUDA_VISIBLE_DEVICES=0 in ./commands/til_classification/asc/run_train_bert_adapter_capsule_mask_ncl.sh. However, in my environment cuda:0 is an RTX 3090 and cuda:1 is a GTX 1080 Ti, so this does not make sense, and I wonder whether the error comes from adapters.py.
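One quick way to check whether the GPU itself, rather than B-CL's memory use, is the problem is to reproduce the failing line in isolation. A minimal sketch, assuming PyTorch is installed and using ntasks=19 from the script:

import torch

# The failing line in adapters.py only allocates a 19x19 float32 tensor,
# i.e. roughly 1.4 KB on the GPU. If even this raises "CUDA error: out of
# memory", the card or driver state is the likely culprit, not the model.
print(torch.cuda.is_available())             # should print True
print(torch.cuda.get_device_name(0))         # which card this process sees as cuda:0
tsv = torch.tril(torch.ones(19, 19)).cuda()  # same allocation as adapters.py line 698
print(tsv.shape, tsv.element_size() * tsv.nelement(), "bytes")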

Thanks for your patience. Best regards.

ZixuanKe commented 3 years ago

Hi,

  1. The import path is not correct, as you pointed out. I updated the code accordingly, thank you!
  2. It says it is an OOM error. How large is your GPU memory? B-CL uses a capsule network, which needs a lot of memory. With the settings in the example script, it takes around 13 GB. If your GPU memory is smaller, you may want to try decreasing the batch size or doing parallel training.
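For reference, a minimal sketch for checking how much memory each visible GPU has, to compare against the roughly 13 GB mentioned above (assuming PyTorch is installed):

import torch

# List every GPU visible to this process and its total memory. Note that
# CUDA_VISIBLE_DEVICES renumbers devices: with CUDA_VISIBLE_DEVICES=0, only
# the first physical card is visible and it appears here as cuda:0.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} -> {props.name}, {props.total_memory / 1024**3:.1f} GB total")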

Thank you so much for your interest!

mistel1225 commented 3 years ago

> Hi,
>
>   1. The import path is not correct, as you pointed out. I updated the code accordingly, thank you!
>   2. It says it is an OOM error. How large is your GPU memory? B-CL uses a capsule network, which needs a lot of memory. With the settings in the example script, it takes around 13 GB. If your GPU memory is smaller, you may want to try decreasing the batch size or doing parallel training.
>
> Thank you so much for your interest!

I changed CUDA_VISIBLE_DEVICES=1 to CUDA_VISIBLE_DEVICES=0 in ./commands/til_classification/asc/run_train_bert_adapter_capsule_mask_ncl.sh. In my environment, cuda:0 is an RTX 3090 with 24 GB of VRAM and cuda:1 is a GTX 1080 Ti with 11 GB of VRAM, so it is weird that I can run it on the 1080 Ti but not on the 3090. My script is as follows:

#!/bin/bash

if [ ! -d "OutputBert" ]; then
  mkdir OutputBert
fi

for id in 0 1 3
do
    CUDA_VISIBLE_DEVICES=0 python run.py \
    --bert_model 'bert-base-uncased' \
    --note random$id \
    --ntasks 19 \
    --task asc \
    --idrandom $id \
    --output_dir './OutputBert' \
    --scenario til_classification \
    --approach bert_adapter_capsule_mask_ncl \
    --experiment bert \
    --eval_batch_size 32 \
    --train_batch_size 16 \
    --num_train_epochs 10 \
    --apply_bert_output \
    --apply_bert_attention_output \
    --build_adapter_capsule_mask \
    --apply_one_layer_shared \
    --xusemeval_num_train_epochs 10 \
    --bingdomains_num_train_epochs 30 \
    --bingdomains_num_train_epochs_multiplier 3 \
    --semantic_cap_size 3
done

#TODO: check other number of capsules
#    --apply_one_layer_shared for 1,0,3
#    --apply_two_layer_shared for 2,4

I am currently training it on the 1080 Ti with a batch size of 4. What do you think the performance difference between batch size 16 and batch size 4 will be?
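If memory forces the per-step batch down to 4, one generic way to keep the effective batch size at 16 is gradient accumulation. The following is only a toy sketch of the pattern with a placeholder model and data, not PyContinual's actual training loop:

import torch
from torch import nn

# Toy stand-in for the real model and data, just to show the accumulation pattern.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
micro_batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(16)]

accumulation_steps = 4  # 4 micro-batches of size 4 gives an effective batch of 16
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches):
    loss = criterion(model(x), y)
    (loss / accumulation_steps).backward()   # scale so accumulated gradients match a batch of 16
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Whether a plain batch of 4 matches a batch of 16 in accuracy likely depends on the task; accumulation at least keeps the optimization closer to the original setting.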

mistel1225 commented 3 years ago

Sorry, I found that the OOM was our own problem: the device is broken, so it does not work. This issue can be closed!

Thanks so much!