GasolSun36 / APOLLO

[COLING 2024] APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Issues with Consistency-based Reinforcement Learning and Ensemble Method #1

Open JeongEunhye00 opened 3 months ago

JeongEunhye00 commented 3 months ago

Hi, I'm very interested in your research and would like to run your code, but I've encountered a few issues.

  1. While following the README, I attempted to perform Consistency-based Reinforcement Learning, but the training doesn't seem to be progressing correctly. Is there a part missing in the code?
  2. I am also curious about the method you used for the ensemble.

thank you!

GasolSun36 commented 3 months ago


Hi, thank you for your interest in our work!

  1. What specific error was reported? Could you please show it to me?
  2. We use majority voting as the ensemble method.
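For anyone reading along, majority voting over candidate answers can be sketched roughly like this (a minimal illustration, not the actual APOLLO code; the function name and inputs are made up):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent answer among candidate predictions.

    `predictions` is a list of final answers produced by different
    checkpoints or sampled programs. Ties are broken by first
    occurrence (Counter preserves insertion order in Python 3.7+).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

answers = ["12.5", "12.5", "13.0"]
print(majority_vote(answers))  # "12.5"
```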
JeongEunhye00 commented 3 months ago
Traceback (most recent call last):
  File "/workspace/APOLLO/baseline/code/generator/Main.py", line 778, in <module>
    train(args)
  File "/workspace/APOLLO/baseline/code/generator/Main.py", line 288, in train
    this_logits, m_list = model(True,
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/APOLLO/baseline/code/generator/Model.py", line 285, in forward
    probs = self.softmax(option_logits)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1265, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Bert_model' object has no attribute 'softmax'

I encountered this error, so I added `self.softmax = nn.Softmax(dim=-1)` to the `__init__` function. The code now runs, but training doesn't seem to be working properly: too many "n/a" values appear during the training iterations. [screenshot of the training log] I just don't know what the problem is.. 😥
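For context, the one-line workaround described above looks like this (a minimal stand-in, not the real `Bert_model` from `Model.py`, which is much larger):

```python
import torch
import torch.nn as nn

class Bert_model(nn.Module):
    """Minimal stand-in showing where the missing attribute goes."""

    def __init__(self):
        super().__init__()
        # The attribute that forward() expects but __init__ never defined,
        # which is what raised the AttributeError in the traceback above.
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, option_logits):
        return self.softmax(option_logits)

model = Bert_model()
probs = model(torch.tensor([[1.0, 2.0, 3.0]]))
# each row of probs now sums to 1 (up to float error)
```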

GasolSun36 commented 3 months ago

Can you send me the sh script you run? You should warm up the generator first (train for, say, 80-100 iterations), then load that generator checkpoint and continue training with RL.
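The two-stage schedule described above can be sketched as follows (a toy outline with placeholder update functions; none of these names are from the APOLLO code):

```python
def train_step_supervised(model, batch):
    # placeholder for a supervised (cross-entropy) update on the generator
    model["steps"] += 1
    return model

def train_step_rl(model, batch):
    # placeholder for a consistency-based RL update
    model["rl_steps"] += 1
    return model

def train(warmup_iters=100, rl_iters=50):
    """Stage 1: warm up the generator with supervised training.
    Stage 2: continue from that checkpoint with RL fine-tuning."""
    model = {"steps": 0, "rl_steps": 0}
    for _ in range(warmup_iters):   # stage 1: warm-up
        model = train_step_supervised(model, batch=None)
    # in the real pipeline the warm-up checkpoint is saved here and
    # reloaded via --saved_model_path before the RL stage
    for _ in range(rl_iters):       # stage 2: RL fine-tuning
        model = train_step_rl(model, batch=None)
    return model

m = train()
```

The point is simply that `--rl` training should never start from a randomly initialized generator; it resumes from a supervised warm-up checkpoint.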

JeongEunhye00 commented 2 months ago

I used it like this:

```shell
python -u -m torch.distributed.launch --nproc_per_node=2 --master_port=6899 \
  ./baseline/code/generator/Main.py \
  --root_path "./" \
  --model_save_name generator-roberta-large \
  --pretrained_model roberta \
  --model_size roberta-large \
  --mode train \
  --features_dir ./baseline/dataset/generator/ \
  --examples_dir ./baseline/dataset/generator/ \
  --tags 3 \
  --saved_model_path "./baseline/output/generator/30000_model.pt" \
  --dataset_type finqa \
  --rl \
  --epoch 50 \
  --batach_size 4 \
  --gradient_accumulation_steps 4 \
  --report 2000 \
  --report_loss 500
```