Hello. I have a few questions; I would be grateful if you could answer them.
Could you please tell me what this argument does and what it affects?
parser.add_argument('--smart_bob', action='store_true', default=False, help='make Bob smart again')
In "Deal or No Deal? End-to-End Learning for Negotiation Dialogues", Section 6.1, it says: "During reinforcement learning, we use a learning rate of 0.1, clip gradients above 1.0, and use a discount factor of γ=0.95."
But in reinforce.py the default is: parser.add_argument('--gamma', type=float, default=0.99, help='discount factor'). Does this discrepancy matter for learning?
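To make the gamma question concrete, here is a minimal sketch (my own illustration, not code from reinforce.py) of how the discount factor weights future rewards when computing REINFORCE returns, and how much γ=0.95 vs γ=0.99 changes the weight of a reward a few steps away:

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for each t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g       # fold future return into the current step
        returns.append(g)
    returns.reverse()
    return returns

# Weight of a reward of 10 arriving 5 turns in the future:
print(round(0.95 ** 5 * 10, 3))  # gamma=0.95 -> 7.738
print(round(0.99 ** 5 * 10, 3))  # gamma=0.99 -> 9.51
```

In a negotiation dialogue the reward arrives only at the end, so a higher gamma propagates more of that final reward back to early turns; the two defaults do differ, though whether it changes the final result here is exactly the question.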
Also in reinforce.py we see: parser.add_argument('--clip', type=float, default=0.1, help='gradient clip'), while the paper says "clip gradients above 1.0".
About the REINFORCE learning rate and gradient clip: the script defaults are:
parser.add_argument('--rl_lr', type=float, default=0.002, help='RL learning rate')
parser.add_argument('--rl_clip', type=float, default=2.0, help='RL gradient clip')
But the code snippet in the README uses --rl_lr 0.00001 and --rl_clip 0.0001. Does this difference matter for learning?
How long does it take to run reinforce.py with the arguments from that snippet?
During training, many dialogues appear in which one of the agents repeats a single word a large number of times. Is that expected?