shinkuan closed this issue 1 year ago
The goal of model training is to minimize the value of the loss function; thus a lower q1 loss or q2 loss value indicates that the model's predictions are closer to the actual outcomes, and therefore that the model is performing better. I think your reward signal is improperly set; this could cause the model's behavior to deviate from what is expected, leading to an increase in q loss.
thus a lower q1 loss or q2 loss value indicates that the model's predictions are closer to the actual outcomes
What about V loss? What does it represent?
I think your reward signal is improperly set; this could cause the model's behavior to deviate from what is expected, leading to an increase in q loss.
The reward function I gave tends to give more reward as the hand gets closer to winning (i.e., as the shanten number decreases).
I don't know whether I should give the reward based on whether the decision it made is good, or based on how close the hand currently is to winning the game.
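As a rough illustration of the second option (shanten-based shaping) only, not the actual plugin used here, a reward of that shape might look like the following; the function signature and constants are placeholders, not kanachan's reward-plugin interface:

```python
def shanten_shaping_reward(prev_shanten: int, curr_shanten: int, won: bool) -> float:
    """Toy shaping reward that pays more as the hand nears a win.

    Shanten counts how far the hand is from tenpai (0 = tenpai; lower is
    closer to winning). This is only a sketch of the idea described above.
    """
    if won:
        return 1.0
    # Reward progress toward tenpai, penalize moving away from it.
    return 0.1 * (prev_shanten - curr_shanten)
```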
I tried lowering the optimizer's learning rate, and that seems to have solved the problem.
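(As a generic PyTorch illustration of that change only; the corresponding kanachan config option is not shown here:)

```python
import torch
from torch import nn

model = nn.Linear(8, 1)  # stand-in for the actual Q/V networks
# Passing a smaller learning rate to the optimizer slows parameter updates,
# which often stabilizes a diverging q loss.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```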
v loss and q loss refer to the losses of the value function and the action-value function, respectively. v loss reflects how accurately the model evaluates state values; it should be as low as possible.
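For context, here is a generic sketch of how these losses are typically computed in IQL; this is not kanachan's actual code, the network callables and batch layout are placeholders, and gamma/tau correspond to the discount_factor and expectile options in the training command from the original post:

```python
import torch
import torch.nn.functional as F


def expectile_loss(diff: torch.Tensor, tau: float = 0.9) -> torch.Tensor:
    """Asymmetric L2 loss used for the V update in IQL (tau is the expectile)."""
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


def iql_losses(q1, q2, v, target_q1, target_q2, batch, gamma=1.0, tau=0.9):
    """Compute v loss, q1 loss, and q2 loss for one batch of transitions."""
    s, a, r, s_next, done = batch  # done is expected as a 0/1 tensor

    # v loss: regress V(s) toward the frozen target Q via expectile regression.
    with torch.no_grad():
        target_q = torch.min(target_q1(s, a), target_q2(s, a))
    v_loss = expectile_loss(target_q - v(s), tau)

    # q losses: ordinary TD regression toward r + gamma * V(s').
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done.float()) * v(s_next)
    q1_loss = F.mse_loss(q1(s, a), td_target)
    q2_loss = F.mse_loss(q2(s, a), td_target)
    return v_loss, q1_loss, q2_loss
```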
The loss keeps increasing. What config might I have set wrong?
Is the LR too high? I was using the default value.
Ran the training using:
torchrun --nproc_per_node gpu --standalone -m kanachan.training.iql.train training_data=/workspace/data/annotate4rl_00000.txt num_workers=2 device=cuda encoder=bert_base decoder=double reward_plugin=/workspace/kanachan/kanachan/training/iql/get_reward.py discount_factor=1.0 target_update_rate=0.1 checkpointing=true batch_size=200 snapshot_interval=3000000 expectile=0.9
Reward function (pseudocode):
train.log