aleksa-sukovic / iclr2024-reward-design-for-justifiable-rl

Code for the paper "Reward Design for Justifiable Sequential Decision-Making"; ICLR 2024
https://openreview.net/forum?id=OUkZXbbwQr
MIT License

Issues Training Clinician Model – Failing to Learn #2

Open wenwxt opened 3 weeks ago

wenwxt commented 3 weeks ago

I’m trying to train the clinician model using the provided code, but the model struggles to learn: its performance is significantly below the reported results, and it does not appear to improve during training.

aleksa-sukovic commented 2 weeks ago

Without access to the exact data used for training, I can only give some high-level suggestions. First, when you refer to “performance”, do you mean the accuracy of the clinician's action prediction (not reported in the paper; the training procedure is described in App. C3), or the baseline shown in the plots (gray, dashed line)? In the latter case, please refer to Sec. 5.1 (baselines): we report the observed reward from the dataset, not the reward of the approximate clinician's policy; the learned policy is used only for WIS evaluation. Next, please make sure that the data has been pre-processed as described in the README file. Lastly, it might be useful to take a look at the paper “An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare”, which inspired the implementation of the clinician's policy.
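
To make the distinction between the two quantities concrete, here is a minimal sketch. It is not the repository's code. It assumes a trajectory-wise weighted importance sampling (WIS) estimator, a hypothetical discount `gamma`, and a hypothetical trajectory layout with keys `rewards`, `pi_e` (probabilities the evaluated policy assigns to the logged actions) and `pi_b` (probabilities the learned clinician policy assigns to them). The estimator variant and data format used in the paper may differ.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of one trajectory's logged rewards."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))


def observed_baseline(trajectories, gamma=0.99):
    """Mean observed return in the dataset; no learned policy is involved."""
    returns = [discounted_return(t["rewards"], gamma) for t in trajectories]
    return float(np.mean(returns))


def wis_estimate(trajectories, gamma=0.99):
    """Trajectory-wise WIS estimate of the evaluated policy's return.

    The learned clinician policy enters only through the importance
    ratios pi_e / pi_b (hypothetical keys, assumed for illustration).
    """
    ratios = np.array([
        np.prod(np.asarray(t["pi_e"], dtype=float) / np.asarray(t["pi_b"], dtype=float))
        for t in trajectories
    ])
    returns = np.array([discounted_return(t["rewards"], gamma) for t in trajectories])
    return float(np.sum(ratios * returns) / np.sum(ratios))
```

In this sketch, `observed_baseline` never touches the learned clinician policy, so a poorly fitted clinician model would not change that number. It would affect only the importance ratios inside `wis_estimate`.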