Issues Training Clinician Model – Failing to Learn

Without access to the exact data used for training, I can only offer some high-level suggestions. First, when you refer to “performance”, do you mean the accuracy of the clinician's action prediction (not reported in the paper; the training procedure is described in App. C3), or the baseline shown in the plots (gray dashed line)? If the latter, please refer to Sec. 5.1 (baselines): we report the observed reward from the dataset, not the reward of the approximate clinician's policy; that learned policy is used only for WIS evaluation. Next, please make sure that the data has been pre-processed as described in the README file. Lastly, it might be useful to take a look at the paper “An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare”, as it inspired the implementation of the clinician's policy.
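To make the distinction between the two quantities concrete, here is a minimal sketch, not the repository's code. It assumes trajectories are stored as lists of `(state, action, reward)` tuples and that the learned clinician policy plays the role of the behavior policy in weighted importance sampling (WIS). All names (`observed_baseline`, `wis_estimate`, `pi_eval`, `pi_behavior`, `eps`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical data layout: one logged step is (state, action, reward).
Step = Tuple[int, int, float]
Trajectory = List[Step]
# A policy maps (state, action) to the probability of taking that action.
Policy = Callable[[int, int], float]


def discounted_return(traj: Trajectory, gamma: float) -> float:
    """Discounted sum of the rewards logged in one trajectory."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(traj))


def observed_baseline(trajs: List[Trajectory], gamma: float) -> float:
    """Average logged return. No learned policy is involved here."""
    return sum(discounted_return(tr, gamma) for tr in trajs) / len(trajs)


def wis_estimate(
    trajs: List[Trajectory],
    pi_eval: Policy,
    pi_behavior: Policy,
    gamma: float,
    eps: float = 1e-8,
) -> float:
    """Weighted importance sampling estimate of pi_eval's return.

    pi_behavior stands in for the learned clinician policy (assumption).
    eps is an illustrative floor that avoids division by zero.
    """
    weights = []
    returns = []
    for traj in trajs:
        w = 1.0
        for s, a, _ in traj:
            # Per-step likelihood ratio between evaluated and behavior policy.
            w *= pi_eval(s, a) / max(pi_behavior(s, a), eps)
        weights.append(w)
        returns.append(discounted_return(traj, gamma))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all importance weights are zero")
    # Self-normalised average: weights are divided by their sum.
    return sum(w * g for w, g in zip(weights, returns)) / total
```

In this sketch a poorly fitted behavior policy changes only the output of `wis_estimate`; `observed_baseline` depends on the logged rewards alone, which is why the two questions above need to be separated.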

aleksa-sukovic / iclr2024-reward-design-for-justifiable-rl
