Open aLadNamedPat opened 4 months ago
The algorithm presented also does not guarantee that the episodic reward reflects the average number of detections per second. Instead, the final episodic reward is the last reward multiplied by step (n+1) / timesteps, which means the implied average detections per second has high variance.
This estimate is almost arbitrary, so reward should not be used as the actual metric for measuring model performance.
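To illustrate the concern, here is a minimal sketch, not the repository's actual code. It assumes one reading of the formula above, in which `n` is the zero-based index of the final step and `timesteps` is the episode length. The function names and the per-step detection counts are hypothetical. Two episodes with the same total number of detections get very different last-step estimates, while the plain per-step mean is identical.

```python
def last_step_estimate(detections, timesteps):
    """Assumed reading of the reported value: last reward * (n + 1) / timesteps."""
    n = len(detections) - 1  # zero-based index of the final step (assumption)
    return detections[-1] * (n + 1) / timesteps


def mean_detections(detections, seconds_per_step=1.0):
    """Average detections per second over the whole episode."""
    return sum(detections) / (len(detections) * seconds_per_step)


# Hypothetical per-step detection counts; both episodes total 6 detections.
episode_a = [2, 0, 3, 1, 0]
episode_b = [0, 1, 0, 2, 3]

for name, episode in (("a", episode_a), ("b", episode_b)):
    print(
        name,
        "last-step estimate:", last_step_estimate(episode, timesteps=len(episode)),
        "mean per second:", mean_detections(episode),
    )
```

Under these assumptions, logging something like `mean_detections` alongside the reward would give a metric that does not depend on what happened in the final step.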
Reward is not increasing as the model trains, which indicates that something is wrong with the training process. Possible errors include: