aLadNamedPat / SPUR

0 stars 0 forks source link

Model training issues #17

Open aLadNamedPat opened 1 week ago

aLadNamedPat commented 1 week ago

Reward is not increasing as the model trains which indicates that there is something wrong with the training process. Possible errors include:

aLadNamedPat commented 1 week ago

The algorithm presented also does not necessarily guarantee that the average number of detections per second. Instead, the final output of episodic reward is the last reward multiplied by step (n+1) / timesteps which means that there is high variance in what the average detections per second is.

This estimate is almost arbitrary so to measure model performance, reward should not be used as the actual metric.