Collect environment and action metrics during reinforcement learning

ricklindstrom commented 4 years ago

Creates a metrics CSV file so that progress during train or enjoy runs can be easily explored, graphed, and debugged.

For example, a scatter plot of x and y shows that the default training settings fail to train the bot and instead induce either wiggling in place or spinning in place. The plot was generated with:

import pandas as pd
import seaborn as sns
data = pd.read_csv("metrics-training.csv")
sns.scatterplot(x="x", y="y", hue="step", data=data)

A sample excerpt from the metrics csv file for training:

datetime,step,x,y,angle,speed,steering,center_dist,center_angle,reward,total_reward
2019-11-10 13:24:51.595017,1,2.643872463995367,2.9012396542240886,4.8690985603846855,0.3757399320602417,0.46001503,-0.13055107592213924,0.035870341709401815,3.19559684058306,3.19559684058306
2019-11-10 13:24:51.624950,2,2.6461244240452295,2.9163726845417557,4.8511330472675125,0.405402946472168,0.3595909,-0.12986506835069456,0.0538358548265737,3.1596744597449034,6.3552713003279635
2019-11-10 13:24:51.643253,3,2.646838734203294,2.9244335382732203,4.750412590849644,0.33081514835357667,0.073977984,-0.12902339261159254,0.15455631124444458,2.9496454310411275,9.30491673136909
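
In the three rows above, total_reward looks like the running sum of reward. A small check using only the standard library, as a sketch: it assumes nothing about the file beyond the excerpt itself, and the cumulative-sum relationship is an observation about these rows, not a documented guarantee of the wrapper.

```python
import csv
import io

# Excerpt copied verbatim from the sample metrics file above.
EXCERPT = """\
datetime,step,x,y,angle,speed,steering,center_dist,center_angle,reward,total_reward
2019-11-10 13:24:51.595017,1,2.643872463995367,2.9012396542240886,4.8690985603846855,0.3757399320602417,0.46001503,-0.13055107592213924,0.035870341709401815,3.19559684058306,3.19559684058306
2019-11-10 13:24:51.624950,2,2.6461244240452295,2.9163726845417557,4.8511330472675125,0.405402946472168,0.3595909,-0.12986506835069456,0.0538358548265737,3.1596744597449034,6.3552713003279635
2019-11-10 13:24:51.643253,3,2.646838734203294,2.9244335382732203,4.750412590849644,0.33081514835357667,0.073977984,-0.12902339261159254,0.15455631124444458,2.9496454310411275,9.30491673136909
"""

rows = list(csv.DictReader(io.StringIO(EXCERPT)))

# Accumulate reward step by step and track the largest disagreement
# with the total_reward column.
running_total = 0.0
max_gap = 0.0
for row in rows:
    running_total += float(row["reward"])
    max_gap = max(max_gap, abs(running_total - float(row["total_reward"])))

print(f"{len(rows)} rows, largest gap vs. total_reward: {max_gap:.2e}")
```

The same loop can be pointed at a full metrics file by replacing the in-memory string with an opened file handle.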

Wrappers modifying observations or rewards should be below the MetricsWrapper. Wrappers modifying actions should be above the MetricsWrapper. Therefore the ActionWrapper was moved above the MetricsWrapper to allow the MetricsWrapper to see the modified Actions.
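
One way to read the ordering rule, sketched with plain Python classes instead of the real gym wrappers. This assumes "above" means closer to the agent and "below" means closer to the environment, so actions flow down through the stack and rewards flow back up. Every class here (BaseEnv, Wrapper, ScaleRewardWrapper, ClipActionWrapper, RecordingWrapper) is a hypothetical stand-in, not the code in this PR.

```python
class BaseEnv:
    """Stand-in environment: the reward is just the first action component."""

    def step(self, action):
        return None, float(action[0]), False, {}


class Wrapper:
    """Bare-bones wrapper base that mimics the shape of a gym wrapper."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(action)


class ScaleRewardWrapper(Wrapper):
    """Hypothetical reward wrapper: doubles the reward on its way up."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * 2.0, done, info


class ClipActionWrapper(Wrapper):
    """Hypothetical action wrapper: clips each component to [-1, 1]."""

    def step(self, action):
        clipped = [max(-1.0, min(1.0, a)) for a in action]
        return self.env.step(clipped)


class RecordingWrapper(Wrapper):
    """Stand-in for a metrics wrapper: remembers the action and reward it sees."""

    def __init__(self, env):
        super().__init__(env)
        self.rows = []

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.rows.append({"action": list(action), "reward": reward})
        return obs, reward, done, info


env = BaseEnv()
env = ScaleRewardWrapper(env)     # below the recorder: it sees the scaled reward
metrics = RecordingWrapper(env)
env = ClipActionWrapper(metrics)  # above the recorder: it sees the clipped action

env.step([3.0, -0.5])
print(metrics.rows)
```

With this arrangement the recorder logs the values that actually reach the environment and the rewards that the agent is actually trained on, which is the motivation given for moving the ActionWrapper.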

I'm new to Python and Duckietown, and new to making pull requests to open source projects, so your guidance and feedback are much appreciated.

liampaull commented 4 years ago

@bhairavmehta95 this looks cool, can you take a look?

ricklindstrom commented 4 years ago

Some other examples. Here is a plot of reward as a function of 'center_angle' that shows the reward is well tuned for 'center_angle':

sns.scatterplot(x="center_angle", y="reward", hue='step', data=data)

Here is a plot of reward as a function of 'center_dist' that shows my reward was NOT well tuned for 'center_dist':

sns.scatterplot(x="center_dist", y="reward", hue='step', data=data)

Here is an example of more successful driving.

ricklindstrom commented 4 years ago

Oh. I notice that because my pull request was from master and not a branch, other things I am committing to my master are polluting the PR. Sorry. Let me know if or how you need me to fix this.

bhairavmehta95 commented 4 years ago

Sorry, just saw after LP tagged me.

@ricklindstrom thank you for this contribution, this looks fantastic. It honestly seems basically ready to go, except for:

Can you move the notebooks from the main directory to a new directory inside of learning/ called notebooks/?

duckietown / gym-duckietown

Collect environment and action metrics during reinforcement learning #182