Does the code in evaluation.py currently match what was used for the paper results? Did the paper results use any specific settings? I trained a phase 2 policy in my own repo that matched the training performance of a phase 2 policy in your repo, but my eval metrics are:
Mean reward: 8.81$\pm$5.61
Mean episode length: 380.12$\pm$255.50
Mean number of waypoints: 0.44$\pm$0.32
Mean edge violation: 0.16$\pm$0.44
which doesn't seem to match the paper results. Also, how can I view evaluation scores split up by terrain?
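For context, this is roughly how I was hoping to break the metrics down per terrain. The record structure, field names, and `return_per_episode` flag here are assumptions on my part, not the repo's actual API, so please correct me if evaluation.py already supports something like this:

```python
import numpy as np
from collections import defaultdict

# Hypothetical sketch: group per-episode metrics by terrain type.
# Assumes evaluation can return a list of per-episode dicts like
# {"terrain": str, "reward": float, "episode_length": int,
#  "num_waypoints": float, "edge_violation": float} -- names are guesses.

def summarize_by_terrain(episode_records):
    """Return {terrain: {metric: (mean, std)}} over the given episode records."""
    by_terrain = defaultdict(list)
    for rec in episode_records:
        by_terrain[rec["terrain"]].append(rec)

    summary = {}
    for terrain, recs in by_terrain.items():
        summary[terrain] = {}
        for metric in ("reward", "episode_length", "num_waypoints", "edge_violation"):
            values = np.array([r[metric] for r in recs], dtype=float)
            summary[terrain][metric] = (values.mean(), values.std())
    return summary

# Example usage (run_evaluation and its flag are hypothetical):
# records = run_evaluation(policy, return_per_episode=True)
# for terrain, metrics in summarize_by_terrain(records).items():
#     print(terrain, {k: f"{m:.2f}±{s:.2f}" for k, (m, s) in metrics.items()})
```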