MC Output Feedback - Githubissues

PathmindAI / pathmind-api

MC Output Feedback #8

Closed. ejunprung closed this issue 2 years ago

ejunprung commented 2 years ago

The Monte Carlo (MC) run (more than one episode) seems to output the same information as a single-episode run, which isn't very helpful for validating a policy's performance. I had some ideas; tell me what you think.

Single Episode "Run"

The current output (both console and output csv) is perfect. Good for debugging, no need to change anything here.

Multi-Episode "Run" (i.e. Monte Carlo)

Output only the final metric (i.e. reward) value at the end of each episode. That way, the user can use Excel, pandas, or whatever tool they prefer to analyze the results and compare them to their heuristic.
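To illustrate the kind of analysis this would enable, here is a minimal sketch that summarizes a per-episode results file using only the Python standard library. The file layout (one row per episode) and the column names `episode` and `reward` are assumptions made for illustration, not the actual pathmind-api output format.

```python
import csv
import io
import statistics


def summarize_episode_rewards(csv_file, reward_column="reward"):
    """Summarize final per-episode rewards read from an open CSV file.

    The column name is hypothetical; adjust it to match the real output.
    """
    rewards = [float(row[reward_column]) for row in csv.DictReader(csv_file)]
    return {
        "episodes": len(rewards),
        "mean": statistics.mean(rewards),
        # Sample standard deviation needs at least two episodes.
        "stdev": statistics.stdev(rewards) if len(rewards) > 1 else 0.0,
        "min": min(rewards),
        "max": max(rewards),
    }


# Made-up sample data standing in for a summary CSV: one row per episode.
sample = io.StringIO("episode,reward\n0,10.0\n1,12.0\n2,14.0\n")
summary = summarize_episode_rewards(sample)
print(summary)
```

The same numbers computed for a heuristic baseline would give a direct comparison of mean and spread across episodes.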

maxpumperla commented 2 years ago

@ejunprung I could create a second CSV in addition to the first one in this case. This way you can have both. In principle I could even write an Excel workbook with multiple sheets, but I don't want to exclude users without Office.

What do you think?

maxpumperla commented 2 years ago

@ejunprung test this https://github.com/PathmindAI/pathmind-api/pull/13/files#diff-fdc58db8682bdd22599004eb68ac947e66ad6650be35db1ae8ac8517a6252cb8L19

ejunprung commented 2 years ago

@maxpumperla Nice, separate CSVs are perfect. Just one nitpick.

It looks like the final rewards are summed together across episodes.

I think the reward for each episode should be reported independently of the others.
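A minimal sketch of the distinction, assuming the symptom comes from an accumulator that is not reset between episodes. This is a guess at the cause, not the pathmind-api implementation, and both function names are hypothetical.

```python
def final_rewards_independent(episode_step_rewards):
    """Return one final reward per episode, each computed on its own."""
    finals = []
    for step_rewards in episode_step_rewards:
        episode_total = 0.0  # reset at the start of every episode
        for reward in step_rewards:
            episode_total += reward
        finals.append(episode_total)
    return finals


def final_rewards_carried_over(episode_step_rewards):
    """Return running totals: the accumulator is never reset (the suspected symptom)."""
    finals = []
    running_total = 0.0  # shared across all episodes
    for step_rewards in episode_step_rewards:
        for reward in step_rewards:
            running_total += reward
        finals.append(running_total)
    return finals


# Made-up step rewards for three episodes.
episodes = [[1.0, 2.0], [3.0, 4.0], [5.0]]
independent = final_rewards_independent(episodes)
carried_over = final_rewards_carried_over(episodes)
print(independent)   # [3.0, 7.0, 5.0]
print(carried_over)  # [3.0, 10.0, 15.0]
```

In the second list each value includes all earlier episodes, so the numbers only grow and cannot be compared episode to episode.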

slinlee commented 2 years ago

@ejunprung have you tried to see if it does the same thing in other simulations?

ejunprung commented 2 years ago

Not yet. Let me try with Zinc Factory in a bit.

slinlee commented 2 years ago

I made a PR to fix the summary table: #16