ZoopOTheGoop / ffxiv-crafting-solver

A lot of abstract math done on reinforcement learning just to solve crafting in a video game

Implement basic RL for simulator #5

Open ZoopOTheGoop opened 2 years ago

ZoopOTheGoop commented 2 years ago

Self-explanatory: attempt basic value iteration on the simulator. Failing that, we will re-evaluate whether heuristic space pruning or sample-based Q-learning is more prudent.
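
For reference, a minimal sketch of what tabular value iteration could look like. None of this reflects the actual simulator API: the state names, action names, rewards, and the `TOY_MDP` table are all made up for illustration, and the real state space is far larger than something that fits in a dict.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = str
# transitions[state][action] -> list of (probability, next_state, reward).
# A state with no actions is terminal. This layout is an assumption,
# not the simulator's real interface.
Transitions = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def q_value(outcomes, values, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[ns]) for p, ns, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 1.0,
                    tol: float = 1e-9, max_sweeps: int = 10_000):
    """In-place (Gauss-Seidel) value iteration over a tabular MDP.

    gamma=1.0 is only safe here because every episode terminates
    (a craft always ends); otherwise use gamma < 1.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items() if actions
    }
    return values, policy


# Hypothetical toy craft: finish now for a small reward, or gamble on a
# quality step that fails the craft 20% of the time.
TOY_MDP: Transitions = {
    "start": {
        "finish_now": [(1.0, "nq", 1.0)],
        "risky_touch": [(0.8, "polished", 0.0), (0.2, "failed", -10.0)],
    },
    "polished": {"finish": [(1.0, "hq", 5.0)]},
    "nq": {}, "hq": {}, "failed": {},
}

if __name__ == "__main__":
    values, policy = value_iteration(TOY_MDP)
    print(values["start"], policy["start"])
```

In this toy table the gamble is worth 0.8 * 5 + 0.2 * (-10) = 2, which beats finishing immediately for 1, so the greedy policy picks `risky_touch` from `start`. If enumerating every state like this turns out to be infeasible on the real simulator, that is the point where pruning or sample-based Q-learning comes in.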

This may be split into multiple issues down the line; it may be worth trying both structured RL and simpler real-number-based RL. For HQ/NQ the setup should be very similar, except for difficult recipes that can be failed (large negative penalty). For collectibles it may be more interesting to evaluate the pros and cons of how the tiers are rewarded: is simply maximizing quality good enough, or should each tier be rewarded separately, since a single point of quality can make a huge difference in reward?
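
To make the trade-off concrete, here is a rough sketch of the two reward styles side by side. The thresholds, tier rewards, and failure penalty are placeholder numbers, not real recipe data, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def quality_reward(completed: bool, quality: int, max_quality: int,
                   fail_penalty: float = 10.0) -> float:
    """HQ/NQ-style reward: fraction of max quality, or a large negative
    penalty if the craft failed. The penalty size is an assumption."""
    if not completed:
        return -fail_penalty
    return min(quality, max_quality) / max_quality


def tiered_reward(quality: int, thresholds: Sequence[int],
                  tier_rewards: Sequence[float]) -> float:
    """Collectible-style reward: one fixed payout per tier.

    thresholds must be ascending; tier_rewards needs one more entry than
    thresholds, with index 0 covering "below the lowest tier".
    """
    if len(tier_rewards) != len(thresholds) + 1:
        raise ValueError("tier_rewards must have len(thresholds) + 1 entries")
    # Number of thresholds that quality meets or exceeds.
    tier = bisect_right(thresholds, quality)
    return tier_rewards[tier]


# Placeholder tiers: below 400 is worthless, then three payout levels.
THRESHOLDS = [400, 600, 800]
TIER_REWARDS = [-1.0, 1.0, 2.0, 4.0]

if __name__ == "__main__":
    for q in (599, 600):
        print(q,
              quality_reward(True, q, 1000),
              tiered_reward(q, THRESHOLDS, TIER_REWARDS))
```

With these placeholder numbers, going from 599 to 600 quality moves the smooth reward by only 0.001 but doubles the tiered reward from 1.0 to 2.0. The smooth version gives the learner a gradient everywhere, while the tiered version matches the actual payout but is flat between thresholds; which one trains better is exactly the open question above.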

Minimum criteria for success:

Non-issues for the moment: