Q-Learning in new gameplay

williamckha commented 1 month ago

Description

Working on implementing Q-Learning into the new gameplay for selecting skills. The learned policy will replace SkillGraph as the skill selection strategy.
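For context on what "Q-Learning for selecting skills" can look like, here is a minimal sketch of approximate Q-learning over a fixed set of skills. The thread later mentions "Q-function weights", which suggests (but does not confirm) a linear Q-function. This sketch assumes one weight vector per skill over hand-crafted state features, epsilon-greedy selection, and the standard one-step Q-learning update. The names `LinearQPolicy`, `selectSkill` and `update`, and the feature representation, are hypothetical and are not taken from the PR.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical linear Q-function: Q(s, a) = w_a . phi(s), one weight vector per skill.
// Assumes every feature vector has exactly num_features entries.
class LinearQPolicy
{
   public:
    LinearQPolicy(std::size_t num_skills, std::size_t num_features, double learning_rate,
                  double discount, double epsilon, unsigned seed = 0)
        : weights_(num_skills, std::vector<double>(num_features, 0.0)),
          alpha_(learning_rate),
          gamma_(discount),
          epsilon_(epsilon),
          rng_(seed)
    {
    }

    // Q(s, a): dot product of the skill's weights with the state features.
    double qValue(const std::vector<double>& features, std::size_t skill) const
    {
        double q = 0.0;
        for (std::size_t i = 0; i < features.size(); ++i)
        {
            q += weights_[skill][i] * features[i];
        }
        return q;
    }

    // Greedy choice: the skill with the highest Q(s, a); ties go to the lowest index.
    std::size_t bestSkill(const std::vector<double>& features) const
    {
        std::size_t best = 0;
        for (std::size_t a = 1; a < weights_.size(); ++a)
        {
            if (qValue(features, a) > qValue(features, best))
            {
                best = a;
            }
        }
        return best;
    }

    // Epsilon-greedy: with probability epsilon pick a random skill, otherwise the greedy one.
    std::size_t selectSkill(const std::vector<double>& features)
    {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng_) < epsilon_)
        {
            std::uniform_int_distribution<std::size_t> pick(0, weights_.size() - 1);
            return pick(rng_);
        }
        return bestSkill(features);
    }

    // One-step Q-learning update on the chosen skill's weights:
    // w_a <- w_a + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)
    void update(const std::vector<double>& features, std::size_t skill, double reward,
                const std::vector<double>& next_features, bool terminal)
    {
        double target = reward;
        if (!terminal)
        {
            target += gamma_ * qValue(next_features, bestSkill(next_features));
        }
        const double td_error = target - qValue(features, skill);
        for (std::size_t i = 0; i < features.size(); ++i)
        {
            weights_[skill][i] += alpha_ * td_error * features[i];
        }
    }

    const std::vector<std::vector<double>>& weights() const { return weights_; }

   private:
    std::vector<std::vector<double>> weights_;
    double alpha_;
    double gamma_;
    double epsilon_;
    std::mt19937 rng_;
};
```

A linear Q-function keeps the learned state small (one weight per feature per skill), which makes it cheap to keep around between plays and easy to inspect or export.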

[Image: Q-learning]

TODO: full write-up and add documentation

Testing Done

Resolved Issues

Length Justification and Key Files to Review

Review Checklist

It is also the reviewer's responsibility to make sure every item here has been covered.

[ ] Function & Class comments: All function definitions (usually in the .h file) should be preceded by a Javadoc-style comment. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
[ ] Remove all commented out code
[ ] Remove extra print statements: for example, those just used for testing
[ ] Resolve all TODOs: All TODO (or similar) statements should either be completed or associated with a GitHub issue

itsarune commented 1 month ago

actually wait, doesn't look like it compiles

williamckha commented 3 weeks ago

with the possession changes you made, does this fix the issue where intercepted balls are given back by our defense to enemy robots?

Not really, we need to change the PassDefenderTactic to hold onto the ball once it intercepts it

itsarune commented 2 weeks ago

there's also a bad optional access that is reachable somewhere when playing AI vs AI.

I recreated it by pressing FORCE_START during execution of a play. Let me know if you can't recreate it and I'll try to spend some time finding a minimal reproducible example.

williamckha commented 1 week ago

hmm for some reason I still can't override the new AI plays on Thunderscope

The GenericFactory for OffensivePlay doesn't work because I need to pass an AttackerTactic into each OffensivePlay constructor (we have one AttackerTactic instance shared by all 3 OffensivePlays so that the attacker remembers its own Q-learning weights). I'm not really a fan of having 3 separate OffensivePlays anymore... I think it's sort of a flawed idea.

e.g. I often notice we enter OffensiveFriendlyThirdPlay because the ball starts in our friendly half, but we actually spend most of the play in the enemy half. We wouldn't be able to adjust our support tactics for more aggressive gameplay once we're in the enemy half because we're stuck in OffensiveFriendlyThirdPlay.

I think having one OffensivePlay is good enough for now (we're only going to have one support tactic for competition anyway, so none of these changes "matter"). In general, I don't like how our support tactics remain static throughout the length of a DynamicPlay. I think we need to look into redesigning DynamicPlay and how we select support tactics next year.

This was a bit rambly but lmk your thoughts on this
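The two ideas in this comment, one attacker whose learned weights outlive any single play and support behavior that follows the ball instead of staying fixed for the whole play, can be sketched as follows. This is an illustration under assumptions only: `AttackerState`, `OffensivePlaySketch`, `SupportMode`, and the convention that negative x is the friendly half are hypothetical, and the real AttackerTactic and DynamicPlay interfaces are not shown in this thread.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for AttackerTactic: owns state (e.g. learned Q weights)
// that must persist across plays.
struct AttackerState
{
    std::vector<double> q_weights;
    int updates = 0;
};

enum class SupportMode
{
    DEFENSIVE,
    AGGRESSIVE
};

// Hypothetical single offensive play. The attacker is injected as a shared_ptr,
// so every play instance built with it reads and writes the same learned state.
class OffensivePlaySketch
{
   public:
    explicit OffensivePlaySketch(std::shared_ptr<AttackerState> attacker)
        : attacker_(std::move(attacker))
    {
    }

    // Called every tick: the support mode is derived from where the ball is now,
    // rather than being fixed when the play was first selected.
    // Assumes a field frame where x < 0 is the friendly half.
    SupportMode tick(double ball_x)
    {
        attacker_->updates++;  // placeholder for the attacker's learning step
        return ball_x < 0.0 ? SupportMode::DEFENSIVE : SupportMode::AGGRESSIVE;
    }

   private:
    std::shared_ptr<AttackerState> attacker_;
};
```

Injecting the shared attacker through the constructor is presumably also what prevents a generic factory from building the play without extra arguments, which matches the GenericFactory problem described above.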

there's also a bad optional access that is reachable somewhere when playing AI vs AI.

I recreated it by pressing FORCE_START during execution of a play. Let me know if you can't recreate it and I'll try to spend some time finding a minimal reproducible example.

This is a problem with FreeKickPlay and should be fixed once I merge in #2953
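The thread does not say which access in FreeKickPlay fails. As a generic illustration only, the sketch below shows how `std::optional::value()` throws `std::bad_optional_access` when a result is empty (for example, if the game state changes mid-play), and one way to guard against it. `findTargetRobotId`, `unsafeTarget`, and `safeTarget` are hypothetical names.

```cpp
#include <optional>

// Hypothetical query that may legitimately find nothing, e.g. if the game state
// changes (such as a FORCE_START) while a play is mid-execution.
std::optional<int> findTargetRobotId(bool target_available)
{
    if (target_available)
    {
        return 3;
    }
    return std::nullopt;
}

// Unsafe: value() throws std::bad_optional_access when the optional is empty.
int unsafeTarget(bool target_available)
{
    return findTargetRobotId(target_available).value();
}

// Safer: check before dereferencing and fall back explicitly.
// Note that operator* does not check; using it on an empty optional is undefined behavior.
int safeTarget(bool target_available, int fallback_id)
{
    std::optional<int> target = findTargetRobotId(target_available);
    if (!target.has_value())
    {
        return fallback_id;
    }
    return *target;
}
```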

williamckha commented 1 week ago

I think this is feature-complete and good to merge into new_gameplay_staging, wdyt @itsarune?

I've also added a new widget that displays the Q-function weights and lets us save them to a CSV file.
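The widget's file format is not described here. As a sketch, assuming the weights are stored as one row per skill and one column per feature, serializing them to CSV could look like the following; `weightsToCsv` and the header layout are hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical serializer: one CSV row per skill, one column per feature weight.
// The first column holds the skill name; the header row names the feature columns.
// skill_names.at(row) throws std::out_of_range if there are fewer names than rows.
std::string weightsToCsv(const std::vector<std::string>& skill_names,
                         const std::vector<std::string>& feature_names,
                         const std::vector<std::vector<double>>& weights)
{
    std::ostringstream out;
    // max_digits10 digits let each double be read back without loss.
    out << std::setprecision(std::numeric_limits<double>::max_digits10);

    out << "skill";
    for (const std::string& feature : feature_names)
    {
        out << "," << feature;
    }
    out << "\n";

    for (std::size_t row = 0; row < weights.size(); ++row)
    {
        out << skill_names.at(row);
        for (double w : weights[row])
        {
            out << "," << w;
        }
        out << "\n";
    }
    return out.str();
}
```

Writing at full precision matters if the saved weights are meant to be loaded back as a starting point rather than only viewed.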

williamckha commented 1 day ago

Is it just me or are the replay logs missing now in /tmp/tbots/blue?

Yeah I dunno why. Maybe if I merge in #3239 it will be fixed

nimazareian commented 22 hours ago

Is it just me or are the replay logs missing now in /tmp/tbots/blue?

Yeah I dunno why. Maybe if I merge in #3239 it will be fixed

It's because ProtoLogger is commented out in thunderscope_main.py. This is probably from when I was testing the new passing architecture: I commented it out and it got pulled into the new gameplay branch.

UBC-Thunderbots / Software