diambra / arena

DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation
https://docs.diambra.ai

In the King of Fighters, observation['P1']['oppChar'] may be wrong #74

Closed amit-gshe closed 1 year ago

amit-gshe commented 1 year ago
Hi, while training a King of Fighters agent, I printed the value of observation['P1']['oppChar'] from the observation space and saw that it is sometimes wrong, as shown in the figure and table below:

[image]

env   observation['P1']['oppChar']   realOppChar
0     Kyo                            Kyo
1     Yuri                           Yuri
2     Chizuru                        Mai
3     Terry                          Joe
4     Yamazaki                       Yamazaki
5     Yuri                           Ryo

I found that the wrong value is sometimes the last opponent instead of the current opponent. One possible reason is that oppChar is not updated after a new round starts.

Setup: Stable Baselines3 with PPO in 6 envs
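The comparison in the table can be automated with a small check. The following is only a minimal sketch: `find_mismatches` is a hypothetical helper, not part of DIAMBRA Arena, and the two lists simply repeat the character names from the table above. How the "real" opponent is obtained (here, read off the screen by hand) is an assumption.

```python
from typing import Dict, List, Sequence


def find_mismatches(reported: Sequence[str], actual: Sequence[str]) -> List[Dict[str, object]]:
    """Return one record per env whose reported opponent differs from the real one."""
    if len(reported) != len(actual):
        raise ValueError("reported and actual must have the same length")
    return [
        {"env": env_id, "reported": rep, "actual": act}
        for env_id, (rep, act) in enumerate(zip(reported, actual))
        if rep != act
    ]


# Values copied from the table in this issue (one entry per env, ids 0 to 5).
reported = ["Kyo", "Yuri", "Chizuru", "Terry", "Yamazaki", "Yuri"]
actual = ["Kyo", "Yuri", "Mai", "Joe", "Yamazaki", "Ryo"]

mismatches = find_mismatches(reported, actual)
for m in mismatches:
    print(f"env {m['env']}: reported {m['reported']}, on screen {m['actual']}")
```

With the values from the table, the check flags envs 2, 3 and 5.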

alexpalms commented 1 year ago

@amit-gshe thanks a lot for your ticket. First I would like to ask for a few clarifications: 1) At the beginning of the message you mention observation['P1']['ownChar'], while at the bottom you focus on the opponent. Have you found this problem on both variables or only on the opponent one? 2) Have you been able to replicate it with a single environment execution?

While waiting for your clarification, I will try to see if I am able to replicate it with a single environment first.

EDIT: additional question, what setup are you using? For example the stable-baselines wrapper or Ray RLlib, how many envs, etc.

alexpalms commented 1 year ago

@amit-gshe I am also not sure I see the problem in your snapshot. There, the env with id=5 is the one whose round has ended: before the round completed the two fighting characters were Benimaru:Yuri, and afterwards Chang:Yuri, meaning that Benimaru should have lost that round. But in the table at the bottom you say that Yuri is the wrong one. How can you tell?

From a quick test using a single environment, I do not see any problem at round change.

I will now try with multiple envs.

amit-gshe commented 1 year ago

@alexpalms Sorry I didn't make the question clear. I will edit the title and the issue to make it clear. I used stable baselines3 with 6 envs to train the agent.

alexpalms commented 1 year ago

@amit-gshe Thanks for the feedback, can you please share your settings (game settings) and wrappers_settings?

Please also comment on the second question I posted, as it is not clear to me where you see a problem in the snapshot you shared.

amit-gshe commented 1 year ago

@alexpalms I shared a runnable script here; it contains the settings and wrappers_settings. I ran this script with 1 env and observation['P1']['oppChar'] is still wrong.

alexpalms commented 1 year ago

@amit-gshe thanks a lot for your feedback. I was able to replicate the problem locally in an even simpler scenario, which confirms the bug. It is related to the fact that the order of the opponents can differ from the original choice: while we (as the agent) always confirm the character order we prescribe, the CPU sometimes changes it.

In fact, this error never appears in 2 players mode, because we always confirm the original characters order for both players.

We will work to fix that asap, and I will let you know directly in this thread when it is done. You will receive the corrected env automatically, as it will be fixed in the engine docker image.

alexpalms commented 1 year ago

@amit-gshe I just pushed a new engine image for you to test, it is called diambra/engine:kof-fix. It should fix the bug you found.

It also simplifies the RAM states of the game: please note that the ownChar2, ownChar3, oppChar2, oppChar3, ownActiveChar and oppActiveChar RAM states are no longer present.
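Code that still reads the removed RAM states would need updating when switching to this image. As a rough sketch, a check like the one below could flag them. `removed_keys_present` is a hypothetical helper, and the sample dictionaries are made-up stand-ins for observation['P1'] (the real observation holds other keys and value types); only the key names come from the list above.

```python
from typing import Any, Dict, List

# RAM states listed above as no longer present in the test image.
REMOVED_KEYS = (
    "ownChar2", "ownChar3",
    "oppChar2", "oppChar3",
    "ownActiveChar", "oppActiveChar",
)


def removed_keys_present(player_obs: Dict[str, Any]) -> List[str]:
    """Return the removed RAM-state keys that still appear in a per-player observation."""
    return [key for key in REMOVED_KEYS if key in player_obs]


# Hypothetical per-player observations, for illustration only.
old_style_obs = {"ownChar": "Kyo", "oppChar": "Yuri", "oppChar2": "Mai", "oppActiveChar": 0}
new_style_obs = {"ownChar": "Kyo", "oppChar": "Yuri"}

print(removed_keys_present(old_style_obs))
print(removed_keys_present(new_style_obs))
```

An empty list means the observation only contains keys that exist in the simplified layout.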

In order to use this new image, you just need to add --env.image diambra/engine:kof-fix to your diambra run command options, for example as follows: diambra run --env.image diambra/engine:kof-fix python script_to_run.py

I will wait for your confirmation that this solves your bug, before merging this fix in the official engine.

amit-gshe commented 1 year ago

@alexpalms Thanks for your effort to fix this issue, I just tried the test image and I can confirm that the problem is fixed.

alexpalms commented 1 year ago

Dear @amit-gshe, thanks a lot for your feedback and confirmation, and also for letting us know about the bug! I am happy the solution solved the problem.

I released it in the official image (diambra/engine:v2.1.0-rc17), so the next time you run DIAMBRA the new engine docker image will be pulled automatically and will contain the fix. There is no need to specify it manually anymore.

Please note the following things:

Do not hesitate to reach out for other needs.

alexpalms commented 1 year ago

Dear @amit-gshe, I just completed the RAM states rework I mentioned above. The modification has been merged, built and deployed, so I'd suggest updating your diambra-arena python package with pip install -U diambra-arena. You will also obtain the new engine docker image automatically at the next diambra run execution.

The new image implements the complete fix for KOF, making the correct characters available in the 3 slots (for both P1 and P2), no matter which reordering is selected before the stage begins. The docs have been updated, so you can find all the info here: https://docs.diambra.ai/envs/games/kof98umh/

In addition, it fixes character selection, which produced the wrong behavior in some edge cases.

This should finally close this issue, but do not hesitate to reach out in case you encounter other unexpected behavior.

Thanks!