Farama-Foundation / stable-retro

Retro games for Reinforcement Learning
https://stable-retro.farama.org/
MIT License

Support Double Dragon Neo-Geo (MAME emulator)? #4

Open dota2heqiuzhi opened 1 year ago

dota2heqiuzhi commented 1 year ago

I want to integrate the mame2003-plus-libretro emulator into stable-retro so I can train an AI on Double Dragon (there are many versions of this game, but I'm looking for the 1995 fighting game released on Neo-Geo and PS1).

Can someone provide a tutorial or step-by-step instructions? For example, do I need to modify the source code myself, or compile it? Thank you very much! I am a beginner and really don't know where to start. I really want to use this game to train an AI and get started with reinforcement learning.

Double Dragon ROM: https://archive.org/details/MAME_2003-Plus_Reference_Set_2018. The lib is available at https://github.com/libretro/mame2003-plus-libretro.

I see that Gym Retro supports the Libretro API, and MAME 2003-Plus is also built on Libretro, so there may not be much that needs to change, right?

MatPoliquin commented 1 year ago

@dota2heqiuzhi I integrated a Sega 32X emulator along with Virtua Fighter (in 3D) and wrote a guide on how to integrate a core into stable-retro: Emulator Integration Guide

I am working on integrating the mame2003plus emulator but hitting some roadblocks: it crashes when opening a ROM in the integration tool and also crashes in the basic tests. I have tried FBNeo (which supports your game as well) and ran into the same issues as with mame2003plus.

dota2heqiuzhi commented 1 year ago

Thank you very much! It seems this is quite difficult; if you couldn't get it working, then as a beginner I'm even less likely to manage it. However, when I have some free time, I will try it myself.

MatPoliquin commented 1 year ago

@dota2heqiuzhi I may have good news for you: I integrated the FBNeo emulator (an alternative to MAME), which supports your game. You can access it in the fbneo branch. I haven't merged it into master yet since I still need to clean up the integration, but it works. Note that I only integrated Mortal Kombat I, but Double Dragon should be similar.

dota2heqiuzhi commented 8 months ago

The "gym-retro-integratioin" tool does not support zip format(FinalBurn Neo rom data type),How did you integrate the game? 企业微信截图_17054926746640

MatPoliquin commented 8 months ago

@dota2heqiuzhi It does support zip files in the fbneo branch where I integrated FBNeo: https://github.com/Farama-Foundation/stable-retro/tree/fbneo

dota2heqiuzhi commented 8 months ago

Thank you very much. I have now successfully opened the game I need (doubledr) using the Integration UI. Next, I will try to integrate this game and start learning reinforcement learning. I'll come back and ask if I run into any problems.
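
From reading the docs, loading the custom integration from Python should look roughly like this (my sketch; the folder path is just a placeholder for my local setup):

```python
# Rough sketch of loading a custom integration written by the Integration UI.
# The path below is a placeholder for wherever the Doubledr-Arcade/ folder lives.
import retro

retro.data.Integrations.add_custom_path("/path/to/my/integrations")
env = retro.make("Doubledr-Arcade", inttype=retro.data.Integrations.ALL)
obs = env.reset()  # newer gymnasium-based versions return (obs, info) instead
```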

victorsevero commented 8 months ago

Hello, @dota2heqiuzhi! If you're a beginner at RL, I highly recommend starting with much simpler environments so you can get a grasp of it, like some Atari games (Pong and Breakout), or maybe even take a step back from stable-retro and go directly to pure gymnasium (CartPole and Pendulum). I also highly recommend the RL bible to learn what is actually happening under the (at first seemingly) complicated algorithms that you might use.
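
For reference, the pure-gymnasium starting point is only a few lines; a minimal sketch with random actions, just to see the observation/action/reward loop:

```python
# Minimal gymnasium loop on CartPole with random actions.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
for _ in range(500):
    action = env.action_space.sample()  # replace with a trained policy later
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```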

dota2heqiuzhi commented 8 months ago

Thanks for your advice. I have some basic knowledge of computer algorithms, and I'm more motivated to learn RL using my favorite games. I will also look for videos to learn the specific RL algorithms. Doubledr is a fighting game and the actions it allows are limited, so I think I can handle it. If it turns out to be really difficult, I'll consider starting with the simpler games you mentioned. Thanks again for your advice.

dota2heqiuzhi commented 8 months ago

I'm having trouble again. I watched your video and successfully found the memory addresses of some variables (such as the 4200 rating). Numeric variables displayed directly on screen are easier to find, but there is no obvious value for doubledr's health. Can you suggest some search strategies? Also, although the countdown is a number too, it can't be found with the usual methods. Could it be that the timer is actually stored at higher precision, and only the leading digits are shown on screen? How do I find such a variable?

victorsevero commented 8 months ago

Here are some resources that may help you with your goal:

- Official docs: https://stable-retro.farama.org/integration/#finding-variables
- TASVideos: https://tasvideos.org/ReverseEngineering
- Hands-on video with BizHawk (the concepts still apply to any reverse-engineering tool): https://youtu.be/zsPLCIAJE5o
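
For a timer, the usual trick is search by elimination: snapshot RAM, let the timer tick with no input, and keep only the addresses whose value decreased; repeat until a handful of candidates remain. A rough sketch of the idea with stable-retro (game name from your setup; details assumed, and note that some games store timers as BCD or at sub-second precision):

```python
# Search-by-elimination sketch: find RAM addresses that decrease while
# the in-game timer ticks. Repeat/intersect across runs to narrow it down.
import numpy as np
import retro

env = retro.make("Doubledr-Arcade", inttype=retro.data.Integrations.ALL)
env.reset()

before = env.get_ram().astype(np.int32)
noop = np.zeros(env.action_space.shape, dtype=np.uint8)
for _ in range(60):  # roughly one second of frames with no input
    env.step(noop)
after = env.get_ram().astype(np.int32)

candidates = np.flatnonzero(after < before)  # addresses that decreased
print(len(candidates), candidates[:20])
```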

dota2heqiuzhi commented 8 months ago

Do I need to find all the key variables (such as the x and y positions of the characters), or do I only need health, win/loss, and the countdown?

dota2heqiuzhi commented 8 months ago

@victorsevero I see that integrations for similar fighting games only define simple variables and don't define in-game state variables (such as rage meters, energy bars, character positions, etc.).

MatPoliquin commented 8 months ago

@dota2heqiuzhi You don't need the fighters' positions: that information is already in the image, provided you feed your model the game frames (as opposed to feeding it RAM values), and you don't need it for the reward function either.
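
So the frames go in as observations, and the few integration variables only drive the reward. A sketch of a health-difference reward as a wrapper (the variable names health and enemy_health are hypothetical; use whatever you defined in data.json):

```python
# Reward = per-step change in (my health - enemy health).
# Variable names below are hypothetical; match them to your data.json.
import gymnasium as gym

class HealthDeltaReward(gym.Wrapper):
    def reset(self, **kwargs):
        self.prev_diff = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        diff = info["health"] - info["enemy_health"]
        reward = 0.0 if self.prev_diff is None else diff - self.prev_diff
        self.prev_diff = diff
        return obs, reward, terminated, truncated, info

# usage: env = HealthDeltaReward(retro.make("Doubledr-Arcade"))
```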

dota2heqiuzhi commented 7 months ago

@MatPoliquin @victorsevero Thank you very much for your help; I have now successfully trained an RL model that beats the computer AI. One question remains: can the "integration" tool play game music? Or, after I finish training the model (or get a replay), can I run it in another player that supports audio (such as MAME)? It would be more enjoyable to watch with music~

At present, the Brute algorithm can defeat the computer AI. I will learn about the PPO algorithm later.

MatPoliquin commented 7 months ago

@dota2heqiuzhi For audio, I made it work with an old version of gym and retro, but I would need to adapt it to the current version, which I plan to do in the coming months. Meanwhile, you can check /stable-retro/retro/scripts/playback_movie.py as an example of how to access the audio data.

victorsevero commented 7 months ago

I'd recommend making a local copy of the playback_movie.py that @MatPoliquin mentioned and adding

retro.data.Integrations.add_custom_path(PATH_TO_YOUR_CUSTOM_ROM_INTEGRATION)

after the imports (actually, I think this path could be an optional parameter of the script so people wouldn't need to copy/paste it. What do you think @MatPoliquin?).

After that, just follow this, but pointing to your local script.
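
The replay pattern in that script looks roughly like this (a sketch based on playback_movie.py; the .bk2 filename and integration path are placeholders):

```python
# Replay a recorded .bk2 movie and collect the audio for each frame,
# following the pattern in retro/scripts/playback_movie.py.
import retro

retro.data.Integrations.add_custom_path("/path/to/my/integrations")
movie = retro.Movie("Doubledr-Arcade-000000.bk2")  # placeholder filename
movie.step()

env = retro.make(movie.get_game(), state=None,
                 use_restricted_actions=retro.Actions.ALL,
                 players=movie.players,
                 inttype=retro.data.Integrations.ALL)
env.initial_state = movie.get_state()
env.reset()

audio_chunks = []
while movie.step():
    keys = [movie.get_key(i, p)
            for p in range(movie.players)
            for i in range(env.num_buttons)]
    env.step(keys)
    audio_chunks.append(env.em.get_audio())  # int16 samples for this frame
print("audio rate:", env.em.get_audio_rate())
```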

dota2heqiuzhi commented 7 months ago

@MatPoliquin @victorsevero Hello, I ran into a problem after training a model with ppo2 + CnnPolicy, following stable-retro-scripts.

  1. The training process looks normal. As I understand it, ep_rew_mean should be the model's average episode reward? I trained for 20 hours. The value of ep_rew_mean started at -80 (the reward I set is my HP minus the enemy's HP) and slowly increased to 30. Does that mean the model can beat the computer AI? But when I ran the model_vs_game script, the model still couldn't beat the computer and kept passively defending until the end of the 30 seconds (my termination condition). My question is: why is ep_rew_mean about 30, yet the model can't actually win? With the Brute algorithm, by contrast, what you see is what you get.

  2. I understand that ppo2 is supposed to be a more advanced algorithm? Why is the reward still relatively low after 20 hours of training? The simple Brute algorithm needs only about half an hour of training to reach a reward of 40+, easily defeating the computer AI. Do I need to learn more about the PPO algorithm and tune its parameters?

These are the commands I use:

python3 model_trainer.py --env=Doubledr-Arcade --num_env=8 --num_timesteps=80_000_000 --play
python3 model_vs_game.py --env=Doubledr-Arcade --model_1=/root/OUTPUT/Doubledr-Arcade-2024-02-27_20-14-55/Doubledr-Arcade-ppo2-CnnPolicy-80000000.zip
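
As I understand it, under the hood this is roughly equivalent to the following stable-baselines3 setup (my sketch; the real scripts add more wrappers and tuned hyperparameters):

```python
# My rough understanding of what model_trainer.py does under the hood.
import retro
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    return retro.make("Doubledr-Arcade", inttype=retro.data.Integrations.ALL)

# Only one retro emulator instance is allowed per process, so running
# --num_env=8 in parallel requires SubprocVecEnv rather than DummyVecEnv.
venv = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)
model = PPO("CnnPolicy", venv, verbose=1)
model.learn(total_timesteps=80_000_000)
model.save("Doubledr-Arcade-ppo2-CnnPolicy")
```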