datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License

Save an NFSP agent #318

Closed NhaundarL closed 1 month ago

NhaundarL commented 1 month ago

Hello,

I looked through this project and came across examples/run_rl.py. Running it without loading an AI seemed to work, but once I tried loading an already trained AI, there were problems. To explain: the log reports timestep Y for evaluation episode X, which makes little sense to me, though it may be due to an error on my part. Here is my command to create the AI (without loading): python examples/run_rl.py --log_dir experiments/test_example_rl/

And here is the one to load the old AI from its checkpoint (this command only takes a few minutes): python examples/run_rl.py --num_episodes 100000 --log_dir experiments/test_example_rl_suite/ --load_checkpoint_path experiments/test_example_rl/checkpoint_dqn.pt --save_every 100
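For context, my understanding (which may well be wrong) is that --load_checkpoint_path rebuilds the agent from the saved checkpoint dictionary, training counters included, which would explain the odd timestep numbers in the evaluation log. Roughly the sketch below, where the from_checkpoint classmethod is what I assume the DQN agent exposes:

```python
import torch
from rlcard.agents import DQNAgent

# My rough understanding of what --load_checkpoint_path does in examples/run_rl.py:
# the checkpoint dict is read back and the agent is rebuilt from it, counters included,
# so the logged timestep continues from the previous run instead of starting at zero.
checkpoint = torch.load('experiments/test_example_rl/checkpoint_dqn.pt')
agent = DQNAgent.from_checkpoint(checkpoint=checkpoint)
```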

I would also like to point out that once this command finishes, the experiments/test_example_rl_suite/ folder does not contain a checkpoint, so it is impossible to reuse it, since the checkpoint is needed to load the AI. I would like to know whether it is normal that, once the AI is trained, any further training is much faster, or whether I used a bad command or misinterpreted the code.
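In case it helps, this is the kind of safeguard I was expecting to be able to add at the end of training so that the new log_dir always ends up with a checkpoint, regardless of --save_every (checkpoint_attributes() and the file name are assumptions on my part):

```python
import os
import torch

# Hypothetical safeguard: explicitly dump a checkpoint into the new log_dir at the
# end of training, so it can be reloaded later even if --save_every never triggered.
# checkpoint_attributes() is the method I assume the agent exposes for this.
log_dir = 'experiments/test_example_rl_suite/'
os.makedirs(log_dir, exist_ok=True)
torch.save(agent.checkpoint_attributes(), os.path.join(log_dir, 'checkpoint_dqn.pt'))
```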

I was also hesitating over whether to use CFR, simply to avoid having to save and load my agent with torch.
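The alternative I had in mind was just saving and restoring the whole trained agent with torch, roughly as below (the model.pth file name is my own choice for the example):

```python
import torch

# Save the whole trained agent object (NFSP or DQN) once training is done...
torch.save(agent, 'experiments/test_example_rl/model.pth')

# ...and later load it back for evaluation or further training without starting over.
agent = torch.load('experiments/test_example_rl/model.pth')
```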

Thanks for the project!