DeNA / HandyRL

HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

[REQUESTING OVERVIEW OF DISTRIBUTED HANDYRL] #211

Open adypd97 opened 3 years ago

adypd97 commented 3 years ago

Hello HandyRL Team!

First off, thanks for making such a useful repository for RL! I love it!

I am trying to understand how the distributed architecture of HandyRL works, but due to the lack of documentation so far, it's been difficult to understand how it's implemented.

I'll give an example (following the Large Scale Training document in the repo):
I have 3 VMs running on GCP (1 as the server, i.e. the learner, and 2 others as workers). In the config.yaml file I entered the external IP of the learner (the document says it's valid to enter the external IP too) in the worker_args parameter for both workers, as per the instructions in the document, and tried to run it. However, I don't see anything happen. In the following output the server just appears to keep sleeping and do nothing.

OUTPUT:

xyz@vm1:~/HandyRL$ python3 main.py --train-server {'env_args': {'env': 'HungryGeese'}, 'train_args': {'turn_based_training': False, 'observation': False, 'gamma': 0.8, 'forward_steps': 32, 'compress_steps': 4, 'entropy_regularization': 0.002, 'entropy_regularization_decay': 0.3, 'update_episodes': 500, 'batch_size': 400, 'minimum_episodes': 1000, 'maximum_episodes': 200000, 'epochs': -1, 'num_batchers': 7, 'eval_rate': 0.1, 'worker': {'num_parallel': 32}, 'lambda': 0.7, 'max_self_play_epoch': 1000, 'policy_target': 'TD', 'value_target': 'TD', 'eval': {'opponent': ['modelbase'], 'weights_path': 'None'}, 'seed': 0, 'restart_epoch': 0}, 'worker_args': {'server_address': '', 'num_parallel': 32}}
Loading environment football failed: No module named 'gfootball'
started batcher 0
started batcher 1
started batcher 2
started batcher 3
started batcher 4
started batcher 5
waiting training
started entry server 9999
started batcher 6
started worker server 9998
started server

I was hoping you could provide some guidance on how I can proceed. In any case, documentation or a brief but complete overview of the distributed architecture would also be appreciated, so I can debug the problem on my own.

Thank you!

ikki407 commented 3 years ago

Hi @adypd97, thank you for your interest in HandyRL!

First of all, after the training server has launched, you need to run the workers on the worker VMs with python main.py --worker (make sure the server address is written in the worker config, i.e. worker_args). This command connects the workers to the server. Once the server detects the worker connections, the learning process starts.
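For illustration, on each worker VM this looks roughly like the sketch below (203.0.113.10 is only a placeholder for the learner's external IP; the worker_args keys are the ones shown in the printed config):

    # In config.yaml on the worker VM, set the learner's address in worker_args,
    # e.g. (placeholder IP; adjust num_parallel to your machine):
    #
    #   worker_args:
    #     server_address: '203.0.113.10'
    #     num_parallel: 32
    #
    # then start the worker processes, which connect back to the server:
    python3 main.py --worker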

We previously illustrated an overview of the distributed architecture for the Google Research Football competition. I hope this helps you.

Thanks

adypd97 commented 3 years ago

Hi @ikki407!

Thanks for the link to the documentation! Very helpful!

On the main issue: yes, I ran the 2 worker VMs following the steps you mention (and I entered the public IP of the server (learner) VM in the worker_args parameter for both workers). Following that, I got the output shown in my initial comment. It seems the learner is not able to detect the workers.

As further evidence, I added a simple print statement to ./handyrl/train.py, in the following function (starting at line 404):

    def run(self):
        print('waiting training')
        while not self.shutdown_flag:
            if len(self.episodes) < self.args['minimum_episodes']:
 >>>            print('here')
                time.sleep(1)
                continue
            if self.steps == 0:
                self.batcher.run()
                print('started training')
            model = self.train()
            self.report_update(model, self.steps)
        print('finished training') 

And in the output I get the following:

OUTPUT:

xyz@vm1:~/HandyRL$ python3 main.py --train-server {'env_args': {'env': 'HungryGeese'}, 'train_args': {'turn_based_training': False, 'observation': False, 'gamma': 0.8, 'forward_steps': 32, 'compress_steps': 4, 'entropy_regularization': 0.002, 'entropy_regularization_decay': 0.3, 'update_episodes': 500, 'batch_size': 400, 'minimum_episodes': 1000, 'maximum_episodes': 200000, 'epochs': -1, 'num_batchers': 7, 'eval_rate': 0.1, 'worker': {'num_parallel': 32}, 'lambda': 0.7, 'max_self_play_epoch': 1000, 'policy_target': 'TD', 'value_target': 'TD', 'eval': {'opponent': ['modelbase'], 'weights_path': 'None'}, 'seed': 0, 'restart_epoch': 0}, 'worker_args': {'server_address': '', 'num_parallel': 32}}
Loading environment football failed: No module named 'gfootball'
started batcher 0
started batcher 1
started batcher 2
started batcher 3
started batcher 4
started batcher 5
waiting training
started entry server 9999
started batcher 6
started worker server 9998
started server
here
here
here...

I hope this information helps. In any case, thanks once again!

ikki407 commented 3 years ago

From your outputs, it seems that the server is not connecting to the workers.

Next steps to debug...

ikki407 commented 3 years ago

What does the worker process/VM look like? If the workers are still running without any errors, there may be some problem I haven't seen before.
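One quick sanity check from a worker VM, given that the server log shows an entry server on port 9999 and a worker server on port 9998, is whether those ports are reachable on the learner's external IP at all; on GCP, firewall rules block most inbound ports by default. A minimal sketch, assuming netcat is installed and using 203.0.113.10 as a placeholder for the learner's external IP:

    # Test TCP connectivity from a worker VM to the learner's ports
    # (9999 and 9998, as reported in the server log above).
    # 203.0.113.10 is a placeholder, not a real address.
    nc -vz 203.0.113.10 9999
    nc -vz 203.0.113.10 9998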