SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License
404 stars 59 forks source link

Distributed Framework #327

Open threewisemonkeys-as opened 3 years ago

threewisemonkeys-as commented 3 years ago

This is a very rough draft of a trainer for distributed off policy agents.

Currently working on getting DDPG to be trained in distributed manner using this.

codecov[bot] commented 3 years ago

Codecov Report

Merging #327 into master will decrease coverage by 4.04%. The diff coverage is 7.07%.

@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   90.87%   86.83%   -4.05%     
==========================================
  Files          88       97       +9     
  Lines        3705     4017     +312     
==========================================
+ Hits         3367     3488     +121     
- Misses        338      529     +191     
Impacted Files Coverage Δ
genrl/distributed/__init__.py 0.00% <0.00%> (ø)
genrl/distributed/actor.py 0.00% <0.00%> (ø)
genrl/distributed/core.py 0.00% <0.00%> (ø)
genrl/distributed/experience_server.py 0.00% <0.00%> (ø)
genrl/distributed/learner.py 0.00% <0.00%> (ø)
genrl/distributed/parameter_server.py 0.00% <0.00%> (ø)
genrl/trainers/distributed.py 29.62% <29.62%> (ø)
genrl/core/buffers.py 93.84% <33.33%> (-1.40%) :arrow_down:
genrl/agents/deep/base/offpolicy.py 96.20% <66.66%> (-1.21%) :arrow_down:
genrl/agents/deep/ddpg/ddpg.py 86.20% <100.00%> (-6.11%) :arrow_down:
... and 33 more
lgtm-com[bot] commented 3 years ago

This pull request introduces 3 alerts when merging 4cba7272e75de40c5ec208e8b6ac0f4f52d44647 into 0fe41807d67ea504e0b57dc429402a3a99d2c253 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 10 alerts when merging 3d233c47d625f7749c46fc0cea6dcdf35e42b0a6 into 0fe41807d67ea504e0b57dc429402a3a99d2c253 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 15 alerts when merging 1c504cc3142bd210b070abda782cccc644856e3c into bb85ea19304646ea0210331cac08da94e1e8544c - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 19 alerts when merging 2c3298a725727f991b5ae26db6fe7ca85734e321 into 608fc034de85d1b1df9e38079ff3b9e2915bb0f6 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 14 alerts when merging 73586d53cc5816b5d8fa127663fbb56a47ad1a4d into 52b0b4c0399d612c68b705c1e718dfd8e5d22ce3 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 072d545dc77df543b4ae2cd737c8fde73a23b0c0 into 52b0b4c0399d612c68b705c1e718dfd8e5d22ce3 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging e2eef667cf36717fb2c209a70059be7f26ac23ba into 25eb018f18a9a1d0865c16e5233a2a7ccddbfd78 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 837eb18878083e8a53c360cbc8023b74efd95f9f into 25eb018f18a9a1d0865c16e5233a2a7ccddbfd78 - view on LGTM.com

new alerts:

threewisemonkeys-as commented 3 years ago

How is the Actor definition going to work? Can I define any architecture for the actor?(this would be ideal behavior)

Yeah. The ActorNode class only has multiprocessing details, it can basically accommodate any agent. You can see an example in the onpolicy example. The user has the ability to define which agent to run through the agent argument to the ActorNode and how to run it using the collect_experience function also passed as argument.

threewisemonkeys-as commented 3 years ago

(I dont see any neural network definitions as of now).

This is happening internally in the DDPG agent I'm passing to the ActorNode. For On policy the current genrl framework was too inflexible. So I created my own in the onpolicy example. I think we should move towards this kind of more modular structure in general instead of the current one which is more hierarchical.

lgtm-com[bot] commented 3 years ago

This pull request introduces 5 alerts when merging 8030b2a77ebb607532e38eddf5d97367a79cfda2 into 25eb018f18a9a1d0865c16e5233a2a7ccddbfd78 - view on LGTM.com

new alerts:

threewisemonkeys-as commented 3 years ago

Another thought that I had was: do you think we could somehow use decorators here? There's a bunch of core details we can get rid of then.

I haven't used decorators too extensively before, I'll look into it though. Did you have any specific ideas in mind?