Closed shmily326 closed 1 year ago
@shmily326 Thank you for opening an issue! I will look into it and get back to you.
Thanks @shmily326! You seem to be correct in your comment. We have had several issues regarding emitters and receivers since the beginning, and there were some bugs in Webots too (see https://github.com/cyberbotics/webots/issues/1384, where multiple issues were fixed).
As @KelvinYang0320 mentioned, we will look into it and incorporate required changes to make it work as close as possible to what is expected.
Meanwhile, I would suggest using the RobotSupervisor scheme, which uses the same controller both to control the robot and to act as supervisor. It is much more efficient and straightforward in cases where you don't specifically require separation between robot and supervisor. If you want, you can share additional information about your use-case so we can discuss it further.
Hi @shmily326, with the following modifications,
1.
def step(self, action):
    """
    The basic step method that steps the controller,
    calls the method that sends the action through the emitter
    and returns the (observations, reward, done, info) object.
    :param action: Whatever the use-case uses as an action, e.g.
    an integer representing discrete actions
    :type action: Defined by the implementation of handle_emitter
    :return: (observations, reward, done, info) as provided by the
    corresponding methods as implemented for the use-case
    """
    print(self.getFromDef("ROBOT").getPosition()[0], "step-1")
    if super(Supervisor, self).step(self.timestep) == -1:
        exit()
    print(self.getFromDef("ROBOT").getPosition()[0], "step-2")
    self.handle_emitter(action)
    print(self.getFromDef("ROBOT").getPosition()[0], "step-3")
    return (
        self.get_observations(),
        self.get_reward(action),
        self.is_done(),
        self.get_info(),
    )
2.
def handle_emitter(self):
    """
    This emitter uses the user-implemented create_message() method to get
    whatever data the robot gathered, convert it to a string if needed and
    then use the emitter to send the data in a string utf-8 encoding to the
    supervisor.
    """
    print("handle_emitter")
    data = self.create_message()
    ...

def handle_receiver(self):
    """
    This receiver uses the basic Webots receiver-handling code. The
    use_message_data() method should be implemented to actually use the
    data received from the supervisor.
    """
    print("handle_receiver")
    if self.receiver.getQueueLength() > 0:
        ...
you will get the following in cartPoleWorldEmitterReceiver on Webots 2023a:
0:00:00:000~0:00:00:032 RESET 0.0 step-1 handle_receiver handle_emitter
0:00:00:032~0:00:00:064 -1.546550598149922e-22 step-2 -1.546550598149922e-22 step-3 -1.546550598149922e-22 step-1 handle_receiver handle_emitter
0:00:00:064~0:00:00:096 1.1115304692030285e-08 step-2 1.1115304692030285e-08 step-3 1.1115304692030285e-08 step-1 handle_receiver handle_emitter
From my perspective, you will not get the next state in $t+3$. However, we do need to address this issue.
@shmily326 I have opened a PR to address that. Could you check if the problem is solved?
git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
git checkout step_function
pip install -e .
@KelvinYang0320 Thank you for all of your time. I'm working on multi-agent RL (specifically a multi-UAV navigation scenario with Actor-Critic algorithms), so I think the emitter-receiver scheme is more appropriate. I will check the "step the controller after applying the action" method and get back to you as soon as possible.
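To make the "step the controller after applying the action" ordering concrete, here is a toy comparison that does not use Webots or deepbots at all. `ToyWorld`, `step_then_apply` and `apply_then_step` are hypothetical names, and the model assumes an action takes effect during the simulation step that follows it, with no emitter-receiver delay. It is only a sketch of why the call order matters, not the code in the PR.

```python
class ToyWorld:
    """Stand-in for the simulator: the pending action moves the position on the next step."""

    def __init__(self):
        self.position = 0.0
        self.pending_action = 0.0

    def apply(self, action):
        # Hypothetical analogue of sending the action to the robot (no delay modelled).
        self.pending_action = action

    def step(self):
        # Advance the toy simulation by one timestep and consume the pending action.
        self.position += self.pending_action
        self.pending_action = 0.0


def step_then_apply(world, action):
    """Step first, then send the action: the returned observation predates the action."""
    world.step()
    world.apply(action)
    return world.position


def apply_then_step(world, action):
    """Send the action first, then step: the returned observation includes its effect."""
    world.apply(action)
    world.step()
    return world.position


if __name__ == "__main__":
    for policy_step in (step_then_apply, apply_then_step):
        world = ToyWorld()
        observations = [policy_step(world, 1.0) for _ in range(3)]
        print(policy_step.__name__, observations)
```

In this toy model the first ordering pairs each action with an observation that is one step stale, while the second pairs it with the observation it produced. A real emitter-receiver setup may add further delay on top of this.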
@shmily326 You can take a look at this PR for a multi-robot example. Also, we have several examples in deepworlds.
> I'm working on multi-agent RL
That sounds great! For multi-agent scenarios indeed it can be better to have a centralized supervisor that communicates with multiple robots, so you need to use the emitter-receiver scheme. When completed, if you want, we will be happy to include your scenario as an example on our deepworlds repository! :smile:
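As a rough illustration of the centralized setup, one supervisor message can carry the actions for several robots, and each robot picks out its own part. The wire format below (`id:v1,v2;id:v1`) and the helper names `encode_actions` and `decode_for` are made up for this sketch and are not the deepbots message format. It only assumes that messages travel as UTF-8 strings and that robot ids contain neither `:` nor `;`.

```python
def encode_actions(actions):
    """Pack {robot_id: [values]} into one UTF-8 payload such as b'uav_0:1.0;uav_1:0.5,-1.0'."""
    parts = []
    for robot_id, values in sorted(actions.items()):
        parts.append(f"{robot_id}:" + ",".join(str(v) for v in values))
    return ";".join(parts).encode("utf-8")


def decode_for(robot_id, payload):
    """Return the action values addressed to robot_id as floats, or None if there are none."""
    for part in payload.decode("utf-8").split(";"):
        if not part:
            continue
        rid, _, body = part.partition(":")
        if rid == robot_id:
            return [float(v) for v in body.split(",")] if body else []
    return None


if __name__ == "__main__":
    payload = encode_actions({"uav_1": [0.5, -1.0], "uav_0": [1.0]})
    print(payload)
    print(decode_for("uav_1", payload))
```

Each robot would call something like `decode_for` with its own id on every received packet and ignore packets that return `None`.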
@shmily326 You can get updated deepbots by
git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
pip install -e .
We have merged the PR.
@shmily326 Just a reminder, you can run
pip install git+https://github.com/aidudezzz/deepbots.git
for general use before we publish the next version of deepbots on PyPI.
We would like to close this issue. Feel free to open another issue or reopen it if needed. Also, we will be glad if you share your work or experience with us. 😄
Hi there, I'm deeply confused by the concrete communication process (timing) of the emitter-receiver scheme implemented in deepbots. In Webots it takes one basic timestep to transmit and deliver a message from an emitter to a receiver. This means the action $a_{t}$ chosen by the supervisor according to state $s_{t}$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_{t}$ is updated and emitted to the supervisor in timeslot $t+2$, and it finally reaches the supervisor as $s^{\prime}$ in timeslot $t+3$.
Based on this, I find that the transitions saved for RL training in the deepbots tutorials look like $(s_{t}, a_{t}, r_{t}, s_{t+1})$, but in fact the action that acted on state $s_{t}$ (that is, the action the robot actually executed) is more like $a_{t-3}$. There is a difference between $a_{t-3}$ and $a_{t}$ even though the timestep is on the scale of milliseconds.
To be honest, my question may not be very clear. I would appreciate it if someone could correct me or resolve my doubt. Thanks a lot!
My doubt is somewhat related to this issue
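The timing argument above can be written down as a small model that runs without Webots. It assumes only what this post states: a message sent in one timestep becomes readable one timestep later, and the effect of an applied action shows up in the robot's observation one timestep after it is applied. `DelayedChannel` and `round_trip_latency` are hypothetical names, not Webots or deepbots APIs, and whether the real system behaves this way is exactly the open question here.

```python
from collections import deque


class DelayedChannel:
    """Toy emitter/receiver pair: a message sent at step t is readable from step t + delay."""

    def __init__(self, delay=1):
        self.delay = delay
        self._in_flight = deque()  # (deliver_at, payload) in send order

    def send(self, now, payload):
        self._in_flight.append((now + self.delay, payload))

    def receive(self, now):
        delivered = []
        while self._in_flight and self._in_flight[0][0] <= now:
            delivered.append(self._in_flight.popleft()[1])
        return delivered


def round_trip_latency(channel_delay=1, physics_lag=1, steps=20):
    """Return the step at which the supervisor first sees the effect of the action sent at step 0."""
    to_robot = DelayedChannel(channel_delay)
    to_supervisor = DelayedChannel(channel_delay)
    pending_effects = deque()  # (visible_at, action_tag) on the robot side
    visible_tag = None  # tag of the latest action reflected in the robot's observation

    for now in range(steps):
        # Supervisor: read observations first, then send action a_now (tagged with now).
        if 0 in to_supervisor.receive(now):
            return now
        to_robot.send(now, now)

        # Robot: apply received actions; their effect becomes observable after physics_lag.
        for tag in to_robot.receive(now):
            pending_effects.append((now + physics_lag, tag))
        while pending_effects and pending_effects[0][0] <= now:
            visible_tag = pending_effects.popleft()[1]
        if visible_tag is not None:
            to_supervisor.send(now, visible_tag)
    return None


if __name__ == "__main__":
    print(round_trip_latency())
```

Under these assumptions the latency is two channel delays plus the physics lag, which gives the $t+3$ described above for a one-step delay in each direction.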