doubt about the implement of the emitter-receiver scheme

shmily326 commented 1 year ago

Hi there, I'm confused by the concrete communication timing of the emitter-receiver scheme implemented in deepbots. In Webots it takes one basic timestep to deliver a message from an emitter to a receiver. This means the action $a_{t}$ chosen by the supervisor for state $s_{t}$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_{t}$ is updated and emitted to the supervisor in timeslot $t+2$, and it finally reaches the supervisor as $s^{\prime}$ in timeslot $t+3$.
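To make the timing argument concrete, here is a minimal toy model of it in plain Python. It does not use Webots or deepbots: `DelayedChannel` and `round_trip_timeline` are hypothetical names, and the one-step delivery latency plus the robot reporting an action's effect one step after applying it are assumptions taken from the description above, not measured behaviour.

```python
from collections import deque


class DelayedChannel:
    """Toy link: a message sent during step t becomes readable at step t + 1."""

    def __init__(self):
        self._in_flight = deque()  # sent this step, not yet readable
        self._delivered = deque()  # readable by the receiving side

    def send(self, message):
        self._in_flight.append(message)

    def receive(self):
        messages = list(self._delivered)
        self._delivered.clear()
        return messages

    def tick(self):
        """Advance one basic timestep: in-flight messages become readable."""
        self._delivered.extend(self._in_flight)
        self._in_flight.clear()


def round_trip_timeline(num_steps=5):
    """Return the step at which each event of one action's round trip occurs."""
    to_robot, to_supervisor = DelayedChannel(), DelayedChannel()
    timeline = {}
    pending_action = None  # action applied in the previous step

    for t in range(num_steps):
        # Supervisor: read any observation, and send a single action at t = 0.
        for _ in to_supervisor.receive():
            timeline["supervisor_sees_result"] = t
        if t == 0:
            to_robot.send("a_0")
            timeline["supervisor_sends_action"] = t

        # Robot: first report the effect of the action applied last step ...
        if pending_action is not None:
            to_supervisor.send(f"state_after_{pending_action}")
            timeline["robot_emits_result"] = t
            pending_action = None
        # ... then apply whatever action has just arrived.
        for action in to_robot.receive():
            timeline["robot_applies_action"] = t
            pending_action = action

        to_robot.tick()
        to_supervisor.tick()

    return timeline


print(round_trip_timeline())
```

Under these assumptions the action is sent at step 0, applied at step 1, its effect is emitted at step 2, and the supervisor sees it at step 3. Whether the real controllers behave this way depends on the order in which each controller steps, sends, and reads.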

Based on this, the transitions saved for RL training in the deepbots tutorials look like $(s_{t}, a_{t}, r_{t}, s_{t+1})$, but the action that actually acted on state $s_{t}$ (that is, the action the robot really executed) is closer to $a_{t-3}$. $a_{t-3}$ and $a_{t}$ differ even though the timestep is on the scale of milliseconds.
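The mismatch can be illustrated by re-pairing a recorded trajectory with a fixed action delay. This is only a sketch of the reasoning: `realign_transitions` is a hypothetical helper, the delay value is an assumption supplied by the caller, and this is not how deepbots stores transitions.

```python
def realign_transitions(states, actions, rewards, delay):
    """Pair each observed state change with the action issued `delay` steps earlier.

    Returns tuples (s_t, a_{t-delay}, r_t, s_{t+1}). The first `delay` steps are
    dropped because no earlier action exists for them.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    last = min(len(states) - 1, len(rewards), len(actions) + delay)
    return [
        (states[t], actions[t - delay], rewards[t], states[t + 1])
        for t in range(delay, last)
    ]


states = ["s0", "s1", "s2", "s3", "s4", "s5"]
actions = ["a0", "a1", "a2", "a3", "a4"]
rewards = ["r0", "r1", "r2", "r3", "r4"]

# With delay=0 this is the usual (s_t, a_t, r_t, s_{t+1}) pairing.
print(realign_transitions(states, actions, rewards, delay=0)[0])
# With the delay of three steps argued above, s3 is paired with a0.
print(realign_transitions(states, actions, rewards, delay=3))
```

The reward is kept at the index where the supervisor computed it; whether it should shift as well depends on how the reward is defined in the use-case.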

My question may not be entirely clear. I would appreciate it if someone could correct me or resolve my doubt. Thanks a lot!

My doubt is somewhat related to this issue

KelvinYang0320 commented 1 year ago

@shmily326 Thank you for opening an issue! I will look into it and get back to you.

tsampazk commented 1 year ago

Thanks @shmily326! You seem to be correct in your comment. We have had several issues with emitters and receivers since the beginning, and there were some bugs in Webots too (see https://github.com/cyberbotics/webots/issues/1384, where multiple issues were fixed).

As @KelvinYang0320 mentioned, we will look into it and incorporate the required changes to make it work as close as possible to what is expected.

Meanwhile, I would suggest using the RobotSupervisor scheme, which uses a single controller both to control the robot and to act as supervisor. It is much more efficient and straightforward in cases where you don't specifically need the robot and the supervisor to be separate. If you want, you can share more information about your use-case so we can discuss it further.

KelvinYang0320 commented 1 year ago

Hi @shmily326. With the following modifications,

1.

    def step(self, action):
        """
        The basic step method that steps the controller,
        calls the method that sends the action through the emitter
        and returns the (observations, reward, done, info) object.

        :param action: Whatever the use-case uses as an action, e.g.
            an integer representing discrete actions
        :type action: Defined by the implementation of handle_emitter
        :return: (observations, reward, done, info) as provided by the
            corresponding methods as implemented for the use-case
        """
        # Debug print: robot x-position before stepping the controller.
        print(self.getFromDef("ROBOT").getPosition()[0], "step-1")
        if super(Supervisor, self).step(self.timestep) == -1:
            exit()
        # Debug print: robot x-position after stepping, before emitting.
        print(self.getFromDef("ROBOT").getPosition()[0], "step-2")
        self.handle_emitter(action)
        # Debug print: robot x-position after emitting the action.
        print(self.getFromDef("ROBOT").getPosition()[0], "step-3")
        return (
            self.get_observations(),
            self.get_reward(action),
            self.is_done(),
            self.get_info(),
        )

2.

    def handle_emitter(self):
        """
        This emitter uses the user-implemented create_message() method to get
        whatever data the robot gathered, convert it to a string if needed and
        then use the emitter to send the data in a string utf-8 encoding to the
        supervisor.
        """
        print("handle_emitter")
        data = self.create_message()
        ...

    def handle_receiver(self):
        """
        This receiver uses the basic Webots receiver-handling code. The
        use_message_data() method should be implemented to actually use the
        data received from the supervisor.
        """
        print("handle_receiver")
        if self.receiver.getQueueLength() > 0:
            ...

you will get the following in cartPoleWorldEmitterReceiver on Webots 2023a:

    0:00:00:000~0:00:00:032
    RESET
    0.0 step-1
    handle_receiver
    handle_emitter

    0:00:00:032~0:00:00:064
    -1.546550598149922e-22 step-2
    -1.546550598149922e-22 step-3
    -1.546550598149922e-22 step-1
    handle_receiver
    handle_emitter

    0:00:00:064~0:00:00:096
    1.1115304692030285e-08 step-2
    1.1115304692030285e-08 step-3
    1.1115304692030285e-08 step-1
    handle_receiver
    handle_emitter

From my perspective, you will not get the next state in $t+3$. However, we do need to address this issue.

KelvinYang0320 commented 1 year ago

@shmily326 I have opened a PR to address that. Could you check if the problem is solved?

git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
git checkout step_function
pip install -e .

shmily326 commented 1 year ago

@KelvinYang0320 Thank you for all of your time. I'm working on multi-agent RL (specifically a multi-UAV navigation scenario with Actor-Critic algorithms), so I think the emitter-receiver scheme is more appropriate. I will check the "step the controller after applying the action" method and get back to you as soon as possible.
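The idea of stepping the controller after applying the action can be sketched with a toy simulator. This is a sketch under stated assumptions, not the deepbots implementation: `ToyCart`, `step_then_apply`, and `apply_then_step` are hypothetical names, and emitter-receiver latency is deliberately left out so that only the ordering of the two calls differs.

```python
class ToyCart:
    """Stand-in for the simulator: position changes only when advance() runs."""

    def __init__(self):
        self.position = 0.0
        self._velocity = 0.0

    def apply(self, action):
        # The action is stored and takes effect on the next advance().
        self._velocity = action

    def advance(self):
        # One simulation step.
        self.position += self._velocity


def step_then_apply(sim, action):
    """Step first, then apply: the returned observation predates the action."""
    sim.advance()
    sim.apply(action)
    return sim.position


def apply_then_step(sim, action):
    """Apply first, then step: the returned observation reflects the action."""
    sim.apply(action)
    sim.advance()
    return sim.position


print(step_then_apply(ToyCart(), 1.0))  # 0.0
print(apply_then_step(ToyCart(), 1.0))  # 1.0
```

In the first ordering the observation returned alongside an action cannot contain that action's effect, which matches the identical positions printed at step-2 and step-3 in the log earlier in this thread.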

KelvinYang0320 commented 1 year ago

@shmily326 You can take a look at this PR for a multi-robot example. Also, we have several examples in deepworlds.

tsampazk commented 1 year ago

> I'm working on multi-agent RL

That sounds great! For multi-agent scenarios it can indeed be better to have a centralized supervisor that communicates with multiple robots, so you need to use the emitter-receiver scheme. When completed, if you want, we will be happy to include your scenario as an example in our deepworlds repository! :smile:
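One way a centralized supervisor could address several robots over a single broadcast message is to tag each action with a robot name and let every robot pick out its own entry. The sketch below is plain Python with a hypothetical message format (`name:action` pairs joined by `;`) and hypothetical helper names; it is not part of deepbots or Webots, and using a separate emitter-receiver channel per robot is another possible design.

```python
def encode_actions(actions_by_robot):
    """Pack {robot_name: action} into one string, e.g. "uav_0:1;uav_1:3"."""
    return ";".join(
        f"{name}:{action}" for name, action in sorted(actions_by_robot.items())
    )


def decode_action(message, robot_name):
    """Return the integer action addressed to robot_name, or None if absent."""
    for entry in message.split(";"):
        name, _, value = entry.partition(":")
        if name == robot_name:
            return int(value)
    return None


# Supervisor side: one message carrying the actions of all robots.
message = encode_actions({"uav_1": 3, "uav_0": 1})
print(message)

# Robot side: each robot reads only its own action.
print(decode_action(message, "uav_0"))
print(decode_action(message, "uav_1"))
```

A string format is used here because the handler docstrings quoted above describe sending data as utf-8 strings; robot names must not contain `:` or `;` for this particular encoding to work.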

KelvinYang0320 commented 1 year ago

@shmily326 You can get the updated deepbots with:

git clone https://github.com/aidudezzz/deepbots.git
cd ./deepbots
pip install -e .

We have merged the PR.

KelvinYang0320 commented 1 year ago

@shmily326 Just a reminder, you can pip install git+https://github.com/aidudezzz/deepbots.git for general use before we publish the next version of deepbots on PyPI. We would like to close this issue. Feel free to open another issue or reopen it if needed. Also, we will be glad if you share your work or experience with us. 😄

aidudezzz / deepbots

doubt about the implement of the emitter-receiver scheme #119