Closed: vigorPan closed this issue 1 year ago
How do you solve the problem of the phase's sum of squares being equal to one? This has been bothering me for a long time.
How do you solve the problem of the reward decreasing in the later stage of training with the DDPG reinforcement learning algorithm?
you can cut off the training earlier.
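(For reference, a minimal sketch of what cutting off training earlier could look like: track a moving average of the episode reward and stop once it stops improving. The class and variable names below are illustrative, not taken from this repository.)

# Illustrative early-stopping helper (hypothetical names, not from this repo):
# track the moving-average episode reward and stop once it stops improving,
# keeping the weights from the best evaluation rather than the final ones.
class EarlyStopper:
    def __init__(self, window=10, patience=20):
        self.window = window          # moving-average window over episode rewards
        self.patience = patience      # evaluations allowed without improvement
        self.rewards = []
        self.best_avg = float("-inf")
        self.bad_evals = 0

    def update(self, episode_reward):
        self.rewards.append(episode_reward)
        avg = sum(self.rewards[-self.window:]) / min(len(self.rewards), self.window)
        if avg > self.best_avg:
            self.best_avg = avg       # new best: this is the point to checkpoint
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience   # True -> cut off training here

Usage (illustrative): create one EarlyStopper, call stopper.update(episode_reward) at the end of each episode, and break out of the training loop when it returns True, saving a checkpoint whenever a new best average is reached.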
> How do you solve the problem of the phase's sum of squares being equal to one? This has been bothering me for a long time.
what do you mean by that? can you elaborate more on that?
the sum of the squares of the real and imaginary parts is one, according to Euler's formula
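(For context, this is the unit-modulus constraint on each reflecting element: an ideal phase shift e^{j*theta} = cos(theta) + j*sin(theta), so the squares of its real and imaginary parts sum to one. A quick numerical illustration:)

import numpy as np

# Any ideal phase shift e^{j*theta} has |.|^2 = cos^2(theta) + sin^2(theta) = 1.
theta = np.random.uniform(0, 2 * np.pi, size=16)   # arbitrary phase angles
phi = np.exp(1j * theta)                           # ideal unit-modulus phase shifts
print(np.real(phi) ** 2 + np.imag(phi) ** 2)       # all ones, up to floating-point error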
> How do you solve the problem of the phase's sum of squares being equal to one? This has been bothering me for a long time.
> what do you mean by that? can you elaborate more on that?
Your algorithm cannot guarantee that the sum of the squares of the real and imaginary parts of a reflecting element is one, so the premise that the modulus is one is not satisfied.
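(A common way to enforce this constraint, shown here only as an illustration and not necessarily what this repository does, is to divide each raw (real, imaginary) pair produced by the actor by its own modulus:)

import numpy as np

def project_to_unit_modulus(phi_real, phi_imag, eps=1e-12):
    # Divide each (real, imag) pair by its own modulus so that
    # real^2 + imag^2 == 1 for every reflecting element (eps guards against /0).
    modulus = np.sqrt(phi_real ** 2 + phi_imag ** 2) + eps
    return phi_real / modulus, phi_imag / modulus

# Example with raw, unconstrained actor outputs:
raw_real = np.array([-0.4659, -0.2412])
raw_imag = np.array([-0.6151, -0.0920])
re, im = project_to_unit_modulus(raw_real, raw_imag)
print(re ** 2 + im ** 2)   # -> [1. 1.]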
> How do you solve the problem of the phase's sum of squares being equal to one? This has been bothering me for a long time.
> what do you mean by that? can you elaborate more on that?
> Your algorithm cannot guarantee that the sum of the squares of the real and imaginary parts of a reflecting element is one, so the premise that the modulus is one is not satisfied.
are you sure? simply print the phase part of the normalized action before returning it in DDPG.py, i.e., right before line 74:
print(self.compute_phase((a / division_term).detach()))
I get unit modulus all the time. For example:
Time step: 105 Episode Num: 1 Reward: 1.291
(tensor([[1.]], device='cuda:0'), tensor([[1.]], device='cuda:0'))
(tensor([[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000]], device='cuda:0'), tensor([[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000]], device='cuda:0'))
(tensor([[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000]], device='cuda:0'), tensor([[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000],
[1.0000]], device='cuda:0'))
Time step: 106 Episode Num: 1 Reward: 1.281
(tensor([[1.]], device='cuda:0'), tensor([[1.0000]], device='cuda:0'))
I tested the output, and the Phi matrix is:
Phi [[-0.4658936-0.61511719j  0.        -0.j        ]
 [ 0.        -0.j         -0.2412132 -0.09198956j]]
How do you make sure that 0.24 squared plus 0.09 squared is one? 0.46 squared plus 0.61 squared is not one either.
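(A quick sanity check of the diagonal entries quoted above, not part of the repository's code:)

import numpy as np

diag = np.array([-0.4658936 - 0.61511719j, -0.2412132 - 0.09198956j])
print(np.abs(diag) ** 2)   # ~[0.595, 0.067]: neither entry has unit modulus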
can you provide the code for calculating the results you got?
def step(self, action):
    self.episode_t += 1

    action = action.reshape(1, -1)

    G_real = action[:, :self.M ** 2]
    G_imag = action[:, self.M ** 2:2 * self.M ** 2]
    Phi_real = action[:, -2 * self.L:-self.L]
    Phi_imag = action[:, -self.L:]

    self.G = G_real.reshape(self.M, self.K) + 1j * G_imag.reshape(self.M, self.K)
    self.Phi = np.eye(self.L, dtype=complex) * (Phi_real + 1j * Phi_imag)
    print("Phi", self.Phi)

(This is in the environment file, around line 114.)
you should compute the norm of the phase shifts as follows:
def step(self, action):
    self.episode_t += 1

    action = action.reshape(1, -1)

    G_real = action[:, :self.M ** 2]
    G_imag = action[:, self.M ** 2:2 * self.M ** 2]
    Phi_real = action[:, -2 * self.L:-self.L]
    Phi_imag = action[:, -self.L:]

    modulus = np.sum(np.abs(Phi_real)).reshape(-1, 1) * np.sqrt(2), np.sum(np.abs(Phi_imag)).reshape(-1, 1) * np.sqrt(2)
    print(modulus)
I described here why.
I'm sorry, I don't understand why you have to add everything up. Each original phase should satisfy Euler's equation, right?
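(The disagreement seems to come down to what is being normalized: the modulus of each individual phase shift versus an aggregate norm over the whole phase vector. An illustrative comparison of the two checks, not taken from the repository:)

import numpy as np

rng = np.random.default_rng(0)
phi_real = rng.normal(size=4)   # raw, unconstrained outputs
phi_imag = rng.normal(size=4)

# (1) Per-element constraint: every element must satisfy real^2 + imag^2 == 1.
print(phi_real ** 2 + phi_imag ** 2)                    # generally not all ones

# (2) Aggregate normalization: the whole (real, imag) vector has unit norm.
norm = np.sqrt(np.sum(phi_real ** 2 + phi_imag ** 2))
print(np.sum((phi_real / norm) ** 2 + (phi_imag / norm) ** 2))   # ~1, but only as a sum

# Dividing by an aggregate norm (2) does not make each element unit-modulus;
# only dividing each element by its own modulus enforces (1).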