We were trying to plot the rewards after each episode.
These are some of the ideas we tried.
First, we created another file, visualize.py, in the ICLR23Workshop directory.
In this file we ran the following code to visualize the dqn function.
[ ] Our initial understanding was that the number of episodes is 10000, as set in the dqn function.
We tried this first.
This code would iterate 10000 times, calling test_dqn once per iteration
(we did not wait for it to finish).
The code was:
import matplotlib.pyplot as plt
# importing the function we are running from collectdata
from collectdata import test_dqn
# creating a list to store the average reward per block of 500 episodes
# total episodes are 10000
rewards_per_episode = []
total_rewards = 0
# the loop that will run and save the rewards in the list
for episode in range(1, 10001):
    # run the test_dqn function and get the reward
    reward = test_dqn("https://gist.githubusercontent.com/slremy/5adf90df6f4c096258f8d8af0f3039dc/raw/d85ac5f204771657b015055eb3b6b9f5dcf33956/location1.csv")
    # add the reward to the total_rewards
    total_rewards += reward
    # every 500 episodes, append the average reward to the list and reset total_rewards
    if episode % 500 == 0:
        rewards_per_episode.append(total_rewards / 500)
        total_rewards = 0
# declaring the range for plotting
episodes = range(500, 10001, 500)
# plotting the rewards_per_episode vs. episodes
plt.plot(episodes, rewards_per_episode)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title('Reward per Episode')
plt.show()
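The averaging step in the loop above can be separated from the expensive calls to test_dqn and checked on its own. This is only a minimal sketch, assuming the rewards have already been collected into a plain list; `windowed_means` is a hypothetical helper name of ours and is not part of collectdata.py.

```python
def windowed_means(rewards, window):
    """Average consecutive, non-overlapping chunks of `window` rewards.

    A trailing chunk shorter than `window` is dropped, matching the
    `episode % 500 == 0` check in the loop above.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(rewards) - window + 1, window):
        chunk = rewards[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Toy data standing in for real rewards (illustrative values only).
toy_rewards = [1.0, 3.0, 2.0, 4.0, 10.0, 20.0, 7.0]
toy_means = windowed_means(toy_rewards, 2)
# x positions mark the last episode of each window, like range(500, 10001, 500).
toy_x = list(range(2, 2 * len(toy_means) + 1, 2))
```

With `window=500` and 10000 collected rewards this would give 20 averages, the same number of points as `range(500, 10001, 500)`.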
[x] The second idea was to treat the 10000 episodes as the actions making up a single episode, then use a for loop over a range to call the function n times and collect the total reward from each call.
This was the code:
# Importing matplotlib for plotting
import matplotlib.pyplot as plt
# Importing the function test_dqn from collectdata module which we will be running
from collectdata import test_dqn
rewards = []  # Initializing an empty list to store rewards obtained in each episode
# Looping over 5 episodes
for episode in range(1, 6):
    # Calling the test_dqn function with the given URL as argument and storing the returned reward value
    reward = test_dqn("https://gist.githubusercontent.com/slremy/5adf90df6f4c096258f8d8af0f3039dc/raw/d85ac5f204771657b015055eb3b6b9f5dcf33956/location1.csv")
    # Appending the reward value to the rewards list
    rewards.append(reward)
episodes = [1, 2, 3, 4, 5]  # Initializing a list of episode numbers
# Plotting a line graph with episode numbers on x-axis and rewards on y-axis
plt.plot(episodes, rewards)
plt.xlabel('Episodes')  # Labeling the x-axis as 'Episodes'
plt.ylabel('Reward')  # Labeling the y-axis as 'Reward'
plt.title('Reward per Episode')  # Setting the title of the plot
plt.show()  # Displaying the plot on the screen
It plotted the graph, but the logic felt wrong, since (from our understanding) we are treating the 10000 episodes as actions making up one episode.
[ ] Another idea we tried was to edit the collectdata.py file directly and add this code around the while loop in the dqn function:
rewards_per_episode = []
while not done:
    a = model.choose_action(s)
    s, r, done, truncated, info = env.step(a)
    episode_reward += r
rewards_per_episode.append(episode_reward)
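To make the difference between actions and episodes concrete for ourselves, here is a minimal sketch of the bookkeeping we were aiming for: each call to `env.step` is one action, and an episode ends when `done` or `truncated` becomes true. `ToyEnv` and `run_episodes` are hypothetical names used only for illustration. They are not the environment or functions in collectdata.py, and we are assuming the same 5-value return from `step` that appears in the snippet above.

```python
class ToyEnv:
    """Hypothetical stand-in environment; NOT the one used in collectdata.py."""

    def __init__(self, steps_per_episode=5):
        self.steps_per_episode = steps_per_episode
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = float(action)  # toy rule: the reward equals the action
        done = self.t >= self.steps_per_episode
        return self.t, reward, done, False, {}


def run_episodes(env, choose_action, n_episodes):
    """Return a list with one total reward per episode."""
    rewards_per_episode = []
    for _ in range(n_episodes):
        s, _info = env.reset()
        episode_reward = 0.0  # reset at the start of every episode
        done = truncated = False
        while not (done or truncated):
            a = choose_action(s)  # one action per loop iteration
            s, r, done, truncated, _info = env.step(a)
            episode_reward += r
        rewards_per_episode.append(episode_reward)  # appended once per episode
    return rewards_per_episode


totals = run_episodes(ToyEnv(steps_per_episode=5), lambda s: 1, n_episodes=3)
```

Returning the list from the function, rather than reading a module-level variable, also makes the number of recorded episodes explicit to the caller.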
And this was the visualize.py file:
# importing matplotlib
import matplotlib.pyplot as plt
# importing collectdata
import collectdata
num_episodes = 10000
collectdata.test_dqn("https://gist.githubusercontent.com/slremy/5adf90df6f4c096258f8d8af0f3039dc/raw/d85ac5f204771657b015055eb3b6b9f5dcf33956/location1.csv")
# Plot rewards per episode
plt.plot(range(num_episodes), collectdata.rewards_per_episode)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title('Reward per Episode')
plt.show()
This gave us a very interesting output: a list of only 10 rewards.
The visualization could not be plotted because of this ValueError:
ValueError: x and y must have same first dimension, but have shapes (10000,) and (10,)
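The shapes in the error say that the x-axis had 10000 points (from `range(num_episodes)`) while only 10 rewards were recorded. One workaround we could try is to build the x-axis from the length of the rewards list instead of a hard-coded count. This is a minimal sketch under that assumption; `episode_axis` and `plot_rewards` are hypothetical helper names, and the ten zeros are placeholders, not our real rewards.

```python
def episode_axis(rewards):
    """1-based episode numbers with exactly one entry per recorded reward."""
    return list(range(1, len(rewards) + 1))


def plot_rewards(rewards):
    """Plot reward against episode number; needs matplotlib installed."""
    # Imported here so episode_axis can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.plot(episode_axis(rewards), rewards)
    plt.xlabel('Episodes')
    plt.ylabel('Reward')
    plt.title('Reward per Episode')
    plt.show()


# Ten placeholder values standing in for the 10 rewards we got back.
placeholder_rewards = [0.0] * 10
x = episode_axis(placeholder_rewards)
```

Calling `plot_rewards(collectdata.rewards_per_episode)` would then avoid the shape mismatch, although it does not explain why only 10 rewards were recorded.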
@ What can you advise on how the actions and episodes are being run?