OLA2022-Group-12 / OLA2022-Project

Project for the Online Learning Applications class at Politecnico di Milano a.a. 2021-2022
2 stars 0 forks source link

Update learner interface #18

Closed barskern closed 2 years ago

barskern commented 2 years ago

Updating the learner interface as we need the user interactions when learning for some activities. E.g. when learning the graph we need to know which products were activated and in which order.

raul-singh commented 2 years ago

We agreed that the reviewer should merge, am I right?

barskern commented 2 years ago

Looks good! Honestly there's not much I can say. Simple but effective work. I especially like the fact that you kept also the reward as argument, that way if a learner shouldn't see the interactions we can just provide the aggreated reward and an empty list for the interactions. Good work!

Thank you! πŸ˜„

Actually, I saw that for the alphaless learner you give rewards as an array, should we change this in the original interface too? (as there it is defined as a float)

barskern commented 2 years ago

We agreed that the reviewer should merge, am I right?

Yes, thats true! πŸ˜„

raul-singh commented 2 years ago

Sorry, I think I'm not understanding. The alphaless uses the interactions to learn, not the reward itself, so it fits your modifications, I just have to adjust the number of arguments.

barskern commented 2 years ago

Ah, what i mean is that you pass in rewards as an array not a float, as we specified in the abstract learner base class.

  1. sep. 2022 kl. 19:16 skrev Raul Singh @.***>:

ο»Ώ Sorry, I think I'm not understanding. The alphaless uses the interactions to learn, not the reward itself, so it fits your modifications, I just have to adjust the number of arguments.

β€” Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

raul-singh commented 2 years ago

@barskern maybe we are misunderstanding eachother but the alphaless learner takes reward as a float, not an array.

barskern commented 2 years ago

Aaah i am silly, I just briefly read the code and saw you called the parameter rewardS so I thought πŸ˜…πŸ˜… Let’s get this merged then!

EDIT: As I re-read it again, it's because in the simulation loop we pass rewards to the learn function, which is written in plural form even though it is a single float. I am sorry for the commotion surrounding this! Feel free to merge when you have time πŸ˜„

  1. sep. 2022 kl. 22:11 skrev Raul Singh @.***>:

ο»Ώ @barskern maybe we are misunderstanding eachother but the alphaless learner takes reward as a float, not an array.

β€” Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.