Closed barskern closed 2 years ago
We agreed that the reviewer should merge, am I right?
Looks good! Honestly there's not much I can say. Simple but effective work. I especially like the fact that you kept also the reward as argument, that way if a learner shouldn't see the interactions we can just provide the aggreated reward and an empty list for the interactions. Good work!
Thank you! π
Actually, I saw that for the alphaless learner you give rewards as an array, should we change this in the original interface too? (as there it is defined as a float)
We agreed that the reviewer should merge, am I right?
Yes, thats true! π
Sorry, I think I'm not understanding. The alphaless uses the interactions to learn, not the reward itself, so it fits your modifications, I just have to adjust the number of arguments.
Ah, what i mean is that you pass in rewards as an array not a float, as we specified in the abstract learner base class.
- sep. 2022 kl. 19:16 skrev Raul Singh @.***>:
ο»Ώ Sorry, I think I'm not understanding. The alphaless uses the interactions to learn, not the reward itself, so it fits your modifications, I just have to adjust the number of arguments.
β Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.
@barskern maybe we are misunderstanding eachother but the alphaless learner takes reward as a float, not an array.
Aaah i am silly, I just briefly read the code and saw you called the parameter rewardS so I thought π π Letβs get this merged then!
EDIT: As I re-read it again, it's because in the simulation loop we pass rewards
to the learn function, which is written in plural form even though it is a single float. I am sorry for the commotion surrounding this! Feel free to merge when you have time π
- sep. 2022 kl. 22:11 skrev Raul Singh @.***>:
ο»Ώ @barskern maybe we are misunderstanding eachother but the alphaless learner takes reward as a float, not an array.
β Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
Updating the learner interface as we need the user interactions when learning for some activities. E.g. when learning the graph we need to know which products were activated and in which order.