Open · Hamza-101 opened this issue 4 months ago
Hello Hamza,
Thanks for showing interest in my code!
The reward function is quite simple: the Bird class is initialized with a reward signal self.reward_signal, which is just a constant (by default it is 5). When, in a given step, the bird flies in the preferred direction (eastward), it receives this reward; otherwise it receives no reward. This is done in the function Bird.reward().
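A minimal sketch of that scheme might look like the following. Note this is my own illustration, not the actual implementation: the class layout and the interpretation of "flies eastward" as a positive eastward heading component (cos(orientation) > 0) are assumptions.

```python
import math

class Bird:
    """Hypothetical sketch of the orientation-based reward described above."""

    def __init__(self, orientation=0.0, reward_signal=5.0):
        # orientation in radians; 0 rad points east (the preferred direction)
        self.orientation = orientation
        self.reward_signal = reward_signal  # constant reward, 5 by default

    def reward(self):
        # Grant the constant reward only when the heading has a positive
        # eastward component; otherwise the step yields no reward.
        if math.cos(self.orientation) > 0:
            return self.reward_signal
        return 0.0
```

For example, a bird heading due east (orientation 0) would receive 5, while one heading due west (orientation pi) would receive 0.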
As for your question about why I used orientation-based rewards: I did that because it had not been done in the literature (at least at the time), and because I thought it was a suitable reward system to stimulate flocking behaviour without explicitly rewarding it or hard-coding it into the system (as the Vicsek model does).
You can read more about the model in my thesis, which is publicly available here: https://studenttheses.universiteitleiden.nl/access/item%3A2711425/view.
Best,
André
I am trying to replicate your reward function for an open-ended environment where I vary acceleration.
1. As I understand it, you are just using alignment, right?
2. Do you terminate an episode, and if so, under what conditions?
3. How do you ensure sufficient separation?
Thanks for the help.
Hello Andre,
I was wondering if you had a mathematical equation available for the reward function or could point it out to me in the code file. Much appreciated.
Also, why did you use only orientation-based rewards (as mentioned in the repo)?
Regards, Hamza