andredelft / flock-learning

A model exploring collective motion using reinforcement learning with orientation-based rewards (MSc thesis).
MIT License

Guidance #3

Open Hamza-101 opened 4 months ago

Hamza-101 commented 4 months ago

Hello Andre,

I was wondering if you had a mathematical equation available for the reward function or could point it out to me in the code file. Much appreciated.

Also, why did you use only orientation-based rewards (as mentioned in the repo)?

Regards, Hamza

andredelft commented 4 months ago

Hello Hamza,

Thanks for showing interest in my code!

The reward function is quite simple: the Bird class is initialized with a reward signal self.reward_signal, which is just a constant (by default it is 5). When, in a given step, the bird flies in the preferred direction (eastward), it is given this reward; otherwise it does not receive any reward. This is done in the function Bird.reward().
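As a rough illustration, the mechanism described above can be sketched as follows. This is a minimal, hypothetical sketch based only on this comment: the class name `Bird` and attribute `reward_signal` come from the description, but the constructor signature and the eastward check are assumptions, not code from the repository.

```python
# Hypothetical sketch of the orientation-based reward described above.
# The eastward check and constructor signature are assumptions.
class Bird:
    def __init__(self, reward_signal=5):
        # Constant reward signal (default 5, per the comment above)
        self.reward_signal = reward_signal

    def reward(self, heading):
        # Reward is given only when the bird flies in the preferred
        # (eastward) direction; otherwise the reward is zero.
        return self.reward_signal if heading == "E" else 0
```

So the signal is binary in practice: the full constant when aligned eastward, zero otherwise.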

As for your question why I used orientation-based rewards: I did that because it had not been done in the literature (at least at that time), and because I thought it was a suitable reward system to stimulate flocking behaviour without explicitly rewarding it or hard-coding it into the system (as the Vicsek model does).
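For contrast, the Vicsek model hard-codes alignment directly: each agent adopts the mean heading of its neighbours plus noise, so flocking is built in rather than learned. A minimal sketch of that update rule (my own illustration, not code from this repository; the function name and parameters are assumptions):

```python
import numpy as np

# Sketch of the standard Vicsek alignment update: each agent turns toward
# the mean heading of all neighbours within `radius`, plus uniform noise.
def vicsek_step(positions, angles, radius=1.0, noise=0.1, rng=None):
    rng = np.random.default_rng(rng)
    new_angles = np.empty_like(angles)
    for i, p in enumerate(positions):
        # Neighbours within the interaction radius (including agent i itself)
        mask = np.linalg.norm(positions - p, axis=1) <= radius
        # Mean heading computed via the average of the unit heading vectors,
        # which handles angle wrap-around correctly
        mean_sin = np.mean(np.sin(angles[mask]))
        mean_cos = np.mean(np.cos(angles[mask]))
        new_angles[i] = np.arctan2(mean_sin, mean_cos) \
            + rng.uniform(-noise / 2, noise / 2)
    return new_angles
```

In the RL approach, by contrast, no such averaging rule exists anywhere in the dynamics; any alignment that emerges comes from the learned policy.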

You can read more about the model in my thesis, which is publicly available here: https://studenttheses.universiteitleiden.nl/access/item%3A2711425/view.

Best,

André

Hamza-101 commented 4 months ago

I am trying to replicate your reward function for an open-ended environment where I vary acceleration.

1. As I understand it, you are just using alignment, right?
2. Do you terminate an episode, and if so, under what conditions?
3. How do you ensure sufficient separation?

Thanks for the help.