Meng-Ling-Ori / Value_free


ToDo: reinforcement schedule in class `environment_free_operant(object):` #3

Open · SaschaFroelich opened this issue 3 years ago

SaschaFroelich commented 3 years ago

1) In class `environment_free_operant(object)`, method `obtained_reinforcement(self, t, action, VR, VI, omission)`, the reinforcers for actions are paid out like this:

```python
number_of_reinforcer = sum([np.random.choice(range(self.nm-1), p = [1-VR/100, VR/100]) for i in range(np.where(action ==1)[0][0])])
```

If I interpret the code correctly, then `action` (an array of size 4) indicates how often the lever was pressed in a second: 0, 1, 2 or 3 times. But if the lever was pressed once, then `action = [0, 1, 0, 0]`, and `range(np.where(action == 1)[0][0])` is equivalent to `range(0, 1)`, which only contains the value 0. Similarly, in the case of 2 or 3 lever presses, the number of presses considered is one too few.

2) Why is the first argument of `np.random.choice()` `range(self.nm-1)`? Shouldn't it be `range(2)`, since we are always sampling from `[0, 1]`? It would also throw an error if `self.nm != 3`, since `p` contains only two elements.
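A minimal sketch of the change suggested in point 2, not code from the repository. It assumes `VR` is the per-press reinforcement probability in percent (0 to 100) and that the press count is the index of the 1 in the one-hot `action` array; the helper name `count_reinforcers` and the `rng` parameter are hypothetical.

```python
import numpy as np

def count_reinforcers(action, VR, rng=None):
    """Hypothetical helper: number of reinforcers earned in one time step.

    action: one-hot NumPy array of length 4; the index of the 1 is taken
            to be the number of lever presses (0, 1, 2 or 3).
    VR:     reinforcement probability per press, in percent (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_presses = int(np.where(action == 1)[0][0])
    p = VR / 100
    # Draw once per press from the two outcomes [0, 1]; choice(2, ...) always
    # matches the two-element p, so it no longer depends on self.nm.
    return int(sum(rng.choice(2, p=[1 - p, p]) for _ in range(n_presses)))

print(count_reinforcers(np.array([0, 0, 0, 1]), VR=100))  # 3
```

Summing independent 0/1 draws like this is distributed the same as a single `rng.binomial(n_presses, p)`, which would be a more compact alternative.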

Meng-Ling-Ori commented 3 years ago

Actually, the role of the index `i` (the value of `np.where(action == 1)[0][0]`) here is only to create a list, `range(i)`. The number of loop iterations then depends on the number of elements in that list (e.g. when `action = [1,0,0,0]`, `range(i)` is empty, so no choice is made; `range(1)` is `[0]`, which results in one loop iteration). I know that what I set up here is weird, but I don't think it's wrong.
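A small check of this point, not part of the repository, assuming `action` is a one-hot NumPy array of length 4 as described in the issue. The loop runs once per element of `range(idx)`, so the number of draws equals the number of presses; the value the loop variable takes is never used. The function name `iterations_per_action` is hypothetical.

```python
import numpy as np

def iterations_per_action():
    """Count loop iterations for 0, 1, 2 and 3 lever presses."""
    counts = []
    for k in range(4):
        action = np.eye(4, dtype=int)[k]    # one-hot encoding of k presses
        idx = np.where(action == 1)[0][0]   # index of the 1, i.e. k
        n_iter = sum(1 for i in range(idx)) # i itself is never used
        counts.append(n_iter)
    return counts

print(iterations_per_action())  # [0, 1, 2, 3]
```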

SaschaFroelich commented 3 years ago

Ah yes, you're right.