drbenvincent / darc_toolbox

Run adaptive decision making experiments
MIT License

Implement more elaborate reward distributions? #32

Open drbenvincent opened 5 years ago

drbenvincent commented 5 years ago

For historical reasons, we have been dealing with design spaces where each individual prospect is relatively simple.

For non-risky prospects we have a certain outcome:

| Probability | Reward | Delay |
|-------------|--------|-------|
| 1           | RB     | DB    |

For risky prospects we have only two outcomes:

| Probability | Reward | Delay |
|-------------|--------|-------|
| PB          | RB     | DB    |
| 1-PB        | 0      | 0     |

The problem is that the 'secondary' reward is assumed to be zero. This assumption is baked into both the design generation code and the modelling code. So close, but the current code is not able to handle these kinds of composite gambles more generally.

I did not have the foresight to implement the most general solution, which would be a full reward distribution table with N outcomes:

| Probability | Reward | Delay |
|-------------|--------|-------|
| P[0]        | R[0]   | D[0]  |
| P[1]        | R[1]   | D[1]  |
| P[2]        | R[2]   | D[2]  |
| ...         | ...    | ...   |
| P[N]        | R[N]   | D[N]  |
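To make this concrete, here is a minimal sketch of what such a reward distribution object could look like. The class name and fields are hypothetical, not existing darc_toolbox API:

```python
from dataclasses import dataclass

@dataclass
class RewardDistribution:
    """A prospect as a full reward distribution:
    N outcomes, each a (probability, reward, delay) triple."""
    probs: list
    rewards: list
    delays: list

    def __post_init__(self):
        # same-length outcome vectors, probabilities summing to 1
        assert len(self.probs) == len(self.rewards) == len(self.delays)
        assert abs(sum(self.probs) - 1.0) < 1e-9, "probabilities must sum to 1"

# The current two-outcome risky prospect is just the N=2 special case,
# with the 'secondary' reward hard-coded to zero:
risky = RewardDistribution(probs=[0.8, 0.2], rewards=[100.0, 0.0], delays=[30.0, 0.0])
```

The point of a dedicated object is that N is no longer baked in anywhere.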

Reasons to do this

  1. It would allow you to do Holt & Laury (2002) style gambles. That paper uses a fixed design, but you could apply Bayesian Adaptive Design to a larger design space if you wanted.

E.g. (screenshot of the Holt & Laury style design table, omitted)

What would it take (early thoughts)

  1. At the moment the design space is a pandas table: each column is a design dimension and each row is a particular design. If we were doing distributions then the table of designs would need to be something like:

| design | ProspectA    | ProspectB    |
|--------|--------------|--------------|
| 0      | distribution | distribution |
| 1      | distribution | distribution |
| 2      | distribution | distribution |
| ...    | ...          | ...          |

where *distribution* is a reward distribution table (or object) like the one above.
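As a sketch, a pandas DataFrame can already hold such objects in its cells (dtype `object`). The `distribution` helper below is just a stand-in for a real reward distribution class:

```python
import pandas as pd

# Illustrative stand-in for a reward distribution table:
# a list of (probability, reward, delay) outcome tuples.
def distribution(*outcomes):
    return list(outcomes)

designs = pd.DataFrame({
    "ProspectA": [
        distribution((1.0, 10.0, 7.0)),                   # certain: 10 in 7 days
        distribution((0.5, 20.0, 0.0), (0.5, 0.0, 0.0)),  # 50/50 gamble, now
    ],
    "ProspectB": [
        distribution((1.0, 15.0, 30.0)),
        distribution((0.9, 12.0, 0.0), (0.1, 0.0, 0.0)),
    ],
})
# Each cell is now an arbitrary-length reward distribution.
```

The cost of this flexibility is that every cell becomes an opaque Python object, which is part of what motivates the numpy-array alternative discussed below.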

  2. You'd then need to change how models interact with the designs.
  3. You'd have to think a bit to make sure all the models make sense with reward distributions, and that this is a well-defined general operation.

This amounts to something quite simple: rather than each element in the design table being a scalar, it simply becomes an array, such as:

| design | RA   | DA   | PA   | RB   | DB   | PB   |
|--------|------|------|------|------|------|------|
| 0      | list | list | list | list | list | list |
| 1      | list | list | list | list | list | list |
| 2      | list | list | list | list | list | list |
| ...    | ...  | ...  | ...  | ...  | ...  | ...  |

So basically each attribute (e.g. reward or delay) is going to be represented by a row vector. Where we have many designs, it will be a 2D matrix of size [number of designs, number of alternatives].

This is pointing away from representing designs using a Pandas DataFrame and towards simple numpy arrays (RA, DA, PA, ...) which could be attributes of a design class (to allow for dot indexing). This could avoid some of the faff I've experienced getting stuff into and out of DataFrames.
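A hedged sketch of what such a design class might look like, with one numpy array per attribute and dot indexing. All names here (`Designs`, `n_designs`) are hypothetical, not existing darc_toolbox API:

```python
import numpy as np

class Designs:
    """Container for a design space: one numpy array per attribute,
    each of shape [n_designs, n_outcomes], accessed by dot indexing
    (designs.RA, designs.PA, ...) instead of DataFrame column lookups."""

    def __init__(self, RA, DA, PA, RB, DB, PB):
        self.RA, self.DA, self.PA = (np.atleast_2d(x) for x in (RA, DA, PA))
        self.RB, self.DB, self.PB = (np.atleast_2d(x) for x in (RB, DB, PB))

    @property
    def n_designs(self):
        return self.RA.shape[0]

# Two designs, two outcomes per prospect:
d = Designs(
    RA=[[10.0, 0.0], [20.0, 0.0]], DA=[[7.0, 0.0], [0.0, 0.0]],
    PA=[[1.0, 0.0], [0.5, 0.5]],
    RB=[[15.0, 0.0], [12.0, 0.0]], DB=[[30.0, 0.0], [0.0, 0.0]],
    PB=[[1.0, 0.0], [0.9, 0.1]],
)
```

Because each attribute is a rectangular array, models can operate on whole design spaces with vectorised numpy expressions rather than row-by-row DataFrame access.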


UPDATE (28th August 2019)

I know how to do this now. If we do not opt for full reward distributions and instead opt for prospects of the form:

a P chance of R1 in D1 days, or a 1-P chance of R2 in D2 days

then we can do this in a really very simple way. All you need to do is add more design variables. So for our 2-choice tasks this means we would have the following design variables:

  1. PA1 the chance of getting reward 1 for choice A
  2. RA1 reward 1 for choice A
  3. DA1 delay 1 for choice A
  4. RA2 reward 2 for choice A, which happens with probability 1-PA1
  5. DA2 delay 2 for choice A
  6. PB1 the chance of getting reward 1 for choice B
  7. RB1 reward 1 for choice B
  8. DB1 delay 1 for choice B
  9. RB2 reward 2 for choice B, which happens with probability 1-PB1
  10. DB2 delay 2 for choice B

Having 10 design variables is quite a lot. But the point is that we are not actually optimising over all 10 dimensions. For risky choice, for example, all delays are going to be zero. So the design variables would be:

  1. PA1 the chance of getting reward 1 for choice A
  2. RA1 reward 1 for choice A
  3. RA2 reward 2 for choice A, which happens with probability 1-PA1
  4. PB1 the chance of getting reward 1 for choice B
  5. RB1 reward 1 for choice B
  6. RB2 reward 2 for choice B, which happens with probability 1-PB1
  7. DA1 always equal to 1
  8. DA2 always equal to 1
  9. DB1 always equal to 1
  10. DB2 always equal to 1

So just 6 (free) design variables. We could also add various constraints in the code which generates the full data frame of designs. Examples would include constraining RA1 > RA2 and RB1 > RB2 just to cut down on the total number of designs (rows).
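A sketch of how that design-generation step could look. The candidate levels below are hypothetical, the fixed delay value is illustrative, and the RA1 > RA2 and RB1 > RB2 constraints are applied while building the table:

```python
import itertools
import pandas as pd

# Hypothetical candidate levels; real code would use finer grids.
prob_levels = [0.1, 0.5, 0.9]
reward_levels = [1.0, 2.0, 5.0]
FIXED_DELAY = 0.0  # delays are held fixed for risky choice, not optimised over

rows = [
    {"PA1": pa1, "RA1": ra1, "RA2": ra2,
     "PB1": pb1, "RB1": rb1, "RB2": rb2,
     "DA1": FIXED_DELAY, "DA2": FIXED_DELAY,
     "DB1": FIXED_DELAY, "DB2": FIXED_DELAY}
    for pa1, ra1, ra2 in itertools.product(prob_levels, reward_levels, reward_levels)
    for pb1, rb1, rb2 in itertools.product(prob_levels, reward_levels, reward_levels)
    if ra1 > ra2 and rb1 > rb2  # constraints cut 729 candidate rows to 81
]
designs = pd.DataFrame(rows)
```

The constraints act as a simple filter on the Cartesian product, so adding further constraints is just a matter of extending the `if` clause.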

The Holt & Laury gambles have even more constraints. So that would look like:

  1. PA1 the chance of getting reward 1 for choice A: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
  2. RA1 reward 1 for choice A (always $2.00)
  3. RA2 reward 2 for choice A, which happens with probability 1-PA1 (always $1.60)
  4. PB1 the chance of getting reward 1 for choice B: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
  5. RB1 reward 1 for choice B (always $3.85)
  6. RB2 reward 2 for choice B, which happens with probability 1-PB1 (always $0.10)
  7. DA1 always equal to 1
  8. DA2 always equal to 1
  9. DB1 always equal to 1
  10. DB2 always equal to 1

The other main change is that we would have to create models which can deal with these design spaces. Specifically, we will have to update the equations calculating the present subjective utility. This might mean clashes if a user builds a design space with composite gambles but uses it with a model which only deals with simple gambles. We can deal with this either by (1) creating new models for the composite gambles, or (2) altering the models so they are sensitive to the form of the design space. Of these, the former probably seems best.
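For illustration only, the present subjective value of a two-outcome prospect might combine the outcomes as a discounted expectation, here with a hyperbolic discount function. This is a sketch under assumed functional forms, not how the darc_toolbox models are actually parameterised:

```python
import numpy as np

def present_subjective_value(P1, R1, D1, R2, D2, k=0.01):
    """Expected hyperbolically-discounted value of a prospect giving
    R1 after D1 days with probability P1, else R2 after D2 days.
    Illustrative only: utility curvature and probability weighting
    are omitted, and k is an arbitrary discount rate."""
    disc = lambda d: 1.0 / (1.0 + k * np.asarray(d))  # hyperbolic discounting
    return P1 * R1 * disc(D1) + (1.0 - P1) * R2 * disc(D2)

# With the secondary reward forced to zero, this reduces to the
# current single-outcome case:
v_simple = present_subjective_value(P1=0.8, R1=100.0, D1=30.0, R2=0.0, D2=0.0)
```

A model written this way handles both simple and composite gambles, which is the sense in which the design-space change forces a model change.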

References

Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655. http://doi.org/10.1257/000282802762024700