Hidden State mapping to two value nodes instead of 1

Hi,

I'm confused on why you've defined the value head as you did in models.py. Namely, the value head as it is will output two numbers instead of 1, since you're mapping from the (2,4096) final hidden state to a (2,1) dimension tensor for the final value. It looks like you're missing half the hidden states. I would expect for it to map from a flattened version of the final hidden state to a single node.

As a sanity check I looked for where this was used and in line 1114 of trainers.py, I noticed that you're only taking in the first value in this (2,1) vector.

Can you tell me why you've made this design choice? I feel like I'm misinterpreting something here.

ContextualAI / HALOs

Hidden State mapping to two value nodes instead of 1 #20