The current weakest link in the model is the PFC PTp (PT prediction) layers and the VSPatch layer that they drive, which is the key pathway for learning when a US outcome is going to happen (and for discounting the dopamine signal when it does).
It is critical that the PTp neurons exhibit a significant amount of systematic temporal evolution over trials, so that their activity pattern is effectively a "clock" that enables VSPatch to learn the expected timing of US outcomes. This clock should be sensitive to any factor that predicts US outcome timing: not just time itself, but any other indication of proximity to the US. In the BOA model, this includes distance and effort.
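To make the "clock" intuition concrete, here is a minimal toy sketch (plain Go, not the actual axon code): a simple delta-rule readout can only learn *when* the US arrives if its input pattern differs across trials. With an identical input on every trial, it ends up predicting a partial reward everywhere, which is exactly the false-prediction failure mode described next.

```go
package main

import "fmt"

// train runs a simple delta-rule predictor over a repeated trial sequence.
// This is purely illustrative of the timing-learning problem, not axon code.
func train(inputs [][]float64, us []float64, lrate float64, epochs int) []float64 {
	w := make([]float64, len(inputs[0]))
	for e := 0; e < epochs; e++ {
		for t, x := range inputs {
			pred := 0.0
			for i, xi := range x {
				pred += w[i] * xi
			}
			err := us[t] - pred
			for i, xi := range x {
				w[i] += lrate * err * xi
			}
		}
	}
	return w
}

func main() {
	nTrials := 4
	us := []float64{0, 0, 0, 1} // US delivered only on the last trial

	// "Clock" input: a distinct (one-hot) pattern on each trial.
	clock := make([][]float64, nTrials)
	// Static input: the identical pattern on every trial.
	static := make([][]float64, nTrials)
	for t := 0; t < nTrials; t++ {
		clock[t] = make([]float64, nTrials)
		clock[t][t] = 1
		static[t] = []float64{1}
	}

	wc := train(clock, us, 0.1, 200)
	ws := train(static, us, 0.1, 200)
	for t := 0; t < nTrials; t++ {
		// With the one-hot clock, the prediction at trial t is just wc[t];
		// with the static input, every trial gets the same prediction ws[0].
		fmt.Printf("trial %d  clock-pred=%.2f  static-pred=%.2f  US=%v\n",
			t, wc[t], ws[0], us[t])
	}
}
```

The clock-driven predictor converges to predicting the US only on the final trial, while the static one predicts a partial reward on every trial, including non-reward ones.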
If PTp activity is too similar across trials, then VSPatch will falsely expect a US outcome and signal DA dips prematurely -- this is the RewPred_NR (non-reward) stat in the model. These DA dips cause the model to "give up" on the current goal and experience "disappointment" -- not good.
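For reference, this is roughly what a RewPred_NR-style stat measures (the actual stat in the model may be computed differently): the average VSPatch reward prediction on trials where no US was delivered, so values well above zero indicate premature predictions that drive the problematic DA dips.

```go
package main

import "fmt"

// rewPredNR is a hedged sketch of a RewPred_NR-style statistic: the mean
// reward prediction on non-reward (NR) trials. Not the model's actual code.
func rewPredNR(pred, us []float64) float64 {
	sum, n := 0.0, 0
	for t := range us {
		if us[t] == 0 { // non-reward trial
			sum += pred[t]
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

func main() {
	pred := []float64{0.3, 0.2, 0.1, 0.9} // VSPatch predictions per trial
	us := []float64{0, 0, 0, 1}           // US delivered only on last trial
	fmt.Printf("RewPred_NR = %.2f\n", rewPredNR(pred, us)) // 0.20
}
```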
Typically, for predictive learning, we don't want to give the layer the "answer" as an input (e.g., receive distance as an input and then try to predict distance). However, in the BOA model we don't really have anything else of relevance to provide as an input, and the layer does not work well at all when it tries to predict without it. So, I just went ahead and added those inputs and have them drive PTp strongly, and now PTp is exhibiting nicely time-varying dynamics.
There are likely still issues with the VSPatch learning mechanism in terms of making it more robust and self-adapting, but it is currently possible to adjust the PVLV.Thr and Gain params to filter out the NR predictions reasonably well.
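Here is one plausible reading of how that threshold + gain filtering behaves, as a hedged sketch (the exact PVLV.Thr / Gain semantics in the code may differ): small VSPatch activations below the threshold are zeroed out, and what remains is rescaled by the gain before it discounts the phasic DA signal, so spurious low-level NR predictions never reach DA.

```go
package main

import (
	"fmt"
	"math"
)

// vsPatchFiltered sketches a threshold-then-gain filter on a raw VSPatch
// prediction. Illustrative only; not the actual PVLV implementation.
func vsPatchFiltered(raw, thr, gain float64) float64 {
	return gain * math.Max(0, raw-thr)
}

func main() {
	thr, gain := 0.2, 1.5 // hypothetical values for illustration
	for _, raw := range []float64{0.05, 0.15, 0.4, 0.8} {
		fmt.Printf("raw=%.2f  filtered=%.2f\n", raw, vsPatchFiltered(raw, thr, gain))
	}
}
```

Raising Thr suppresses more of the weak, spurious predictions; Gain then controls how strongly the surviving predictions discount DA at the actual US time.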