
ANNUBeS: training Artificial Neural Networks to Uncover Behavioral Strategies in neuroscience
https://annubs.github.io/annubes/
Apache License 2.0

Explore the training procedure and eventually define the edits needed #19

gcroci2 closed this issue 1 month ago

gcroci2 commented 8 months ago

As a reference for the code, see PR #16

Data in each epoch

`trials = task.generate_trials()` is always the same across one network's training (the rng is fixed once per network) and constitutes the entire training set. It is then reused for thousands of epochs to train the net. Are we fine with that? If so, there is no point in regenerating the same trials at every epoch.

What do we think makes more sense to do here?
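For concreteness, a minimal sketch of the two options; `task`, `model`, and `train_one_epoch` are placeholders standing in for the actual code in PR #16, not its real API:

```python
def train(task, model, train_one_epoch, n_epochs: int, resample_every_epoch: bool) -> None:
    """Sketch of the two options being discussed:
    option A: generate the trials once and reuse them for every epoch (current behaviour),
    option B: regenerate trials at every epoch so the rng keeps advancing."""
    trials = task.generate_trials()            # option A: one fixed training set
    for _ in range(n_epochs):
        if resample_every_epoch:
            trials = task.generate_trials()    # option B: fresh draw from the task each epoch
        train_one_epoch(model, trials)         # hypothetical single-epoch update step
```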

Mini Batch vs Batch Gradient Descent

Validation

Is there any particular reason why validation starts only after epoch 200? I think we should properly implement an early-stopping callback.
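A minimal sketch of what such a callback could look like; nothing here exists in the codebase yet, and `patience`, `min_delta`, and the `step` interface are made up for illustration:

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 20, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

With something like this, validation could run from the first epoch and training would simply stop once the validation loss stalls, instead of hard-coding a start at epoch 200.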


Useful notes:

- Batch size: the number of samples to work through before updating the internal model parameters.
- Number of epochs: the number of times the learning algorithm works through the entire training dataset.
- Batch, Mini Batch & Stochastic Gradient Descent
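To make the distinction concrete, a toy NumPy sketch of one epoch with a configurable batch size; the linear-regression loss and data are placeholders, not the annubes task:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))           # 1000 samples, 10 features (toy data)
y = X @ rng.normal(size=10)               # toy regression target
w = np.zeros(10)                          # model parameters
lr = 0.01

def run_epoch(batch_size: int) -> None:
    """One epoch of gradient descent on a squared-error loss.
    batch_size == len(X): batch GD (one parameter update per epoch).
    1 < batch_size < len(X): mini-batch GD.
    batch_size == 1: stochastic GD (one update per sample)."""
    global w
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad

run_epoch(batch_size=32)                  # e.g. mini-batch gradient descent
```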

This issue blocks #15

We decided to start with the reinforcement learning part first, in particular by trying to mimic the procedure of this article: https://elifesciences.org/articles/21492

gcroci2 commented 4 months ago

Useful links

Rationale

A major goal in neuroscience is to understand the relationship between an animal’s behavior and how that behavior is encoded in the brain. A typical experiment: train an animal to perform a task and record the activity of its neurons while the animal carries out the task.

To complement these experimental results, researchers “train” artificial neural networks to simulate the same tasks on a computer. Unlike real brains, artificial neural networks provide complete access to the “neural circuits” responsible for a behavior, offering a way to study and manipulate the behavior in the circuit.

You can use:

Reward-based training of RNNs

Song et al.'s networks consisted of two parts:

Other info:

The environment $\epsilon$ represents the experimentalist, while the agent $A$ represents the animal. At each time $t$ the agent chooses to perform actions after observing inputs provided by the environment, and the probability of choosing actions is given by the agent’s policy $\pi_{\theta}$ with parameters $\theta$. Here the policy is implemented as the output of an RNN, so that $\theta$ comprises the connection weights, biases, and initial state of the decision network.

In this work they only consider cases where the agent chooses one out of $N_a$ possible actions at each time, so that $\pi_{\theta}(a_t \mid u_{1:t})$ for each $t$ is a discrete, normalized probability distribution over the possible actions $a_1, \ldots, a_{N_a}$.
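As a rough illustration of "policy implemented as the output of an RNN", a minimal PyTorch sketch; PyTorch and the class below are assumptions for illustration, not the article's actual implementation. Here $\theta$ is the GRU weights, the readout weights, and a learnable initial state:

```python
import torch
import torch.nn as nn

class PolicyRNN(nn.Module):
    """π_θ(a_t | u_{1:t}): a recurrent net mapping observations to a
    normalized distribution over N_a discrete actions at every time step."""

    def __init__(self, n_inputs: int, n_hidden: int, n_actions: int):
        super().__init__()
        self.rnn = nn.GRU(n_inputs, n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, n_actions)
        self.h0 = nn.Parameter(torch.zeros(1, 1, n_hidden))      # learnable initial state

    def forward(self, u):                                        # u: (batch, time, n_inputs)
        h0 = self.h0.expand(-1, u.shape[0], -1).contiguous()
        h, _ = self.rnn(u, h0)                                   # (batch, time, n_hidden)
        return torch.softmax(self.readout(h), dim=-1)            # (batch, time, n_actions)
```

Sampling $a_t$ from this output at each step and feeding the resulting observation back in gives the agent–environment loop described above.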

After each set of actions by the agent at time $t$, the environment provides a reward (or special observable) $\varrho_{t+1}$ at time $t+1$, which the agent attempts to maximize.
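In standard policy-gradient (REINFORCE-style) terms this amounts to maximizing the expected summed reward; the exact objective, baseline, and discounting used in the article are not reproduced here, so the following is only the generic form:

$$
J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=1}^{T} \varrho_{t}\right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=1}^{T}
\nabla_\theta \log \pi_\theta\!\left(a_t \mid u_{1:t}\right)
\sum_{t'=t+1}^{T} \varrho_{t'}\right]
$$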

gcroci2 commented 1 month ago

We've decided to implement the functionalities originally planned for the annubes package within the NeuroGym package instead, which represents the current state of the art.