btheodorou99 / HALO_Inpatient


n_positions vs n_ctx inconsistency #19

Open · opened 1 month ago by juancq

juancq commented 1 month ago

Thanks for providing the code. It made replication very smooth.

I find the use of n_positions vs n_ctx inconsistent and am wondering whether this is a bug. What was the motivation for having two separate variables? Could one of them be dropped so that only a single variable is used?

Having one variable would make it consistent with the GPT-2 implementation.

btheodorou-codecertain commented 1 month ago

The two separate variables are an artifact of the GPT-2 implementation this code was adapted from. In theory they differ: n_positions is the number of positions that have a positional embedding, while n_ctx is the maximum sequence length that can be fed into the coarse transformer. In some applications you could exploit that difference to generate sequences longer than n_ctx by feeding the most recent n_ctx - 1 elements back into the model while continuing to advance the position indices. However, our experiments don't make use of that capability, so the variables could be combined (likely dropping n_positions and replacing its few uses with n_ctx).
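
For anyone reading along, here is a minimal sketch of the longer-than-n_ctx generation described above. Everything in it is hypothetical and not taken from this repository: the `model(window, positions)` interface, the concrete values of N_POSITIONS and N_CTX, and the toy stand-in model. It only illustrates sliding a window of the most recent n_ctx - 1 elements while the absolute position indices keep advancing, bounded only by n_positions.

```python
import torch

# Hypothetical sketch (not from the repo): `model` stands in for the coarse
# transformer and is assumed to take a window of element ids plus their
# absolute position indices and return the next element.
N_POSITIONS = 2048  # positions that have a learned positional embedding
N_CTX = 512         # longest window actually fed to the transformer


def generate(model, prompt, target_len):
    """Generate up to target_len elements, possibly beyond N_CTX, by feeding
    back only the most recent N_CTX - 1 elements while the absolute position
    index keeps advancing (bounded by N_POSITIONS)."""
    seq = list(prompt)
    while len(seq) < min(target_len, N_POSITIONS):
        step = len(seq)                                      # position being predicted
        window = seq[-(N_CTX - 1):]                          # most recent N_CTX - 1 elements
        positions = torch.arange(step - len(window), step)   # absolute, not window-relative
        next_elem = model(torch.tensor(window), positions)
        seq.append(next_elem)
    return seq


if __name__ == "__main__":
    # Toy stand-in model so the sketch runs end to end: predicts (last id + 1).
    dummy = lambda window, positions: int(window[-1]) + 1
    print(generate(dummy, prompt=[0, 1, 2], target_len=600)[:10])
```

Collapsing the two variables, as suggested in the issue, amounts to always setting n_positions equal to n_ctx, which forgoes this sliding-window trick but matches how our experiments actually use the model.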