btheodorou99 / HALO_Inpatient


n_positions vs n_ctx inconsistency #19

Open · opened 1 month ago by juancq

juancq commented 1 month ago

Thanks for providing the code. It made replication very smooth.

I find the use of n_positions vs n_ctx inconsistent and am wondering whether this is a bug. What was the motivation for having two separate variables? Could one of them be dropped so that only a single variable is used?

Having one variable would make it consistent with the GPT-2 implementation.

btheodorou-codecertain commented 1 month ago

The two separate variables are an artifact of the GPT-2 implementation this code was adapted from. In theory they differ: n_positions is the number of positions that have a positional embedding, while n_ctx is the maximum sequence length that can be fed into the coarse transformer. In some applications you could exploit that difference to generate sequences longer than n_ctx by feeding the most recent n_ctx - 1 elements back into the model while continuing to advance the position indices. However, our experiments don't make use of that capability, so the variables could be combined (likely dropping n_positions and replacing its few uses with n_ctx).
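
For anyone reading along, here is a minimal sketch of the longer-than-n_ctx generation described above. Everything in it is hypothetical and not taken from this repository: the `model(window, positions)` interface, the concrete values of N_POSITIONS and N_CTX, and the toy stand-in model. It only illustrates sliding a window of the most recent n_ctx - 1 elements while the absolute position indices keep advancing, bounded only by n_positions.

```python
import torch

# Hypothetical sketch (not from the repo): `model` stands in for the coarse
# transformer and is assumed to take a window of element ids plus their
# absolute position indices and return the next element.
N_POSITIONS = 2048  # positions that have a learned positional embedding
N_CTX = 512         # longest window actually fed to the transformer


def generate(model, prompt, target_len):
    """Generate up to target_len elements, possibly beyond N_CTX, by feeding
    back only the most recent N_CTX - 1 elements while the absolute position
    index keeps advancing (bounded by N_POSITIONS)."""
    seq = list(prompt)
    while len(seq) < min(target_len, N_POSITIONS):
        step = len(seq)                                      # position being predicted
        window = seq[-(N_CTX - 1):]                          # most recent N_CTX - 1 elements
        positions = torch.arange(step - len(window), step)   # absolute, not window-relative
        next_elem = model(torch.tensor(window), positions)
        seq.append(next_elem)
    return seq


if __name__ == "__main__":
    # Toy stand-in model so the sketch runs end to end: predicts (last id + 1).
    dummy = lambda window, positions: int(window[-1]) + 1
    print(generate(dummy, prompt=[0, 1, 2], target_len=600)[:10])
```

Collapsing the two variables, as suggested in the issue, amounts to always setting n_positions equal to n_ctx, which forgoes this sliding-window trick but matches how our experiments actually use the model.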