While experimenting in #32, I gathered that it might be good to support the official GPT-2 baseline, so here it is.
Some notes:
gpt2 does not 100% reproduce the official Hugging Face implementation, most probably because of slight numerical differences between nn.Linear (ours) and Conv1D (Hugging Face's)
addition of a "TRANSPOSE" mechanism in convert_HF (again, Linear vs Conv1D)
the hellaswag evaluation tool is still a bit janky
addition of "Learned" position_encoding, some additional factorization around this might be good
⚠️ modification of the default left_padding behaviour (might still be improved)
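For context on the "TRANSPOSE" point above: Hugging Face's Conv1D stores its weight as (in_features, out_features), whereas nn.Linear expects (out_features, in_features), so the checkpoint converter has to transpose those matrices. A minimal dependency-free sketch of the idea (function name hypothetical, not the actual convert_HF code):

```python
# Conv1D (Hugging Face) weight layout: (in_features, out_features)
# nn.Linear (ours) weight layout:      (out_features, in_features)
# Converting a GPT-2 checkpoint therefore requires transposing these weights.

def transpose_weight(weight):
    """Swap the two axes of a weight matrix stored as nested lists.

    Hypothetical helper illustrating the "TRANSPOSE" step in convert_HF.
    """
    return [list(row) for row in zip(*weight)]

# Conv1D-style weight: 2 input features -> 3 output features
conv1d_w = [[1, 2, 3],
            [4, 5, 6]]              # shape (in=2, out=3)

linear_w = transpose_weight(conv1d_w)   # shape (out=3, in=2)
assert linear_w == [[1, 4], [2, 5], [3, 6]]
```

Note this only addresses the layout mismatch; the small numerical differences mentioned above can remain even after a correct transpose, since the two layers implement the matmul differently.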
@funboarder13920 @l-k-11235 this will conflict with #26 and potential future work there