Hi there, thanks for this repo and the pretrained models.
I have a question about batching sequences of varying length. The padding token and tokenizer work as expected, but I don't see an attention-mask input to the model's forward pass.
I've compared a padded sequence (e.g. padded with 4s, as output by the tokenizer) against the same sequence without padding. The resulting embeddings, at least for the last few tokens, differ significantly between the two.
The common pattern is to also provide an attention mask. I tried passing one like model(input_ids, attn_mask=attn_mask), but that isn't how the model is set up, and looking through the source code I can't find any attention-mask mechanism.
Is there a supported way to batch sequences of varying length, and if so, how should I do it?
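For reference, here is a minimal sketch of the divergence I'm describing, using a toy embedding-plus-attention layer as a stand-in for the real model (the layer, dimensions, and vocabulary here are all hypothetical; only the pad token id 4 comes from the tokenizer). Without a mask, the real tokens' embeddings change when padding is appended, and passing a key padding mask restores them:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

PAD_ID = 4  # pad token id from the tokenizer

# Toy stand-in for the model: embedding + one self-attention layer.
embed = nn.Embedding(10, 8)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=1, batch_first=True)

def forward(input_ids, key_padding_mask=None):
    x = embed(input_ids)
    # With no mask, every position attends to the pad positions too.
    out, _ = attn(x, x, x, key_padding_mask=key_padding_mask)
    return out

seq = torch.tensor([[5, 6, 7]])                        # unpadded
padded = torch.tensor([[5, 6, 7, PAD_ID, PAD_ID]])     # padded with 4s

e_unpadded = forward(seq)
e_padded = forward(padded)[:, :3]  # real tokens only: these diverge

# Masking out the pad keys makes the real tokens' embeddings match again.
mask = padded.eq(PAD_ID)  # True at pad positions
e_masked = forward(padded, key_padding_mask=mask)[:, :3]
```

If the model internally uses standard attention layers, exposing something like this key padding mask in the forward pass would make padded batching consistent with single-sequence inference.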