AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
Apache License 2.0

Split document based on `input_ids` length and `max_position_embeddings` #32

Closed OllBroDer closed 1 month ago

OllBroDer commented 1 month ago

Hi there!

Fantastic library 😺

I was wondering if we could add the ability to split documents by `max_position_embeddings` instead of silently truncating them? Or, failing that, warn the user about the truncation?
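
For concreteness, the kind of splitting I have in mind looks roughly like this, using plain transformers calls (the model name, chunk size, and headroom are placeholders, not anything rerankers exposes today):

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder cross-encoder; any HF reranking model works the same way.
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The model's positional limit, e.g. 512 for most BERT-style cross-encoders.
max_len = config.max_position_embeddings


def split_document(text: str, chunk_tokens: int) -> list[str]:
    """Split `text` into chunks of at most `chunk_tokens` tokens each."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [
        tokenizer.decode(ids[i : i + chunk_tokens])
        for i in range(0, len(ids), chunk_tokens)
    ]


# Leave headroom for the query and special tokens when building
# (query, chunk) pairs for the cross-encoder.
long_document = "..."  # some document longer than the model's context
chunks = split_document(long_document, chunk_tokens=max_len - 32)
```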

On that note, maybe we could also allow some transformers `**kwargs` in the model initializations, just to accommodate quality-of-life things such as `cache_dir` for the model or `truncation` for the tokenizer.
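
On the `**kwargs` side, what I mean is simply being able to forward arguments like these to the underlying transformers objects (a sketch of plain transformers usage, not of the rerankers API):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # placeholder model

# `cache_dir` is a standard `from_pretrained` argument in transformers.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, cache_dir="/path/to/hf-cache"
)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="/path/to/hf-cache")

# `truncation` / `max_length` are standard tokenizer call arguments.
inputs = tokenizer(
    "the query", "the document text",
    truncation=True, max_length=512, return_tensors="pt",
)
score = model(**inputs).logits
```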

Obviously, this is just related to the rankers that use Hugging Face.

EDIT: Apologies, in hindsight this should probably be 3-4 separate issues.

OllBroDer commented 1 month ago

Found this, so closing.