Hi there!
Fantastic library 😺
I was wondering if we could add the ability to split documents by `max_position_embeddings` instead of silently truncating them? Or, failing that, warn the user about the truncation?

On that note, maybe we could also allow for some `transformers` `**kwargs` in the model initializations, just to accommodate quality-of-life things such as `cache_dir` for the model or `truncate` for the tokenizer.

Obviously, this is just related to the rankers that use Hugging Face.
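To illustrate the splitting behavior I have in mind, here is a minimal sketch (the function name and signature are hypothetical, not part of the library): chunk a token-id sequence into windows no longer than the model's limit, optionally with overlap, instead of dropping everything past the limit.

```python
def split_token_ids(token_ids, max_len, stride=0):
    """Split a token-id sequence into windows of at most max_len tokens.

    stride > 0 produces overlapping windows so context is not lost
    at chunk boundaries.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    step = max_len - stride
    if step <= 0:
        raise ValueError("stride must be smaller than max_len")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), step)]


# With a real Hugging Face tokenizer this might look like (hypothetical usage):
#   ids = tokenizer(doc, add_special_tokens=False)["input_ids"]
#   chunks = split_token_ids(ids, model.config.max_position_embeddings)
print(split_token_ids(list(range(10)), max_len=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```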
EDIT: Apologies, in hindsight this should probably be 3-4 separate issues.
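For the `**kwargs` suggestion, a sketch of the kind of passthrough I mean (the `Ranker` class here is hypothetical, just to show the shape of the change):

```python
class Ranker:
    """Hypothetical ranker wrapper; names are illustrative only."""

    def __init__(self, model_name, **hf_kwargs):
        # Forward arbitrary keyword args (e.g. cache_dir) to the
        # underlying Hugging Face loading call instead of hard-coding them.
        self.model_name = model_name
        self.hf_kwargs = hf_kwargs
        # In the real implementation this would be something like:
        #   self.model = AutoModel.from_pretrained(model_name, **hf_kwargs)


r = Ranker("some-org/some-cross-encoder", cache_dir="/tmp/hf-cache")
print(r.hf_kwargs["cache_dir"])
# /tmp/hf-cache
```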