Convert the history to one big string (use the same approach as for observation_text, then concatenate all of those per-event strings).
Load a tokenizer (e.g. from a pretrained Hugging Face model) and tokenize the string into integer IDs.
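A minimal sketch of these two steps, assuming a hypothetical per-event `observation_text` renderer and a toy whitespace vocabulary standing in for a real pretrained tokenizer (in practice this would be `transformers.AutoTokenizer.from_pretrained`):

```python
# Sketch of steps 1-2: history -> one big string -> integer token IDs.
# `observation_text` and the toy vocabulary below are stand-ins for the
# real per-event text builder and a pretrained Hugging Face tokenizer.

def observation_text(event: dict) -> str:
    # Hypothetical per-event renderer; the real one already exists upstream.
    return f"{event['name']}={event['value']}"

def history_to_string(events: list) -> str:
    # Step 1: render every event and concatenate with a separator.
    return " ".join(observation_text(e) for e in events)

def tokenize(text: str, vocab: dict) -> list:
    # Step 2: map tokens to integer IDs; unknown tokens get id 0 (<unk>).
    return [vocab.get(tok, 0) for tok in text.split()]

history = [{"name": "hr", "value": 72}, {"name": "temp", "value": 37}]
text = history_to_string(history)               # "hr=72 temp=37"
vocab = {"<unk>": 0, "hr=72": 1, "temp=37": 2}
ids = tokenize(text, vocab)                     # [1, 2]
```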
Allow loading of pretrained language models and generation of a representation for downstream tasks. Candidates: a state-space model (mamba-130m-hf), a masked imputation model (bert), and an autoregressive transformer (microsoft/phi-1_5).
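The representation step could pool the model's last hidden states into one fixed-size vector per sequence. A sketch of masked mean-pooling with NumPy, assuming hidden states of shape (seq_len, hidden_dim) have already been produced by whichever model (Mamba, BERT, or Phi) was loaded:

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the hidden states of non-padding positions into one vector.

    hidden_states:  (seq_len, hidden_dim) output of the language model.
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)  # (seq_len, 1)
    summed = (hidden_states * mask).sum(axis=0)                 # (hidden_dim,)
    count = mask.sum()                                          # number of real tokens
    return summed / np.maximum(count, 1.0)                      # avoid divide-by-zero

# Toy example: 3 tokens (last one is padding), hidden_dim = 2.
h = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
m = np.array([1, 1, 0])
rep = mean_pool(h, m)  # -> [2.0, 3.0]
```

For the autoregressive models (Mamba, Phi) taking the hidden state of the final real token is a common alternative to mean-pooling; either yields one vector per sequence.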
We should add some caching support later, maybe just using safetensors with a dictionary from event ID to the tensor.
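The caching idea could start as a get-or-compute dictionary keyed by event ID; persisting that dictionary later would then be a single safetensors call. A sketch with NumPy arrays (the `compute_representation` callback and the event-ID scheme are assumptions for illustration):

```python
import numpy as np

class RepresentationCache:
    """In-memory cache from event ID to representation tensor.

    On-disk persistence could later be a single call such as
    safetensors.numpy.save_file(self._store, path), since safetensors
    serializes exactly this shape of data: a dict of name -> ndarray.
    """

    def __init__(self):
        self._store = {}

    def get(self, event_id: str, compute) -> np.ndarray:
        # Compute the representation only on a cache miss.
        if event_id not in self._store:
            self._store[event_id] = compute(event_id)
        return self._store[event_id]

calls = []
def compute_representation(event_id: str) -> np.ndarray:
    # Hypothetical stand-in for "run the LM and pool its hidden states".
    calls.append(event_id)
    return np.zeros(4)

cache = RepresentationCache()
cache.get("ev-1", compute_representation)
cache.get("ev-1", compute_representation)  # cache hit: no recompute
```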