Oufattole / meds-torch

MIT License

Multimodal Tokenization #6

Open Oufattole opened 2 months ago

Oufattole commented 2 months ago

Currently, we only support triplet tokenization, which takes a triple (code, value, time), generates a vector for each of the three elements, and sums those vectors to produce a token. We should add support for:

  1. An arbitrary user-defined encoder to tokenize a modality.
  2. Frozen embeddings, i.e. a user can supply pre-cached tokens (of arbitrary shape) as inputs.
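
A minimal sketch of how the two additions might sit next to triplet tokenization, written in plain PyTorch. This is not the existing meds-torch API: the class names `TripletTokenizer` and `MultimodalTokenizer`, the `raw` and `frozen` arguments, and the modality names are all hypothetical. It also assumes that each modality yields a `(batch, length, dim)` tensor, that frozen embeddings arrive as `(batch, length, feature_dim)`, and that modalities are combined by concatenating along the sequence dimension; none of these choices is specified above.

```python
import torch
from torch import nn


class TripletTokenizer(nn.Module):
    """Embeds code, value, and time separately and sums them into one token."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.code_emb = nn.Embedding(vocab_size, dim)
        self.value_proj = nn.Linear(1, dim)
        self.time_proj = nn.Linear(1, dim)

    def forward(self, code, value, time):
        # code: (B, L) integer ids; value and time: (B, L) floats.
        return (
            self.code_emb(code)
            + self.value_proj(value.unsqueeze(-1))
            + self.time_proj(time.unsqueeze(-1))
        )


class MultimodalTokenizer(nn.Module):
    """Hypothetical wrapper: triplet tokens + user encoders + frozen embeddings."""

    def __init__(self, triplet, dim, encoders=None, frozen_dims=None):
        super().__init__()
        self.triplet = triplet
        # (1) Arbitrary user-defined encoders, one per modality name.
        #     Each is assumed to map its input to shape (B, L_m, dim).
        self.encoders = nn.ModuleDict(encoders or {})
        # (2) Frozen embeddings are only projected to the model width;
        #     the embeddings themselves are never updated.
        self.frozen_proj = nn.ModuleDict(
            {name: nn.Linear(d, dim) for name, d in (frozen_dims or {}).items()}
        )

    def forward(self, code, value, time, raw=None, frozen=None):
        tokens = [self.triplet(code, value, time)]
        for name, x in (raw or {}).items():
            tokens.append(self.encoders[name](x))
        for name, emb in (frozen or {}).items():
            # detach() keeps gradients from flowing into the cached inputs.
            tokens.append(self.frozen_proj[name](emb.detach()))
        # Assumption: modalities are concatenated along the sequence axis.
        return torch.cat(tokens, dim=1)
```

For example, with `dim=8`, a batch of 2 sequences of 4 triplets, a 3-step raw modality encoded by `nn.Linear(5, 8)`, and 2 pre-cached 16-dimensional embeddings, the output would have shape `(2, 9, 8)`. Summing instead of concatenating, or pooling pre-cached tokens of other shapes, would be equally reasonable designs.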