Oufattole / meds-torch

MIT License

Everything is a Token #8

Open Oufattole opened 1 month ago

Oufattole commented 1 month ago

As an alternative to triplet embeddings, we should try the "everything is a token" approach used in past work such as CEHR-BERT and ETHOS.

For example, imagine a patient has a time series of two observations: a potassium lab in quantile 9/10, and one day later a creatinine lab in quantile 2/10.
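A minimal sketch of how that example might look under each approach. Everything here is illustrative: the `Observation` class, the code strings (`LAB//POTASSIUM`, `LAB//CREATININE`), the `Q<n>` quantile tokens, and the `TIME//1d` gap token are hypothetical names, not the meds-torch API or the exact vocabularies used by CEHR-BERT or ETHOS.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time_days: float  # days since the patient's first observation
    code: str         # hypothetical code string
    quantile: int     # decile of the lab value, 1..10


# The two-observation patient from the example above.
observations = [
    Observation(0.0, "LAB//POTASSIUM", 9),
    Observation(1.0, "LAB//CREATININE", 2),
]


def to_triplets(obs):
    """Triplet view: one (time, code, value) tuple per observation."""
    return [(o.time_days, o.code, o.quantile) for o in obs]


def to_tokens(obs):
    """Token view: code, quantile, and elapsed time all become tokens.

    The gap token is formatted naively from the raw gap; a real
    implementation would map gaps to a fixed vocabulary of interval tokens.
    """
    tokens = []
    prev_time = None
    for o in obs:
        if prev_time is not None and o.time_days > prev_time:
            tokens.append(f"TIME//{o.time_days - prev_time:g}d")
        tokens.append(o.code)
        tokens.append(f"Q{o.quantile}")
        prev_time = o.time_days
    return tokens


triplets = to_triplets(observations)
tokens = to_tokens(observations)
print(triplets)
print(tokens)
```

The triplet view keeps one element per observation, while the token view turns the same two observations into a sequence of five tokens.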

Let's support both!

There's a nice figure in the ETHOS paper of this:

image

Oufattole commented 1 month ago

image

^ These are the 13 time tokens
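For illustration, a time-interval tokenizer can be sketched as a lookup over sorted bin edges. The edges and token names below are made up for this example and are not the 13 tokens from the ETHOS figure; only the binning mechanism is the point.

```python
from bisect import bisect_right

# Hypothetical bin edges in hours (NOT the ETHOS bins).
BIN_EDGES_HOURS = [1, 6, 24, 24 * 7, 24 * 30]

# One token per bin: len(BIN_EDGES_HOURS) + 1 tokens in total.
BIN_TOKENS = [
    "TIME//<1h",
    "TIME//1h-6h",
    "TIME//6h-1d",
    "TIME//1d-1w",
    "TIME//1w-1mo",
    "TIME//>=1mo",
]


def time_interval_token(gap_hours: float) -> str:
    """Map the gap between two consecutive events to a categorical token.

    Bins are half-open: a gap equal to an edge falls into the bin that
    starts at that edge.
    """
    if gap_hours < 0:
        raise ValueError("gap_hours must be non-negative")
    return BIN_TOKENS[bisect_right(BIN_EDGES_HOURS, gap_hours)]


print(time_interval_token(24))  # the one-day gap from the lab example
```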

mmcdermott commented 4 weeks ago

Most interesting to me:

  1. Replicating the strategies used in the literature, since something like this approach is pretty common.
  2. Are numeric values better used as continuous inputs or as categorical modifiers? A related but independent question: are values better embedded in a code-specific manner (e.g., the code is "LAB//HR//Q5") or in a code-independent manner (e.g., the sequence is "LAB//HR", "Q5")?
  3. Is a longer, ~per-measurement sequence better than a shorter, per-event sequence?
  4. Is temporal information useful (and if so, how)?
    • As a Temporal Position Embedding (TPE) over measurement/event embeddings. This differs from ordinal position embeddings (PEs) and may be better or worse.
    • As a time-interval token
    • Not used
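To make the code-specific vs. code-independent distinction in question 2 concrete, here is a small sketch. The function names and the example measurements are hypothetical; only the "LAB//HR//Q5" and "LAB//HR", "Q5" token shapes come from the comment above.

```python
def code_specific_tokens(code: str, quantile: int) -> list[str]:
    """Fuse code and value bin into a single token, e.g. "LAB//HR//Q5"."""
    return [f"{code}//Q{quantile}"]


def code_independent_tokens(code: str, quantile: int) -> list[str]:
    """Emit the code and a shared value-bin token, e.g. "LAB//HR", "Q5"."""
    return [code, f"Q{quantile}"]


def vocab_sizes(n_codes: int, n_quantiles: int) -> dict[str, int]:
    """Worst-case vocabulary size if every code can take every quantile."""
    return {
        "code_specific": n_codes * n_quantiles,
        "code_independent": n_codes + n_quantiles,
    }


# Hypothetical measurements: (code, quantile) pairs.
measurements = [("LAB//HR", 5), ("LAB//POTASSIUM", 9)]

specific_seq = [t for m in measurements for t in code_specific_tokens(*m)]
independent_seq = [t for m in measurements for t in code_independent_tokens(*m)]

print(specific_seq)
print(independent_seq)
print(vocab_sizes(n_codes=1000, n_quantiles=10))
```

The trade-off this shows: the code-specific form keeps one token per measurement but multiplies the vocabulary, while the code-independent form doubles the sequence length and shares the quantile tokens across codes.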