Oufattole / meds-torch

MIT License

Everything is Text Tokenization #12

Closed Oufattole closed 2 months ago

Oufattole commented 2 months ago

We can convert the timestamp, code name, and value into a text description, and then use BERT or another language model (LM) to embed that text into a vector. This could let the method generalize to any EHR (fingers crossed).
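To make the text-conversion step concrete, here is a minimal sketch. The function name `observation_to_text` and the sentence template are assumptions made for illustration, not anything from meds-torch; the real template could look quite different.

```python
from datetime import datetime
from typing import Optional


def observation_to_text(time: datetime, code_name: str, value: Optional[float]) -> str:
    """Render one (time, code, numerical_value) observation as a short sentence.

    Hypothetical helper: the template below is an assumption, not the project's format.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M")
    if value is None:
        # Some codes (e.g. diagnoses) carry no numeric value.
        return f"At {stamp}: {code_name}"
    return f"At {stamp}: {code_name} = {value:g}"


print(observation_to_text(datetime(2024, 1, 5, 8, 30), "Heart rate", 72.0))
# prints "At 2024-01-05 08:30: Heart rate = 72"
```

The resulting string is what would be handed to BERT (or whichever LM) for embedding.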

So we are going with three versions of this:

[x] code_text: the code text is fed to a language model and converted to a token embedding, which we then sum with the time and numerical value vectors.
[x] observation_text: convert the triplet (time, code, numerical_value) into text, then use a language model to embed that text as a single vector.
[x] #17
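
A rough sketch of how the first two variants differ, under loud assumptions: `embed_text` and `embed_scalar` are hypothetical stand-ins (a hash and a fixed linear map) for the LM encoder and the learned time/value projections, and time is taken to be a float number of hours. None of these names come from meds-torch.

```python
import hashlib
from typing import List

DIM = 8  # toy embedding width; a real LM would use hundreds of dimensions


def embed_text(text: str) -> List[float]:
    """Hypothetical stand-in for an LM encoder: hashes text into DIM floats in [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def embed_scalar(x: float) -> List[float]:
    """Hypothetical stand-in for a learned projection of a scalar to DIM dimensions."""
    return [x * (i + 1) / DIM for i in range(DIM)]


def code_text_embedding(time: float, code_name: str, value: float) -> List[float]:
    # code_text: only the code name goes through the LM; time and value get
    # their own vectors, and the three are summed elementwise.
    parts = [embed_text(code_name), embed_scalar(time), embed_scalar(value)]
    return [sum(dims) for dims in zip(*parts)]


def observation_text_embedding(time: float, code_name: str, value: float) -> List[float]:
    # observation_text: the whole triplet is rendered as text and embedded once.
    # The sentence template is an assumption made for illustration.
    return embed_text(f"time {time:g}h: {code_name} = {value:g}")
```

With code_text the LM only ever sees code names, so those embeddings can be cached per code; with observation_text every distinct triplet produces a new string to encode.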