ZhiGroup / Med-BERT

Med-BERT, contextualized embedding model for structured EHR data
Apache License 2.0

How exactly do the embeddings work? #8

Closed shamoons closed 2 years ago

shamoons commented 2 years ago

I was trying to decipher from the paper how the embeddings work. Are the ICD9/10 codes literally one hot encoded? Then the Serialized embedding is somehow added? And the visit embedding, is it literally the visit number x dimension?

Thanks so much for your work on this and your responses to the community. It's been very helpful.

lrasmy commented 2 years ago

Hi Shamoons,

Thank you.

When you initially train the model, the token embedding layer is commonly randomly initialized. Think of it as a 2D matrix whose rows represent the medical codes and whose columns are the embedding dimensions. Once trained, this embedding matrix holds low-dimensional representations of the codes (not one-hot vectors).
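A minimal sketch of that idea in NumPy (the sizes and code IDs below are made up for illustration; the actual vocabulary and dimensions come from your data and config):

```python
import numpy as np

# Hypothetical sizes -- not taken from the paper
vocab_size = 5   # number of distinct medical codes
embed_dim = 4    # embedding dimension

rng = np.random.default_rng(0)
# Token embedding: one row per code, randomly initialized before pre-training
token_embedding = rng.normal(size=(vocab_size, embed_dim))

# A patient's codes are integer indices into that matrix, not one-hot vectors;
# the "embedding" of a code is simply a row lookup
code_ids = np.array([2, 0, 3])
code_vectors = token_embedding[code_ids]  # shape (3, embed_dim)
```

During pre-training the rows of `token_embedding` are updated by backpropagation, so the lookup stays the same but the vectors become meaningful.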

The positional and visit embeddings are defined the same way as the token (code) embedding. So you are correct: the visit embedding is literally a visit number x embedding dimension matrix.

Those are the static embedding layers, while the contextualized embeddings are the output of Med-BERT, which you can use as input to any classification head, such as the RNN we used in the paper.
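To make the last step concrete, here is a toy stand-in for that pipeline: a random matrix plays the role of Med-BERT's contextualized output (one vector per input code), and a deliberately simple pooled linear head replaces the RNN head used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 6, 4  # hypothetical sequence length and hidden size

# Stand-in for Med-BERT output: one contextualized vector per input code
contextual = rng.normal(size=(seq_len, hidden))

# Simplest possible classification head (the paper used an RNN instead):
# average-pool the sequence, then apply a linear layer + sigmoid
w = rng.normal(size=hidden)
b = 0.0
pooled = contextual.mean(axis=0)
prob = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))  # predicted probability in (0, 1)
```

The point is only the data flow: the transformer's per-token outputs, not the static embeddings, are what the downstream classifier consumes.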

Hopefully that answers your question.