ZhiGroup / Med-BERT

Med-BERT, contextualized embedding model for structured EHR data
Apache License 2.0

How exactly do the embeddings work? #8

Closed shamoons closed 2 years ago

shamoons commented 2 years ago

I was trying to decipher from the paper how the embeddings work. Are the ICD9/10 codes literally one hot encoded? Then the Serialized embedding is somehow added? And the visit embedding, is it literally the visit number x dimension?

Thanks so much for your work on this and your responses to the community. It's been very helpful.

lrasmy commented 2 years ago

Hi Shamoons,

Thank you.

When you initially train the model, the token embedding layer is commonly randomly initialized. Think of it as a 2D matrix whose rows represent the medical codes and whose columns are the embedding dimensions. Once trained, this embedding matrix holds low-dimensional representations of the codes (not one-hot vectors).
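A minimal sketch of that idea in NumPy (the sizes and code IDs below are made up for illustration; the actual vocabulary and dimensions come from your data and config):

```python
import numpy as np

# Hypothetical sizes -- not taken from the paper
vocab_size = 5   # number of distinct medical codes
embed_dim = 4    # embedding dimension

rng = np.random.default_rng(0)
# Token embedding: one row per code, randomly initialized before pre-training
token_embedding = rng.normal(size=(vocab_size, embed_dim))

# A patient's codes are integer indices into that matrix, not one-hot vectors;
# the "embedding" of a code is simply a row lookup
code_ids = np.array([2, 0, 3])
code_vectors = token_embedding[code_ids]  # shape (3, embed_dim)
```

During pre-training the rows of `token_embedding` are updated by backpropagation, so the lookup stays the same but the vectors become meaningful.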

The positional and visit embeddings are defined the same way as the token (code) embedding. So you are correct: the visit embedding is literally a visit number x embedding dimension matrix.

Those are the static embedding layers, while the contextualized embeddings are the output of Med-BERT, which you can use as input to any classification head, such as the RNN we used in the paper.
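To make the last step concrete, here is a toy stand-in for that pipeline: a random matrix plays the role of Med-BERT's contextualized output (one vector per input code), and a deliberately simple pooled linear head replaces the RNN head used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 6, 4  # hypothetical sequence length and hidden size

# Stand-in for Med-BERT output: one contextualized vector per input code
contextual = rng.normal(size=(seq_len, hidden))

# Simplest possible classification head (the paper used an RNN instead):
# average-pool the sequence, then apply a linear layer + sigmoid
w = rng.normal(size=hidden)
b = 0.0
pooled = contextual.mean(axis=0)
prob = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))  # predicted probability in (0, 1)
```

The point is only the data flow: the transformer's per-token outputs, not the static embeddings, are what the downstream classifier consumes.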

Hopefully that answers your question.