eonum / medcodelearn

Playground for machine learning with medical coding data and medical classifications

Use pretrained Embeddings #16

Closed tschimbr closed 8 years ago

tschimbr commented 8 years ago

See https://github.com/fchollet/keras/issues/853
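
For reference, a minimal sketch of one way to initialise a Keras `Embedding` layer with pretrained vectors, in the style discussed in the linked issue (Keras 1.x/2.x `weights` argument). The token index, the stand-in vectors, and the `input_length` are assumptions for illustration, not the project's actual data:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Assumed inputs (not from the project): a token -> index mapping and a
# token -> pretrained-vector mapping produced by the unsupervised pretraining step.
token_index = {'pneumonia': 1, 'acute': 2, 'viral': 3}
embedding_dim = 100
pretrained = {tok: np.random.rand(embedding_dim) for tok in token_index}  # stand-in vectors

# Row 0 is reserved for padding; every other row holds a pretrained vector.
embedding_matrix = np.zeros((len(token_index) + 1, embedding_dim))
for tok, i in token_index.items():
    embedding_matrix[i] = pretrained[tok]

model = Sequential()
# Initialise the Embedding layer with the pretrained weights and freeze them.
model.add(Embedding(input_dim=embedding_matrix.shape[0],
                    output_dim=embedding_dim,
                    weights=[embedding_matrix],
                    trainable=False,
                    input_length=50))
```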

tschimbr commented 8 years ago

Use all tokens for pretraining. For supervised training, use only codable codes, each represented as the sum of the vectors of its tokens.
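
A minimal sketch of the summing step, assuming a token -> pretrained-vector mapping from the unsupervised step (the helper name and the example tokens are hypothetical):

```python
import numpy as np

def code_vector(code_tokens, token_vectors, dim=100):
    """Represent a codable code as the sum of the pretrained vectors of its tokens."""
    vec = np.zeros(dim)
    for tok in code_tokens:
        if tok in token_vectors:  # ignore tokens without a pretrained vector
            vec += token_vectors[tok]
    return vec

# Hypothetical usage with stand-in vectors for the tokens of one code description.
token_vectors = {'viral': np.random.rand(100), 'pneumonia': np.random.rand(100)}
v = code_vector(['viral', 'pneumonia'], token_vectors)
```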

anujgupta82 commented 8 years ago

@tschimbr

Can you elaborate on what you mean by "codable code"?

tschimbr commented 8 years ago

This is about the application's domain: medical coding. A codable (or final) code in the ICD (International Classification of Diseases) is a code that has no further leaf nodes or specializations below it. By using only the final codes we can reduce the sequence lengths that are used as inputs for the neural net without losing too much information. Unless you are also working with medical codings, this trick will not help you much. The idea is to use domain knowledge to reduce the input size and the model's complexity.
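
To illustrate the idea, a small sketch that keeps only the leaf codes of a hierarchy. The parent -> children mapping and the example codes are a simplified, hypothetical fragment, not the actual catalogue format used in medcodelearn:

```python
# Hypothetical ICD fragment: each code maps to its list of more specific child codes.
icd_children = {
    'J12':   ['J12.0', 'J12.1', 'J12.8', 'J12.9'],
    'J12.0': [],
    'J12.1': [],
    'J12.8': [],
    'J12.9': [],
}

# Codable (final) codes are the leaves: codes with no further specialization.
codable_codes = [code for code, children in icd_children.items() if not children]
print(codable_codes)  # ['J12.0', 'J12.1', 'J12.8', 'J12.9']
```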

tschimbr commented 8 years ago

See PR #19