EmilyAlsentzer / clinicalBERT

Repository for Publicly Available Clinical BERT Embeddings
MIT License

Multi-label classification of clinical text #21

Closed SaiTeja390 closed 4 years ago

SaiTeja390 commented 4 years ago

I'm trying to do multi-label classification (MLC) using the pre-trained weights (the model trained on all notes, as described in the paper). The data is somewhat imbalanced, i.e., some classes occur more frequently than others. After applying the ML-ROS oversampling technique, the mean IRLbl (imbalance ratio per label) decreased, but the data is still skewed, so the model predicts the most frequently occurring labels every time (for any random input). Do you have any suggestions here?
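
For concreteness, here is a minimal sketch (not from this repo, and not the author's method) of the kind of multi-label setup described above, using per-label positive weights in `BCEWithLogitsLoss` as one common complement to oversampling for imbalanced label sets. The Hugging Face checkpoint name, label counts, and label count are assumptions for illustration only.

```python
# Sketch: clinicalBERT encoder + linear multi-label head, with per-label
# pos_weight so rare labels are penalized more when missed.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

NUM_LABELS = 10  # hypothetical number of labels

class ClinicalBertMultiLabel(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        # Assumed Hugging Face release of the "all notes" clinicalBERT weights.
        self.encoder = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] representation
        return self.classifier(cls)         # raw logits, one per label

# pos_weight = (#negatives / #positives) per label, computed on the training set.
label_counts = torch.tensor([500., 120., 30., 900., 60., 15., 300., 45., 10., 200.])  # hypothetical
n_examples = 1000.
pos_weight = (n_examples - label_counts) / label_counts
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

model = ClinicalBertMultiLabel(NUM_LABELS)
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
batch = tokenizer(["pt admitted with chest pain", "discharged home in stable condition"],
                  padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
targets = torch.zeros(2, NUM_LABELS)        # hypothetical multi-hot label matrix
loss = criterion(logits, targets)
```
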

EmilyAlsentzer commented 4 years ago

If I'm understanding your question correctly, it sounds like this issue is occurring because the labels of your downstream task are biased. I think this question is out of scope for this repo, and I would recommend you ask it on a website like Cross Validated.

If you're concerned about bias in the data sources for the language model, remember that the clinicalBERT models are trained on either all notes in MIMIC or only discharge summaries. The models trained on all notes in MIMIC will likely be biased towards Nursing/other and Radiology notes since those are most frequent (see the table in the Appendix in the clinicalBERT paper for distributions of note types).
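
For reference, a quick sketch of loading the two variants mentioned above, assuming the Hugging Face checkpoint names for the "all notes" and "discharge summaries only" releases; if you obtained the weights from this repo's download links instead, point `from_pretrained` at the local directory.

```python
from transformers import AutoModel, AutoTokenizer

# Trained on all MIMIC note types (skewed toward Nursing/other and Radiology notes).
all_notes_model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
all_notes_tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Trained on discharge summaries only.
discharge_model = AutoModel.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
discharge_tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
```
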

Feel free to reopen if you think that this question is more directly related to the clinicalBERT models.