Hi @dhimmel, thank you for your interest in BERN2.
BioSyn, the neural network normalizer, currently supports only the disease and chemical types. Please note that we place an asterisk next to a CUI that has been normalized by BioSyn (e.g., ID: MESH:D013217*).
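For anyone post-processing BERN2 output, the asterisk convention can be handled with a small parser. This is a minimal sketch that assumes an ID string looks exactly like the examples in this thread (`ID: MESH:D013217*` or `ID: CUI-less`) and holds a single ID. The function name `parse_bern2_id` and the `ParsedId` type are hypothetical helpers, not part of BERN2.

```python
from typing import NamedTuple, Optional


class ParsedId(NamedTuple):
    prefix: Optional[str]    # e.g. "MESH"; None for CUI-less mentions
    local_id: Optional[str]  # e.g. "D013217"; None for CUI-less mentions
    by_biosyn: bool          # True if the ID carried BioSyn's trailing asterisk


def parse_bern2_id(raw: str) -> ParsedId:
    """Hypothetical helper: split one BERN2-style ID string into its parts."""
    raw = raw.strip()
    # Tolerate the "ID: " label shown in the web interface.
    if raw.startswith("ID:"):
        raw = raw[len("ID:"):].strip()
    if raw == "CUI-less":
        return ParsedId(None, None, False)
    # A trailing "*" marks a CUI normalized by BioSyn.
    by_biosyn = raw.endswith("*")
    if by_biosyn:
        raw = raw[:-1]
    prefix, sep, local_id = raw.partition(":")
    if not sep:
        raise ValueError(f"unexpected ID format: {raw!r}")
    return ParsedId(prefix, local_id, by_biosyn)
```

For example, `parse_bern2_id("ID: MESH:D013217*")` yields `ParsedId("MESH", "D013217", True)`, which makes it easy to count how many IDs came from BioSyn rather than a dictionary-based normalizer.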
For the gene/protein type, we use an off-the-shelf gene normalizer, GNormPlus; the human proteins in your examples are entities that GNormPlus could not normalize.
If a better gene/protein normalizer is released in the future, we plan to replace GNormPlus with it.
Thanks @mjeensung for the clarification. Feel free to post any leads on better gene/protein normalizers here... I'm happy to help evaluate.
According to the GNormPlus docs, it performs both "mention recognition and concept normalization". So do you apply GNormPlus only at the concept normalization stage for genes, while using BERN2's own mention recognition? I think the code I'm asking about is:
Here's our off the shelf gene (and other entity) normalizer that's ready for use: https://github.com/indralab/gilda
@dhimmel, that's correct.
For genes, mentions are recognized by the BERN2 NER model (better performance than GNormPlus) and normalized by GNormPlus.
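The split described above (one model recognizes mentions, then a type-specific normalizer assigns IDs) can be sketched as a routing table. This is a minimal illustration under stated assumptions: the lookup tables are toy stand-ins for GNormPlus and BioSyn, and `make_lookup`, `NORMALIZERS`, and `normalize` are hypothetical names, not BERN2's actual code. Any mention the normalizer cannot resolve falls back to `CUI-less`, matching what the web interface shows.

```python
from typing import Callable, Dict, Optional

CUI_LESS = "CUI-less"

# A normalizer maps a recognized mention string to an ID, or None if unresolved.
Normalizer = Callable[[str], Optional[str]]


def make_lookup(table: Dict[str, str]) -> Normalizer:
    """Toy exact-match normalizer standing in for a real tool."""
    return lambda mention: table.get(mention)


# Hypothetical routing: genes would go to GNormPlus and diseases/chemicals
# to BioSyn in BERN2; here each is just a tiny illustrative dictionary.
NORMALIZERS: Dict[str, Normalizer] = {
    "gene": make_lookup({"TP53": "NCBIGene:7157"}),
    "disease": make_lookup({}),
    "chemical": make_lookup({}),
}


def normalize(mention: str, entity_type: str) -> str:
    """Route a mention (already recognized by the NER model) to its type's normalizer."""
    normalizer = NORMALIZERS.get(entity_type)
    if normalizer is None:
        return CUI_LESS
    cui = normalizer(mention)
    return cui if cui is not None else CUI_LESS
```

Keeping recognition and normalization separate like this is what makes it possible to swap the gene normalizer later without retraining the NER model.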
Thank you for recommending this great tool, @cthoyt. We will look into Gilda and see whether we can incorporate it into BERN2.
Very excited to see BERN2! Really nice work so far.
I'm looking to map certain mentions of proteins to standard identifiers. Here's a list of these proteins, where each protein is also followed by a direction of activity:
Using the nice web interface, I get:
Overall, BERN2 does a good job recognizing the protein mentions. However, we already know the protein text and are more interested in normalization. Most of the gene/protein mentions receive "ID: CUI-less". Any advice on how to improve named entity normalization for human proteins?
I see that the website notes that normalization is done by https://github.com/dmis-lab/BioSyn, so feel free to migrate this issue to that repo if it's best there.
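When the protein strings are already known, one simple baseline is exact lookup against a synonym table after canonicalizing the strings, so that spelling variants such as "IL-6" and "IL6" collide. This is a minimal sketch of that swapped-in technique, not the algorithm used by GNormPlus, BioSyn, or Gilda. The two-entry synonym table is illustrative only; a real table would have to be built from a gene/protein resource, and `canonicalize`, `build_index`, and `lookup` are hypothetical helpers.

```python
import re
from typing import Dict, Optional


def canonicalize(text: str) -> str:
    """Case-fold and drop whitespace, hyphens and underscores,
    so 'IL-6', 'il 6' and 'IL6' share one key."""
    return re.sub(r"[\s\-_]+", "", text).casefold()


def build_index(synonyms: Dict[str, str]) -> Dict[str, str]:
    """Map canonicalized synonym -> identifier."""
    return {canonicalize(name): cui for name, cui in synonyms.items()}


def lookup(mention: str, index: Dict[str, str]) -> Optional[str]:
    """Return the identifier for a mention, or None if no synonym matches."""
    return index.get(canonicalize(mention))


# Illustrative synonym table (human NCBI Gene IDs); not a complete resource.
SYNONYMS = {"IL6": "NCBIGene:3569", "EGFR": "NCBIGene:1956"}
INDEX = build_index(SYNONYMS)
```

Here `lookup("IL-6", INDEX)` and `lookup("il 6", INDEX)` both return `"NCBIGene:3569"`, while an unlisted protein returns `None`. Case-folding trades precision for recall: it can merge symbols that differ only in case, which matters when a table mixes species.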