CU-8695m5q4x: Fix issues detecting 1-token concepts

The underlying issue presented itself as models sometimes being unable to recognise a concept, even though the same model would recognise a misspelled version of the name in exactly the same context.

A few more details on how I came across this issue:

Tested with a few different models:
- [1] The 2022/2023 GSTT/KCH-trained model
- [2] The AU model (where I first saw the issue)
- [3] The 2024-06 GSTT-trained model

I ran them on 2 separate "documents":
```
Patient was diagnosed with diabetes based on previous findings
```
and
```
Patient was diagnosed with diabetis based on previous findings
```
(Note the typo "diabetis" instead of "diabetes" in the 2nd.)

Some models ([1] and [3]) correctly identified the concept in the 2nd (i.e. typo'd) version, but not in the 1st (i.e. correctly typed) version. Other models ([2]) didn't identify it in either.

It turned out the issue was as follows:

The vocab-based NER was checking 2 name versions for each token:
- The normalised token
- The lower-case token

The first of these name versions found in `cdb.snames` (the subnames) was used going forward:
- So the normalised token was checked first; if it was a subname, it was used
- Only if the normalised token was not a subname was the lower-case token checked

This caused the following issue:

When looking at `diabetes`, the normalised name was `diabete`, and this name was in the CDB's subnames. As such, `diabete` was used as the name of the concept, rather than `diabetes` itself.
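To make the failure concrete, here is a minimal Python sketch of the old name choice described above. It is not the actual MedCAT code: `pick_name_old`, `SNAMES` and `NAME2CUIS` are hypothetical stand-ins for the NER logic, `cdb.snames` and `cdb.name2cuis`, with made-up toy data. It only models the choice of name for a single token; the real NER also looks at the following tokens when building multi-token names.

```python
from typing import Dict, Optional, Set

# Hypothetical toy stand-ins for cdb.snames and cdb.name2cuis.
SNAMES: Set[str] = {"diabete", "diabetes"}
NAME2CUIS: Dict[str, Set[str]] = {"diabetes": {"CUI_DIABETES"}}


def pick_name_old(norm: str, lower: str, snames: Set[str]) -> Optional[str]:
    """Old behaviour (sketch): return the first variant that is a subname.

    The normalised variant is checked first; the lower-case variant is
    only checked if the normalised one is not a subname.
    """
    for name in (norm, lower):
        if name in snames:
            return name
    return None


# "diabetes" normalises to "diabete", which happens to be a subname,
# so it wins even though only "diabetes" is an actual concept name.
chosen = pick_name_old("diabete", "diabetes", SNAMES)
print(chosen)                 # diabete
print(NAME2CUIS.get(chosen))  # None: "diabete" maps to no concept
```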

This PR provides the following fix:

- Checks whether either of the name variants is in the subnames (`cdb.snames`) or the actual concept names (`cdb.name2cuis`)
- If a name variant is a concept name, it is used
- Otherwise, the name variant that is a subname is used (if either of them is)
NOTE:
- Currently, preference is given to the normalised name
- I.e. if both names are concept names (or both are subnames), the normalised name is used
- But perhaps we should do this the other way around? I don't really know, but that was the preference before, so I left it the same.
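The fixed selection rule can be sketched in Python as follows. Again, this is an illustration rather than the actual MedCAT implementation: `pick_name_new`, `SNAMES` and `NAME2CUIS` are hypothetical, with toy data standing in for `cdb.snames` and `cdb.name2cuis`. It checks concept names before subnames and, within each check, keeps the existing preference for the normalised variant.

```python
from typing import Dict, Optional, Set

# Hypothetical toy stand-ins for cdb.snames and cdb.name2cuis.
SNAMES: Set[str] = {"diabete", "diabetes"}
NAME2CUIS: Dict[str, Set[str]] = {"diabetes": {"CUI_DIABETES"}}


def pick_name_new(norm: str, lower: str, snames: Set[str],
                  name2cuis: Dict[str, Set[str]]) -> Optional[str]:
    """New behaviour (sketch): prefer actual concept names over subnames.

    Within each check the normalised variant is still tried first.
    """
    variants = (norm, lower)
    # 1) A variant that is an actual concept name wins.
    for name in variants:
        if name in name2cuis:
            return name
    # 2) Otherwise fall back to a variant that is only a subname.
    for name in variants:
        if name in snames:
            return name
    return None


chosen = pick_name_new("diabete", "diabetes", SNAMES, NAME2CUIS)
print(chosen)                 # diabetes
print(NAME2CUIS.get(chosen))  # {'CUI_DIABETES'}
```

Swapping the order of `variants` would be all it takes to prefer the lower-case name instead, if that turns out to be the better default.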

CogStack/MedCAT #485