CogStack / MedCAT

Medical Concept Annotation Tool
Other

CU-86964zm4d fix preprocessing #496

Closed mart-r closed 3 weeks ago

mart-r commented 1 month ago

Since #469 there has been an issue with preprocessing: it was not ignoring UKClinicalRefsetsRF2 as it should have done.

I have made the necessary change.
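To illustrate the kind of filtering involved, here is a minimal sketch. It is not the actual MedCAT code: the function name `filter_release_folders` and the example folder names are hypothetical, and only the `UKClinicalRefsetsRF2` fragment comes from the description above.

```python
from typing import Iterable, List

# Folder-name fragment that preprocessing should skip (from the PR description).
IGNORED_FRAGMENT = "UKClinicalRefsetsRF2"


def filter_release_folders(folder_names: Iterable[str]) -> List[str]:
    """Return only the folders that should be preprocessed.

    Hypothetical helper: drops any folder whose name contains the
    ignored fragment and keeps the original order of the rest.
    """
    return [name for name in folder_names if IGNORED_FRAGMENT not in name]
```

With made-up folder names such as `SnomedCT_UKClinicalRF2_X` and `SnomedCT_UKClinicalRefsetsRF2_X`, only the first would be kept.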

The other small QoL change this PR introduces concerns the OPCS4 refset ID. Since the end of 2023 the refset ID for OPCS4 mappings has been different, but the default had remained the old ID. This PR reverses that logic: the new refset ID is now the default, and the code switches to the old one if/when needed.
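The reversed default can be sketched roughly as follows. This is an illustration only: the function name, the `release_uses_old_id` flag, and both ID strings are placeholders, since the actual refset IDs and the condition used to detect older releases are not stated in this thread.

```python
# Placeholder values: the real refset IDs are not given in this thread.
NEW_OPCS4_REFSET_ID = "<new-opcs4-refset-id>"
OLD_OPCS4_REFSET_ID = "<old-opcs4-refset-id>"


def select_opcs4_refset_id(release_uses_old_id: bool = False) -> str:
    """Pick the OPCS4 refset ID, defaulting to the new one.

    `release_uses_old_id` is a hypothetical flag standing in for whatever
    check identifies a release from before the refset ID changed.
    """
    refset_id = NEW_OPCS4_REFSET_ID  # new ID is the default
    if release_uses_old_id:
        refset_id = OLD_OPCS4_REFSET_ID  # fall back only when needed
    return refset_id
```

The point of the change is that current releases work without any special-casing, and only older releases take the fallback branch.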

The third problem was that the extension (e.g. UK/UK Drug) was being checked outside the loop. Extensions are now handled automatically and can change while iterating over them in a bundle, so the check has to happen inside the loop.
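A rough sketch of why the check belongs inside the loop is shown below. The helper `extension_for`, its name-based heuristic, and the release names are all hypothetical and do not reflect how MedCAT actually detects extensions.

```python
from typing import Iterable, List, Tuple


def extension_for(release_name: str) -> str:
    """Hypothetical detector: infer the extension from a release name."""
    if "Drug" in release_name:
        return "UK Drug"
    if "UK" in release_name:
        return "UK"
    return "International"


def process_bundle(release_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each release in a bundle with its own extension."""
    processed = []
    for name in release_names:
        # Evaluate the extension per release. Checking it once before the
        # loop would apply the first release's extension to every release.
        extension = extension_for(name)
        processed.append((name, extension))
    return processed
```

For a bundle containing both a clinical and a drug release, each entry then carries its own extension rather than a single value decided up front.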

To ensure this all works exactly as it should, I ran a test on the Snomed UK Clinical Edition and Drug Extension releases with both the old version (i.e. the one available with medcat 1.12) and this version. The results were identical in terms of:

tomolopolis commented 1 month ago

Task linked: CU-86964zm4d Fix preprocessing issues