CogStack / MedCAT

Medical Concept Annotation Tool

CU-8695knfbg Add name edits to regression suite #486

Closed mart-r closed 1 month ago

mart-r commented 1 month ago

Add an option (--edit-distance) that specifies name edits as a tuple of three values: the edit distance, the random number generator seed, and the number of picks.
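For illustration, a value such as "(1,42,2)" could be turned into its three fields roughly as follows. This is only a sketch: `EditDistanceSpec` and `parse_edit_distance` are hypothetical names made up for this example, not necessarily how the regression checker parses the option.

```python
import ast
from typing import NamedTuple


class EditDistanceSpec(NamedTuple):
    """Hypothetical container for the three values of --edit-distance."""
    distance: int  # edit distance applied to names
    seed: int      # random number generator seed
    picks: int     # number of edits to pick


def parse_edit_distance(raw: str) -> EditDistanceSpec:
    """Parse a string such as "(1,42,2)" into three integers."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"Not a valid tuple literal: {raw!r}") from err
    is_int_triple = (
        isinstance(value, tuple)
        and len(value) == 3
        and all(type(v) is int for v in value)
    )
    if not is_int_triple:
        raise ValueError(f"Expected a tuple of three integers, got: {raw!r}")
    return EditDistanceSpec(*value)


spec = parse_edit_distance("(1,42,2)")
print(spec.distance, spec.seed, spec.picks)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to Python literals.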

We need to pick a subset of the edits because otherwise thousands of edits could be checked, which would slow down the process considerably.

To avoid run-to-run variance, we also need to be able to specify a random seed for the picks.
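As a rough sketch of the idea (not the code in this PR), the snippet below generates every string at edit distance 1 from a name and then picks a fixed number of them with a seeded generator. The function names, the lowercase ASCII alphabet, and the choice of insert/delete/substitute as the edit operations are all assumptions made for this example.

```python
import random
import string
from typing import List


def edits_at_distance_one(name: str, alphabet: str = string.ascii_lowercase) -> List[str]:
    """All distinct strings one insertion, deletion or substitution away from `name`."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + ch + right[1:] for left, right in splits if right for ch in alphabet}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    candidates = (deletes | replaces | inserts) - {name}
    # Sort so the candidate order does not depend on set iteration order,
    # which can differ between interpreter runs.
    return sorted(candidates)


def pick_edits(name: str, seed: int, picks: int) -> List[str]:
    """Pick at most `picks` edits of `name`, reproducibly for a given seed."""
    candidates = edits_at_distance_one(name)
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(candidates, min(picks, len(candidates)))


all_edits = edits_at_distance_one("kidney")
print(len(all_edits), "candidate edits for a single six-letter name")
print(pick_edits("kidney", seed=42, picks=2))
```

Even one short name yields a few hundred candidates at distance 1, so checking all of them for every name in the test suite quickly becomes expensive. Sorting before sampling matters here: without it, the same seed could select different edits on different runs.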

Examples

Example without edit-distance:

```
$ python -m medcat.utils.regression.regression_checker models/20230227__kch_gstt_trained_model_494c3717f637bb89.zip --example-strictness None
Loading RegressionChecker from yaml: configs/default_regression_tests.yml
Loading model pack from file: models/20230227__kch_gstt_trained_model_494c3717f637bb89.zip
Checking the current status
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [02:39<00:00, 1.26it/s]
A total of 2 parts were kept track of within the group "default_regression_tests.yml".
And a total of 4469 (sub)cases were checked.
At the strictness level of Strictness.NORMAL (allowing ['FOUND_ANY_CHILD', 'BIGGER_SPAN_BOTH', 'BIGGER_SPAN_RIGHT', 'BIGGER_SPAN_LEFT', 'PARTIAL_OVERLAP', 'SMALLER_SPAN', 'IDENTICAL', 'FOUND_CHILD_PARTIAL']):
The number of total successful (sub) cases: 4289 (95.97%)
The number of total failing (sub) cases   :  180 ( 4.03%)
IDENTICAL             : 4234 (94.74%)
SMALLER_SPAN          :    1 ( 0.02%)
FOUND_DIR_PARENT      :   27 ( 0.60%)
FOUND_ANY_CHILD       :   54 ( 1.21%)
FOUND_OTHER           :  132 ( 2.95%)
FAIL                  :   21 ( 0.47%)
Tested 'test-case-1' for a total of 665 cases:
    IDENTICAL             :  649 (97.59%)
    SMALLER_SPAN          :    1 ( 0.15%)
    FOUND_ANY_CHILD       :    7 ( 1.05%)
    FOUND_OTHER           :    2 ( 0.30%)
    FAIL                  :    6 ( 0.90%)
Tested 'test-case-2' for a total of 3804 cases:
    IDENTICAL             : 3585 (94.24%)
    FOUND_DIR_PARENT      :   27 ( 0.71%)
    FOUND_ANY_CHILD       :   47 ( 1.24%)
    FOUND_OTHER           :  130 ( 3.42%)
    FAIL                  :   15 ( 0.39%)
```

Example with edit distance:

```
$ python -m medcat.utils.regression.regression_checker models/20230227__kch_gstt_trained_model_494c3717f637bb89.zip --example-strictness None --edit-distance "(1,42,2)"
Loading RegressionChecker from yaml: configs/default_regression_tests.yml
Loading model pack from file: models/20230227__kch_gstt_trained_model_494c3717f637bb89.zip
Checking the current status
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [06:30<00:00, 1.95s/it]
A total of 2 parts were kept track of within the group "default_regression_tests.yml".
And a total of 8928 (sub)cases were checked.
At the strictness level of Strictness.NORMAL (allowing ['SMALLER_SPAN', 'BIGGER_SPAN_RIGHT', 'FOUND_ANY_CHILD', 'FOUND_CHILD_PARTIAL', 'BIGGER_SPAN_LEFT', 'BIGGER_SPAN_BOTH', 'IDENTICAL', 'PARTIAL_OVERLAP']):
The number of total successful (sub) cases: 7419 (83.10%)
The number of total failing (sub) cases   : 1509 (16.90%)
IDENTICAL             : 6772 (75.85%)
SMALLER_SPAN          :  546 ( 6.12%)
FOUND_DIR_PARENT      :   25 ( 0.28%)
FOUND_ANY_CHILD       :   88 ( 0.99%)
FOUND_CHILD_PARTIAL   :   13 ( 0.15%)
FOUND_OTHER           :  264 ( 2.96%)
FAIL                  : 1220 (13.66%)
Tested 'test-case-1' for a total of 1329 cases:
    IDENTICAL             :  976 (73.44%)
    SMALLER_SPAN          :   95 ( 7.15%)
    FOUND_ANY_CHILD       :   11 ( 0.83%)
    FOUND_CHILD_PARTIAL   :    3 ( 0.23%)
    FOUND_OTHER           :    4 ( 0.30%)
    FAIL                  :  240 (18.06%)
Tested 'test-case-2' for a total of 7599 cases:
    IDENTICAL             : 5796 (76.27%)
    SMALLER_SPAN          :  451 ( 5.93%)
    FOUND_DIR_PARENT      :   25 ( 0.33%)
    FOUND_ANY_CHILD       :   77 ( 1.01%)
    FOUND_CHILD_PARTIAL   :   10 ( 0.13%)
    FOUND_OTHER           :  260 ( 3.42%)
    FAIL                  :  980 (12.90%)
```

OR with the 2024-06 Snomed model (only trained on GSTT data).

Without edit distance:

```
$ python -m medcat.utils.regression.regression_checker models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip --example-strictness None
Loading RegressionChecker from yaml: configs/default_regression_tests.yml
Loading model pack from file: models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip
Checking the current status
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:47<00:00, 4.21it/s]
A total of 2 parts were kept track of within the group "default_regression_tests.yml".
And a total of 4680 (sub)cases were checked.
At the strictness level of Strictness.NORMAL (allowing ['FOUND_ANY_CHILD', 'FOUND_CHILD_PARTIAL', 'BIGGER_SPAN_RIGHT', 'IDENTICAL', 'BIGGER_SPAN_BOTH', 'BIGGER_SPAN_LEFT', 'PARTIAL_OVERLAP', 'SMALLER_SPAN']):
The number of total successful (sub) cases: 4311 (92.12%)
The number of total failing (sub) cases   :  369 ( 7.88%)
IDENTICAL             : 4161 (88.91%)
SMALLER_SPAN          :    2 ( 0.04%)
FOUND_DIR_PARENT      :  191 ( 4.08%)
FOUND_DIR_GRANDPARENT :   18 ( 0.38%)
FOUND_ANY_CHILD       :  148 ( 3.16%)
FOUND_OTHER           :  137 ( 2.93%)
FAIL                  :   23 ( 0.49%)
Tested 'test-case-1' for a total of 756 cases:
    IDENTICAL             :  730 (96.56%)
    SMALLER_SPAN          :    2 ( 0.26%)
    FOUND_ANY_CHILD       :    5 ( 0.66%)
    FOUND_OTHER           :   18 ( 2.38%)
    FAIL                  :    1 ( 0.13%)
Tested 'test-case-2' for a total of 3924 cases:
    IDENTICAL             : 3431 (87.44%)
    FOUND_DIR_PARENT      :  191 ( 4.87%)
    FOUND_DIR_GRANDPARENT :   18 ( 0.46%)
    FOUND_ANY_CHILD       :  143 ( 3.64%)
    FOUND_OTHER           :  119 ( 3.03%)
    FAIL                  :   22 ( 0.56%)
```

With edit distance:

```
$ python -m medcat.utils.regression.regression_checker models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip --example-strictness None --edit-distance "(1,42,2)"
Loading RegressionChecker from yaml: configs/default_regression_tests.yml
Loading model pack from file: models/Snomed2024-06-gstt-trained_ae5b08e0fb5310b2.zip
Checking the current status
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [02:00<00:00, 1.66it/s]
A total of 2 parts were kept track of within the group "default_regression_tests.yml".
And a total of 9354 (sub)cases were checked.
At the strictness level of Strictness.NORMAL (allowing ['PARTIAL_OVERLAP', 'BIGGER_SPAN_LEFT', 'BIGGER_SPAN_BOTH', 'FOUND_CHILD_PARTIAL', 'BIGGER_SPAN_RIGHT', 'SMALLER_SPAN', 'FOUND_ANY_CHILD', 'IDENTICAL']):
The number of total successful (sub) cases: 6149 (65.74%)
The number of total failing (sub) cases   : 3205 (34.26%)
IDENTICAL             : 5415 (57.89%)
SMALLER_SPAN          :  564 ( 6.03%)
FOUND_DIR_PARENT      :  241 ( 2.58%)
FOUND_DIR_GRANDPARENT :   20 ( 0.21%)
FOUND_ANY_CHILD       :  151 ( 1.61%)
FOUND_CHILD_PARTIAL   :   19 ( 0.20%)
FOUND_OTHER           :  141 ( 1.51%)
FAIL                  : 2803 (29.97%)
Tested 'test-case-1' for a total of 1512 cases:
    IDENTICAL             :  948 (62.70%)
    SMALLER_SPAN          :   94 ( 6.22%)
    FOUND_CHILD_PARTIAL   :    3 ( 0.20%)
    FOUND_OTHER           :   17 ( 1.12%)
    FAIL                  :  450 (29.76%)
Tested 'test-case-2' for a total of 7842 cases:
    IDENTICAL             : 4467 (56.96%)
    SMALLER_SPAN          :  470 ( 5.99%)
    FOUND_DIR_PARENT      :  241 ( 3.07%)
    FOUND_DIR_GRANDPARENT :   20 ( 0.26%)
    FOUND_ANY_CHILD       :  151 ( 1.93%)
    FOUND_CHILD_PARTIAL   :   16 ( 0.20%)
    FOUND_OTHER           :  124 ( 1.58%)
    FAIL                  : 2353 (30.01%)
```

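To make the comparison between the four runs easier to read, the success rates can be recomputed from the reported counts. The sketch below does only that arithmetic; the counts are copied from the outputs above, while the dictionary layout, the short model labels, and the function name are just for illustration.

```python
# (successful, total) (sub)case counts copied from the four runs above
runs = {
    ("kch_gstt", "no edits"): (4289, 4469),
    ("kch_gstt", "edits"): (7419, 8928),
    ("Snomed2024-06", "no edits"): (4311, 4680),
    ("Snomed2024-06", "edits"): (6149, 9354),
}


def success_rate(successful: int, total: int) -> float:
    """Success rate as a percentage of all checked (sub)cases."""
    return 100.0 * successful / total


for model in ("kch_gstt", "Snomed2024-06"):
    base = success_rate(*runs[(model, "no edits")])
    edited = success_rate(*runs[(model, "edits")])
    print(f"{model}: {base:.2f}% -> {edited:.2f}% "
          f"({base - edited:.2f} percentage points lower with edits)")
```

The rates match those printed by the checker, and the drop is noticeably larger for the 2024-06 Snomed model than for the older model.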
tomolopolis commented 1 month ago

Task linked: CU-8695knfbg Add option to use edits in regression test suite