Closed bmamlin closed 2 years ago
OpenMRS custom validation schema was applied to CIEL on staging (task id ba510612-1319-4d6a-8a43-abbd80b0ecee
on default queue) and is running. @snyaggarwal will let me know when it's done and report any errors.
@bmamlin Please find attached errors from OpenMRS validation schema change attempt data.json.zip
I'm analyzing the results. Here's a quick breakdown:
Validation error | n |
---|---|
All names except short names must be unique for a concept and locale | 13 |
Concept fully specified name must be unique for same source and locale | 248 |
Concept preferred name must be unique for same source and locale | 1 |
The single "Concept preferred name must be unique for same source and locale" appears to come from Ova (716) and Eggs (edible) (162171). They both have "Eggs" as a synonym. It's preferred for 162171, but not 716.
@snyaggarwal which is this custom validation code enforcing?
I think we want #1 (prevent preferred names to be duplicated within any locale), but the error I'm seeing suggests it might be doing #2 (since "Eggs" is only a preferred name for 162171 and not 716).
@askanter, these CIEL concepts all have duplicate names. In most cases, the fully specified name is also a synonym (in which case I believe you just need to delete the synonym). I've inserted them here as a checklist, so you have an easy way to mark them as completed when you get to them:
Wasn't sure how to use the checklist. x887 x5086 x103251 x103289 x103249 x103250 x103261 x103262 x103248 x103222 x103221 x103252 x103238
@askanter,
Wasn't sure how to use the checklist.
LOL. 😄 I put checkboxes in my post... maybe GitHub didn't let you interact with them. Sorry about that.
I've enumerated the 248 cases where there are FSNs that are not unique in a table below. In each case, the given name is a FSN for one concept and used as a name for some other concept(s) in CIEL. Let me know if it would be easier for you to get these in another form.
2. A local preferred name within a source cannot exactly match any other name within that locale.
@bmamlin Yes its doing 2
- A local preferred name within a source cannot exactly match any other name within that locale.
@bmamlin Yes its doing 2
Ok. Let me confirm with the OpenMRS community, but I believe we will want #1 – i.e., prevent duplicate amongst preferred synonyms within a locale rather than across all synonyms (even non-preferred) in the locale. I'll report back here once I've confirmed with the OpenMRS community.
@snyaggarwal I've confirmed that we should only be enforcing uniqueness of preferred names within a locale amongst preferred names (i.e., allow other concepts to share the same name as long it isn't also marked as preferred within the locale). I created #1070 to fix this part of the OpenMRS custom validation schema.
Burke, I noticed in your report a concept 488 that was retired already in OCL and CIEL. Can you re-run the query making sure the retired concepts are not included? Thanks! Andy
Andrew S. Kanter, MD MPH FACMI FAMIA
Asst. Prof. of Clinical Biomedical Informatics and Clinical Epidemiology Columbia University Email: @. / @. Mobile: +1 (646) 469-2421 Skype: akanter-ippnw Yahoo: andy_kanter
On Thursday, October 28, 2021, 11:01:21 AM CDT, Burke Mamlin ***@***.***> wrote:
@snyaggarwal I've confirmed that we should only be enforcing uniqueness of preferred names within a locale amongst preferred names (i.e., allow other concepts to share the same name as long it isn't also marked as preferred within the locale). I created #1070 to fix this part of the OpenMRS custom validation schema.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.
@askanter,
I noticed in your report a concept 488 that was retired already in OCL and CIEL. Can you re-run the query making sure the retired concepts are not included? Thanks!
Here is the list with retired concepts omitted.
@askanter, are you working through these names that duplicate FSNs? If you're making good progress, then that's great. If it's slow going or you haven't been able to work through them, perhaps we should try to come up with a way to address this programmatically (i.e., come up with a rule on how to fix the duplication – even if temporarily – while you work through them). For example, adding " (duplicate)" to every non-FSN duplicate name, so we can continue working toward a fully valid CIEL while you take your time to (maybe over a few versions of CIEL) to whittle down the number of names ending in " (duplicate)".
Slow going, I was focusing on other work for MSF first. Let me see what patterns I am seeing. I think if the dupe is in a non-FSN position it can be retired. If the dupe is in the FSN position (there are examples of this in foreign languages) then I need to reconcile. them. If I can get a list of the duped FSNs (Vietnamese, Russian, etc.) than I can resolve those before the weekend.
@askanter,
Here are the non-FSNs that could be retired.
Let me know if you would like these in a different format.
And the FSN dupes? Those are the ones I thought I could address in CIEL.
And the FSN dupes? Those are the ones I thought I could address in CIEL.
All of these are issues with CIEL, so need to be addressed at the origin (you). I thought these 138 cases of concepts with non-FSN duplicate names were the ones you could summarily retire without having to think about them (so might be low hanging fruit for this release).
Since your original version of CIEL doesn't have UUIDs and we lose the concept_name_id
when we export form OpenMRS into OCL, we can only locate these manually or by concept_id
+ name
+ locale
. As long as the UTF-8 symbols work, something like this SQL should void all of the duplicate non-FSNs:
Also, I've include all the cases of duplicate FSNs below for your reference. Unless you identify a particular pattern, I think you'll have to address these manually.
I resolved the ones above either by retiring the non-FSN or sometime retiring the concept.
All done. You might want to do one more test when you get the release
We'll try re-applying the OpenMRS validation schema to CIEL v2021-12-15 on Staging and see what validation issues remain.
I wouldn't be surprised if there are still some failures. The changes were complicated and overlapping. It might require the next release to catch any that I missed...
@bmamlin below is the result:
{
"task-id":"848bc062-efb1-44dd-abbb-408b790e9b5b",
"state":"SUCCESS",
"result":{
"failed_concept_validations":[
{
"mnemonic":"166968",
"url":"/orgs/CIEL/sources/CIEL/concepts/166968/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Location of death (locale: en, preferred: yes)"
]
}
},
{
"mnemonic":"110120",
"url":"/orgs/CIEL/sources/CIEL/concepts/110120/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: neoplasia maligna primaria de test\u00edculo (locale: es, preferred: yes)"
]
}
},
{
"mnemonic":"149361",
"url":"/orgs/CIEL/sources/CIEL/concepts/149361/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: U tuy\u1ebfn th\u01b0\u1ee3ng th\u1eadn (locale: vi, preferred: yes)"
]
}
},
{
"mnemonic":"1541",
"url":"/orgs/CIEL/sources/CIEL/concepts/1541/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Location of death (locale: en, preferred: yes)"
]
}
},
{
"mnemonic":"113715",
"url":"/orgs/CIEL/sources/CIEL/concepts/113715/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Neoplasia maligna del maxilar superior (locale: es, preferred: yes)"
]
}
},
{
"mnemonic":"113716",
"url":"/orgs/CIEL/sources/CIEL/concepts/113716/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Neoplasia maligna del maxilar superior (locale: es, preferred: yes)"
]
}
},
{
"mnemonic":"134807",
"url":"/orgs/CIEL/sources/CIEL/concepts/134807/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Tumeur maligne de l'ovaire (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic":"166813",
"url":"/orgs/CIEL/sources/CIEL/concepts/166813/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Stage IIA1 (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic":"166814",
"url":"/orgs/CIEL/sources/CIEL/concepts/166814/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Stage IIA1 (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic":"166835",
"url":"/orgs/CIEL/sources/CIEL/concepts/166835/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Tr\u00e9panation (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic":"128688",
"url":"/orgs/CIEL/sources/CIEL/concepts/128688/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: neoplasia maligna primaria de test\u00edculo (locale: es, preferred: yes)"
]
}
},
{
"mnemonic":"128845",
"url":"/orgs/CIEL/sources/CIEL/concepts/128845/",
"errors":{
"names":[
"Concept fully specified name must be unique for same source and locale: Tumeur maligne de l'ovaire (locale: fr, preferred: yes)"
]
}
}
]
}
}
Thanks, I think I got all of those. I guess we have to wait until the next release to fix them in OCL.
This fell off my radar. I'll coordinate with @snyaggarwal on Slack to try applying the OpenMRS validation schema on the latest version of CIEL and see where we stand.
@bmamlin the latest list of failures from Staging --
[
{
"mnemonic": "167135",
"url": "/orgs/CIEL/sources/CIEL/concepts/167135/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Repetitive game play (locale: en, preferred: yes)"
]
}
},
{
"mnemonic": "167283",
"url": "/orgs/CIEL/sources/CIEL/concepts/167283/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Repetitive game play (locale: en, preferred: yes)"
]
}
},
{
"mnemonic": "162356",
"url": "/orgs/CIEL/sources/CIEL/concepts/162356/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Goutte (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "166211",
"url": "/orgs/CIEL/sources/CIEL/concepts/166211/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Réaction à un facteur de stress sévère (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "166835",
"url": "/orgs/CIEL/sources/CIEL/concepts/166835/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Trépanation (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "117762",
"url": "/orgs/CIEL/sources/CIEL/concepts/117762/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Goutte (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "119574",
"url": "/orgs/CIEL/sources/CIEL/concepts/119574/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Délire (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "142600",
"url": "/orgs/CIEL/sources/CIEL/concepts/142600/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Delusions (locale: en, preferred: yes)"
]
}
},
{
"mnemonic": "162361",
"url": "/orgs/CIEL/sources/CIEL/concepts/162361/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Pouce (locale: fr, preferred: yes)"
]
}
},
{
"mnemonic": "149514",
"url": "/orgs/CIEL/sources/CIEL/concepts/149514/",
"errors": {
"names": [
"Concept fully specified name must be unique for same source and locale: Réaction à un facteur de stress sévère (locale: fr, preferred: yes)"
]
}
}
]
@askanter,
In case you don't want to read the JSON above, here's a summary of the remaining issues:
Cases where two concept share the same FSN:
Other cases of duplicate names:
@askanter
FYI – I created a series of SQL queries you could use to check OpenMRS concept validation rules on your dictionary. When I run these against CIEL v2022-05-21:
in
, which was deprecated in 1989 and replaced with id
. These could be fixed with:UPDATE concept_name SET locale = 'id' WHERE locale = 'in';
You may have already addressed most/all of the issues I found in v2022-05-21, so hopefully you'll see fewer now.
@askanter,
I ran the validation queries against CIEL v2022-08-13:
There are still 70 concepts with the deprecated locale in
(should be id
), but we might need to get you working in the 1.11 data model before we can easily fix them.
There are 16 remaining cases of duplicate FSNs
Concept ID | Locale | Name |
---|---|---|
116506 | vi | Bệnh thận |
121555 | vi | Bệnh than |
164996 | en | Clinical pharmacology specialty |
164997 | en | Clinical pharmacology specialty |
144290 | en | Congenital Absence of Vagina |
163890 | en | Congenital absence of vagina |
161421 | fr | électrophorèse de l'hémoglobine |
165544 | fr | Electrophorese de l'hemoglobine |
110801 | es | Fractura no expuesta de la bóveda craneal con laceración y/o contusión cerebral |
120431 | es | fractura no expuesta de la bóveda craneal con laceración Y/O contusión cerebral |
110349 | es | lesión traumática anóxica del cerebro durante un procedimiento y/o como consecuencia de él |
148640 | es | lesión traumática anóxica del cerebro durante un procedimiento Y/O como consecuencia de él |
113709 | es | Neoplasia maligna del sistema nervioso |
135092 | es | neoplasia maligna del sistema nervioso |
110460 | es | quemadura de la mejilla por abrasión y/o fricción, con infección |
150722 | es | quemadura de la mejilla por abrasión Y/O fricción, con infección |
127312 | vi | Rubella viêm não |
127313 | vi | Rubella Viêm não |
165642 | en | State of Eritrea |
165763 | en | State of Eritrea |
125057 | en | Tactile Hypesthesia |
166844 | en | Tactile hypesthesia |
127693 | vi | Thận tiểu đường |
127706 | vi | Thận Tiểu đường |
120703 | vi | Thuyên tăc mỡ |
140510 | vi | Thuyên tắc mỡ |
123964 | en | Tunga Penetrans Infestation |
166567 | en | Tunga penetrans infestation |
114399 | vi | Đau mặt |
131040 | vi | Đau mắt |
110843 | vi | Độc Thu giữ |
127876 | vi | Đọc Thu giữ |
Fixed these (although most were old so not sure how they didn't show before. FYI, your code is not specific enough to detect that the Vietnamese with the diacritical marks are not identical (at least three pairs weren't and I didn't fix those. Does it make sense to redo the release (since I could also add the monkey pox terms)?
your code is not specific enough to detect that the Vietnamese with the diacritical marks are not identical
Interesting. It turns out the default collation being used (utf8mb4
) does accent insensitive comparisons. I updated the validation scripts to explicitly use the utf8mb4_bin
collation, which allows (when combined with the LOWER() function) for case insensitive accent sensitive comparisons. With the changes to the validation scripts, the diacriticals are handled properly and names that differ only by diacriticals – like "Bệnh thận" and "Bệnh than" – are no longer listed as duplicates.
Does it make sense to redo the release (since I could also add the monkey pox terms)?
Unless you have a reason that these terms need to get out in the next week or two, I'd say we use these as a motivation to get you into a RefApp 2.4 instance and exercising the "new" (1.11+) release process.
For the record, the validation scripts showed no remaining validation errors for the CIEL v2022-08-14 import file. 👍 A good sign that we may be close to being able to successfully apply the OpenMRS custom validation schema for CIEL on OCL.
CIEL v2022-08-14 on Staging... is now using OpenMRS custom validation! 🕺
Wahoo!!!
On Tue, Aug 16, 2022 at 11:55 PM Burke Mamlin @.***> wrote:
CIEL v2022-08-14 on Staging... is now using OpenMRS custom validation! 🕺
[image: image] https://user-images.githubusercontent.com/860834/185031217-461ce2f0-a55a-477f-8d8b-ef5088831acd.png
— Reply to this email directly, view it on GitHub https://github.com/OpenConceptLab/ocl_issues/issues/1027#issuecomment-1217432012, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJCOOPRWN7PKVGQVSWBGGDVZRPEVANCNFSM5FPI5ZRQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
It appears the only things preventing CIEL from meeting the OpenMRS custom validation schema requirements on production is the ~70 names using the deprecated Indonesian locale abbreviation "in" (should be "id"). If we add "in" as an "ISO 639-1 Non-preferred" alias for the Indonesian locale on production (like it is on staging), then CIEL could squeak by for now. Otherwise, we'll need to wait for the next CIEL update.
So this is something I could do now? Change in to id? Didn't even realize I had Indonesian content!
CIEL v2022-10-12 released and should be fully valid. Will plan on enabling OpenMRS custom validation schema on production later this week and, assuming it succeeds, we'll finally be able to close this ticket. 🙂
@bmamlin OpenMRS validation schema is successfully applied on Production CIEL
I had to add "in" synonym locale, this was already done in staging.
To date, CIEL has not been using the OpenMRS custom validation schema, which enforces a set of validation rules on concepts. While we expect CIEL already conforms to these rules, it's very possible there are some exceptions that we will discover as we apply the validation schema.
The goals of this ticket are:
Once successfully applied, the OpenMRS custom validation schema will ensure all new content or changes to content within future versions of CIEL continue to conform the the OpenMRS schema.