Closed paynejd closed 2 years ago
FYI Rafal is working on the CIEL import now and we should be able to do it via the new bulk import API, which means we will soon be able to run these large imports on our own!
On Tue, Mar 24, 2020 at 8:41 AM Ellen Ball notifications@github.com wrote:
FYI @Karuhanga https://github.com/Karuhanga
The PIHEMR dictionary has been cleaned up and in the OpenMRS dropbox. When we are ready to try again, I can recreate the latest version. There are some minor changes.
β You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/OpenConceptLab/ocl_issues/issues/45#issuecomment-603215578, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJCOOJ46QERQOQFEIEAFMTRJCS7ZANCNFSM4ES4UAYQ .
Has been validated against OpenMRS schema and can be imported. @bmamlin has been doing some importing on Staging. If this work is done, the ticket can be closed.
Moving to the next sprint since there are discrepancies remaining
The PIH dictionary was imported onto staging a couple months ago. This uncovered some issues with custom validation schema (#732), an issue with encoding of URLs for codes (#807), and the inability to delete a source if any mappings exist from other sources.
PIH-temp was imported a couple of weeks ago using the OpenMRS Custom Validation. After this import completed and an export of HEAD was downloaded, a few differences were discovered:
from_concept_url
isn't populated (maybe because there was an error importing concept 3779?).@bmamlin The concept 3779 failed to import because of locales -- https://github.com/OpenConceptLab/ocl_issues/issues/732#issuecomment-886247072
I will check the mappings.
@bmamlin The concept 3779 failed to import because of locales -- #732 (comment)
Thanks @snyaggarwal. That makes sense. I'd like to confirm (with Andy & Jon) that this name uniqueness constraint should apply across retired concepts (e.g., if I want to retire a concept and replace it with an improved concept with the same name, I'd be forced to rename the old retired concept before I could add the new one).
@bmamlin I think the issue is that the concept "Dextrose 5% (PIH:7896)" is retired but the names are not voided. I guess this is correct behavior but OCL doesn't like it. It conflicts with the 'en' synonym of "Dextrose 5% (CIEL:161250)". There could be data for the retired concept in various deployments so I cannot delete the retired one.
I can write liquibase to migrate the data, update any forms that used the old concept, then delete the old concept. Let me know if that would be helpful.
@snyaggarwal I confirmed we want to ignore retired concepts (and ignore retired concept names if OCL allows for names to be retired) when checking for duplicates.
@bmamlin ok that make sense. I will update the validator.
@snyaggarwal did we figure out the issue with the 6 mappings that didn't go in? Is OCL preventing multiple mappings between the same two concepts?
@lnball my hunch is "Time Units" (PIH 6412) is an oddball concept that is both coded (with answers) and has concept set members (even though it's not a concept set based on its datatype).
I have cleaned up PIH:6412 to remove the sets. CIEL gets away with many shinanigans (see screenshot):
Let me know if you figure out the rest. I can generate a new file for import whenever you are ready. @bmamlin @snyaggarwal
@bmamlin Yes for OpenMRS schema, there can be only one mapping between two concepts.
@lnball go ahead and generate a new file and we'll give it a whirl. π
@lnball I began adapting the sql_to_json
script to use the gold mapping (a SAME-AS mapping to a numeric PIH code) and discovered several cases where there is more than one gold mapping for a concept. For example, from the latest version you sent there are 16 cases where concepts have two PIH "gold" IDs:
id | gold mappings |
---|---|
6 | 5622, 6408 |
7 | 9721, 1915 |
461 | 1074, 2161 |
462 | 2160, 1073 |
758 | 1900, 1907 |
1121 | 2475, 1118 |
2041 | 7789, 161283 |
2807 | 2763, 7644 |
2935 | 1459, 1897 |
2951 | 1500, 1191 |
2952 | 1190, 1499 |
3159 | 2074, 2097 |
3271 | 3519, 987 |
3366 | 1546, 2181 |
3665 | 4042, 4043 |
4789 | 2858, 2853 |
I was thinking we could use these "gold mappings" to define the ID of the concept within OCL; however, that doesn't work if there are multiple gold mappings. I understand why there are some PIH SAME-AS mappings to non-numeric codes (e.g., so modules, forms, and reports can use friendly references like "PIH:ASTHMA"); however, I'm unclear why there would be multiple numeric codes assigned for the same concept. Does this represent an artifact from concepts that were retired/changed over time? Would it be possible to constrain gold mappings (i.e., a SAME-AS mapping to a PIH numeric code) to one per concept?
For the record, @lnball discovered some of the duplicate "gold mappings" exist in CIEL, so sent a list to @askanter for fixing in CIEL as well:
@bmamlin If it wasn't clear on the emails with Andy, he added PIH mappings to CIEL concepts many years ago. Many of those PIH mappings are out-of-date, wrong, etc. When I use mds to bring concepts from CIEL, those mapping come for free :(
Because it's quick, I'll generate a new mysqldump of the PIH concepts today.
But I think you'd like the numeric PIH mappings on all PIH concepts so you can use them for concept ids? If that's true, I will add them even where there's a CIEL mapping. This will take a bit longer to write the script but especially since it would require regenerating all 24 mds packages...
If it's helpful to do this right, I'm game.
Just to add my comment from the email here. CIEL maintains multiple maps between concepts when the source concepts change. For example, AMPATH might have changed ids over time for WEIGHT. There could be three different concept IDs in the AMPATH dictionary (two being retired) that have instance data attached to them. If we were migrating that dictionary to CIEL, we'd want to know that all of that instance data should be assigned to the CIEL concept for WEIGHT.
So, I was hoping that we could keep all the maps, but perhaps identify one of them as the preferred map for the IDs you are creating. That would be the one that is currently active in the PIH dictionary. The mappings on the CIEL concepts should eventually be updated to indicate preference for outgoing maps (from CIEL to preferred PIH or AMPATH map)
@askanter What do you think about using a mapping relationship for the old ones? "Associated with"? Or a new relationship/name?
I will do so review. I know SNOMED has to do something for retired maps. But knowing which are retired in CIEL will be an update process for sure.
@bmamlin what is the specific requirements for a "gold" mapping? Would it be better to generate some sort of new mapping at import time instead of jumping through hoops making sure the PIH mappings are unique and on all concepts (And fair number of our concepts don't have numeric PIH mappings as all)? I'm conscious of @lnball spending hours needing to manually add mappings and clean up duplicates.
What is the underlying structure/requirements in OCL that we need to fulfill?
@bmamlin what is the specific requirements for a "gold" mapping? Would it be better to generate some sort of new mapping at import time instead of jumping through hoops making sure the PIH mappings are unique and on all concepts (And fair number of our concepts don't have numeric PIH mappings as all)? I'm conscious of @lnball spending hours needing to manually add mappings and clean up duplicates.
What is the underlying structure/requirements in OCL that we need to fulfill?
The problem is the SQL dumps from Ellen have non-official concept IDs, so when I imported the PIH concepts, they end up in OCL with those random concept IDs β e.g., "PIH:3" is "officially" the Asthma concept, but the concept in the SQL dump with concept_id
3 is a concept for a complex obs attachment and the Asthma concept is something like 730 with a SAME-AS mapping to PIH:3.
So, I thought we could use these SAME-AS mapping to PIH numeric codes (i.e., anything matching /^[0-9]+$/
, meaning any whole number) as "gold mappings" to identify the "official" concept ID within the PIH organization.
Unfortunately, that has failed for a few reasons:
While I think the approach of creating numeric (and non-numeric) SAME-AS mappings within an organization to be able to find the right concepts across servers (where internal concept IDs might vary) is a best practice that we should encourage across all orgs, the whole notion of using these as "gold mappings" (using them to identify the "official" concept ID) isn't going to work. That means that Ellen would need to give me a SQL dump where the concept IDs were the actual "official" concept IDs for PIH β i.e., a SQL dump from your "gold" concept server.
Fundamentally, I can't import the PIH dictionary without a non-ambiguously knowing the concept ID of each concept. The simplest solution (from my end) would be to use a SQL dump where concept IDs are "correct" (e.g., concept 3 is the Asthma concept). If that's hard for Ellen/you to generate, then I'm happy to meet in the middle with a scripted solution.
Maybe we should talk at the next OCL call (can you remind me when it is :).
Since we started using Metadata Sharing years ago, PIH concepts no longer have "official" concept_ids, beyond the uuids, so it's not possible to "non-ambiguously know the concept ID of each concept", and we shouldn't rely on anything in the "concept_id" column to be "official".. When we originally created the PIH numeric mappings, we used the concept_id as an easy way to generate a mapping, but there is no longer any expectation that this would stay in sync with the concept_id column in the OpenMRS DB.
If the use case is that OCL needs a primary key, can't we just generate that as part of the import script?
If the idea is that it's best practice that that primary key in OCL is also a mapping on the concept, perhaps we could create a new concept reference source like "PIH_OCL_CODE" and create a script to assign a code from that source to all concepts in the PIH EMR, and then you could use that code as the primary key upon input?
Sorry if I'm missing something... happy to talk more next week.
Well, @lnball provided a snapshot with gold mappings (a SAME-AS mapping to a numeric PIH code) for every concept. Initially, I thought I could simply swap in the gold mapping code as the concept IDs as I created the import file. But, as it turns out, every reference to the internal ID needs to be changed to the gold mapping (including Q&A or concept set references from other concepts and within the gold mapping's from_concept_url
).
So, I tweaked the ocl_omrs
script to gather all gold mappings and then substitute them, when appropriate, for all concept IDs and any mapping references to the org's concept IDs. I uploaded an initial pass of this onto staging as PIH-temp
and, while the counts aren't an exact match (there are still a few issues to work out), it should be good enough for testing out with the subscription module.
Currently blocked by #1018.
I'm going to work on scripting the creation of the PIH-temp collection for now.
Cool, definitely ping me @bmamlin when that is ready...
@mogoodrich, you should now see a PIH-temp
collection in the dictionary manager here. See if you can load it into an instance of OpenMRS using the OCL subscription module.
NOTE: This is a temporary version of the PIH dictionary on staging just to test out functionality (e.g., PIH workflows, using the subscription module, etc.). There may be a few stray concepts missing. When we get to a point where PIH wants to migrate their real dictionary into OCL, we'll do a fresh import from scratch & full validation at that time.
Also FYI β we're using PIH-temp
for now until #661 is addressed (ability to delete a source in OCL that is referenced elsewhere). I've kept PIH-temp
private to ensure nobody creates other sources/collections and references it. You're a member of PIH in OCL, so you should be able to see it. But, if it needs to be public for the OCL subscription module to work, we can make it public.
Thanks @bmamlin ! Will let you know when I get a chance to test.
@bmamlin and @mogoodrich - I was able to see this collection in OCL staging. I created a new empty SDK instance, with just the bare minimum platform + openconceptlab module. I created a new version of the collection in OCL staging, and copied it's subscription URL into the openconceptlab module, along with my API key. I got a bunch of authentication errors in my logs and things failed initially when I tried to import, but I re-copied things and tried again, and although I got a similar error, the import started.
ERROR - HttpMethodDirector.authenticate(236) |2021-10-12 14:26:26,620| Credentials cannot be used for basic authentication: org.openmrs.module.openconceptlab.client.OclClient$OclTokenCredentials org.apache.commons.httpclient.auth.InvalidCredentialsException: Credentials cannot be used for basic authentication: org.openmrs.module.openconceptlab.client.OclClient$OclTokenCredentials at org.apache.commons.httpclient.auth.BasicScheme.authenticate(BasicScheme.java:196) at org.apache.commons.httpclient.HttpMethodDirector.authenticateHost(HttpMethodDirector.java:282) at org.apache.commons.httpclient.HttpMethodDirector.authenticate(HttpMethodDirector.java:234) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.openmrs.module.openconceptlab.client.OclClient.executeGetMethod(OclClient.java:427) at org.openmrs.module.openconceptlab.client.OclClient.executeExportRequest(OclClient.java:135) at org.openmrs.module.openconceptlab.client.OclClient.fetchOclConcepts(OclClient.java:129) at org.openmrs.module.openconceptlab.importer.Importer$2.run(Importer.java:143) at org.openmrs.module.openconceptlab.importer.Importer.runAndHandleErrors(Importer.java:206) at org.openmrs.module.openconceptlab.importer.Importer.runTask(Importer.java:129) at org.openmrs.module.openconceptlab.importer.Importer$1.run(Importer.java:122)
The import finished with errors. I have no idea what the errors were, because the UI in the OCL module doesn't provide any information about that. So I just tried running the import again, and this time it completed and did not list any errors. You can see that it took about 6 minutes to do the import, and you can see the different attempts and the feedback I got here:
I tried comparing against an existing dictionary that was set up using our normal methods (MDS), but this was difficult. I used a diff tool and found a lot of differences, but I'll need to analyze this more to see what the issues are. Some of the ones I can see are pervasive are:
Sort Weight on Concept Answers is not consistently populated, and frequently does not match the expected value. So either this isn't getting imported correctly into OCL, isn't getting exported correctly from OCL, or isn't getting imported correctly in the subscription module
The subscription module creates new concept sources if needed, and has no way to match up against existing concept sources if they are named slightly differently. For whatever reason, in our existing system we have "SNOMED CT" whereas in OCL this is "SNOMED-CT", so these are all getting associated with a different source. Also all of the expected "PIH" mappings are imported as "PIH-temp" mappings, which makes the diffs hard to use to find actual differences.
You imported a version of our dictionary that is not up-to-date with our latest dictionary, so there are inconsistencies there.
I'm not entirely sure what we want to be testing at this stage, but wanted to get this information / feedback reported.
2 other quick things:
This seems like a problematic issue.
Just posted this on Slack: In OpenMRS, when you associate answer concepts with a question, you can control the order via a "sortWeight" parameter (which can be used, for instance, to control the order of answers in a dropdown). It looks like in OCL question-and-answer relationships are handled using a "Q-AND-A" mapping, and I don't see any sort weight or ordering ability on mappings?
This might be a blocker for wider use.
So, addressing the sort_weight issue, looks like the current solution in the Dictionary manager is to use the "extras" field to support this.
@bmamlin looks like we will need to modify the importer to set the sort_weight on the "extras" parameter for mappings of type "Q-AND-A" and "CONCEPT-SET"
So I tried a quick test importing the package into a dictionary that already had the full PIH EMR dictionary installed... i was a little worried because of the duplicate "Yes" concepts that @mseaton pointed out above.
Before the import, I manually changed the name of "PIH" source in my target database to "PIH-temp".
Some observations:
@bmamlin let me know if it would make any sense for @mogoodrich or I to get up to speed on the process you have been using to load (or reload) the PIH dictionary into OCL staging, and whether or not either of us could take this if we end up needing to iterate on this. It seems like the next step is to tweak the scripts you are using to ensure that sort weight is populated in the extras property for Q-AND-A and CONCEPT-SET concept mappings based on @mogoodrich comments.
In the OCL for OpenMRS UI, it shows that none of our Concepts are from CIEL. I find that odd, since we have CIEL mappings on most of our Concepts, and use the CIEL uuid directly on many of them.
So it's probably good to be clear about what that count is. The count is intended to be not the number of concepts mapped to CIEL concepts but the number of concepts imported from the CIEL source. Since the collection probably doesn't have any concepts imported from the CIEL source, this is what I'd expect to see. (What counts as a CIEL concept may be a point to have further discussions around).
The OCL collection I imported says it has 5856 concepts in it (per above image). After importing into a clean DB, my database has 5858 concepts in it. I assume those 2 concepts are "YES" and "NO". Looking in my DB and searching on concept_name = 'YES', I do indeed see 2 concepts:
Do you have a openconceptlab.validationType
global property set to, e.g., "NONE"? Otherwise, I would've assumed that validation would throw an error when trying to import the "Yes" concept from OCL.
I should probe further, but I'm guessing this is due to the fact that the locale_preferred flag is not set on the incoming concepts?
It should be if they are imported if it's in the underlying OCL data, since that's done here.
Since the collection probably doesn't have any concepts imported from the CIEL source, this is what I'd expect to see
That makes sense, but given that this is an initial import of our dictionary to OCL, and not an intentional operation performed within OCL (where we could clearly distinguish a concept imported from the CIEL source vs. not), then I'd think we want to be more clever about this. If we are importing a Concept, and it matches an existing Concept from the CIEL source exactly, in all of the ways that matter, should the import tool be smart enough to say that this is imported from the CIEL source?
Do you have a
openconceptlab.validationType
global property set to, e.g., "NONE"? Otherwise, I would've assumed that validation would throw an error when trying to import the "Yes" concept from OCL.
No, I didn't change this and in my DB I see:
openconceptlab.validationType,FULL
I should probe further, but I'm guessing this is due to the fact that the locale_preferred flag is not set on the incoming concepts?
It should be if they are imported if it's in the underlying OCL data, since that's done here.
Thanks @ibacher ... I'll dig deeper on my end in the coming days.
@bmamlin looks like we will need to modify the importer to set the sort_weight on the "extras" parameter for mappings of type "Q-AND-A" and "CONCEPT-SET"
I made this change here and regenerated the import JSON for PIH-temp, which includes sort weights. I tried doing a bulk import of one mapping as a test and it didn't add the sort weight as an extra as expected, so I'm working through that now.
Thanks for the update @bmamlin ... on my end, I spent some time yesterday digging deeper into the issues I mentioned above, and am making progress, will give an update when I'm done, but I think I've confirmed that the potential issue I reported with local_preferred names is not and issue... both the duplicates I found were actually duplicates in the base PIH EMR dictionary (and one had been voided)... so at least that much is "not a bug", I believe.
Another update: I think I've identified all the anomalies I commented about above, and have a (non-bug) explanation for that.
Next step would be to fix the sort order (both on import into OCL, and in the OCL for OpenMRS import into OpenMRS) and test again...
Thanks all!
It turns out there's a bug for bulk import that doesn't update mappings as expected unless the import is done in parallel mode. @snyaggarwal pointed this out and I was able to import all answer & set member mappings to PIH-temp
on staging, so OCL has sort_weight
for all PIH answers & set members for PIH-temp
now.
FYI, we identified that there is a change set somewhere in the migration scripts that also assigned concept_answer sort_weight when it was null. This probably means there is a ton of data out there that needs to be fixed in the non-PIH worldβ¦
On Oct 21, 2021, at 12:20 PM, Burke Mamlin @.***> wrote:
It turns out there's a bug for bulk import that doesn't update mappings as expected unless the import is done in parallel mode. @snyaggarwal https://github.com/snyaggarwal pointed this out and I was able to import all answer & set member mappings to PIH-temp on staging, so OCL has sort_weight for all PIH answers & set members for PIH-temp now.
β You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenConceptLab/ocl_issues/issues/45#issuecomment-948841752, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFZI3FQJ627JQXG6GDVTFA3UIBDVXANCNFSM4ES4UAYQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
just to confirm @bmamlin , this means that from your perspective, the sort_weights have now been added and we are ready to test again? (though from the OCL module side we will need to make sure https://issues.openmrs.org/browse/OCLOMRS-1067 is implemented first before testing)
from your perspective, the sort_weights have now been added and we are ready to test again?
Yes. For example, here's the "Ambulatory" answer for "Type of patient":
Thanks @bmamlin !
FYI β I created an OpenMRS Talk Post for general discussion around getting PIH using OCL. We can continue to use this GitHub issue for working through and reporting progress for the specific technical steps of getting the PIH dictionary into OCL and use the Talk post for discussion that doesn't directly apply to this initial technical step.
@bmamlin so I'm seeing the new sort_weight paramter through the UI as you demostrated:
Then I created a new version of the Collection and downloaded the JSON, but I'm not seeing the sort_weight reflected in the JSON... am I missing something or creating a new version incorrectly? The date stamp appears to be when the association was initially created, not updated?
(fyi, I'm on vacation next week so won't be responding then)
FYI @Karuhanga
The PIHEMR dictionary has been cleaned up and in the OpenMRS dropbox. When we are ready to try again, I can recreate the latest version. There are some minor changes.