Merging our bb.js fork back to mothership

kachok commented 10 years ago

We released version 1.0 of our Data Reconciliation Engine on May 31st (https://github.com/amida-tech/DRE)

As part of this product we forked and rebuilt bluebutton.js library (also incorporated most of Josh Mandel's code from ccda-to-json).

Here is repo with our fork - https://github.com/amida-tech/blue-button

Now, when there are no deadlines and time pressure, we want to merge our effort back to this project. How should we proceed with that?

sankarravi commented 10 years ago

Congrats @kachok! Looks like some cool stuff here. It's going to be a big effort to merge a complete rewrite back into the main project :).

A few first pass questions:

Have we figured out a strategy for making XPath work in the browser properly? I think we've agreed this library is Node-first, but we still need to make sure we have a plan for making browsers work before we switch to an XPath implementation.
Do you have thoughts on how we can specify multiple possible locations for JSON fields with the XPath implementation? I've only worked with two vendors' real world C32s so far (Greenway and IMS) and both had quirks I had to work around this way:

el = entry.template('2.16.840.1.113883.10.20.1.28').tag('value');
var name = el.attr('displayName'),
    code = el.attr('code'),
    code_system = el.attr('codeSystem'),
    code_system_name = el.attr('codeSystemName');

// Pre-C32 CCDs put the problem name in this "originalText" field, and some vendors
// continue doing this with their C32, even though it's not technically correct
if (!name) {
  el = entry.template('2.16.840.1.113883.10.20.1.28').tag('originalText');
  name = el.val();
}

  var author = doc.tag('author');
  el = author.tag('assignedPerson').tag('name');
  var name_dict = parseName(el);
  // Sometimes C32s include names that are just like <name>String</name>
  // and we still want to get something out in that case
  if (!name_dict.prefix && !name_dict.given.length && !name_dict.family) {
    name_dict.family = el.val();
  }

I think we need support for this "check in the regulation spot first, and then try this if that fails" technique. (I expect real-world CCDAs to have some similar issues, but even if we get crazy lucky and everyone creates high quality CCDAs, Elation's deployment of Bluebutton.js so far cares way more about C32s than CCDAs, because we're not seeing any CCDAs in the wild yet, but lots of clinicians have C32s they want to get data out of it.)

We'll need to decide on a new library-level API that integrates your changes and the generation we've added to the main project (maybe a function like bb.generateXml or maybe something closer to our existing API, not sure yet)

Once we've addressed those questions, I think the rough process would be:

Migrate your new testing infrastructure to the main project. If we replace the tests wholesale while also changing the code, it will be really hard to figure out what kinds of API changes your changes introduce. I will make it a priority to expand our existing jasmine-node test suite to cover one of the NIH and one of the EMERGE samples, so we can get a broader sample of how our current library performs (since the HL7 example is missing a lot of data and isn't quite a valid CCDA).
Copy your CCDA parsing code over and see what kind of breaking changes we've introduced. Fix anything where we're parsing less data than before or where the API change is neutral.
Restore C32 functionality, using the new XPath; test for regressions there. I added a C32 test to the main project, but most of the flawed C32 files I have also have PHI in them... I'll see if I can beef up our samples there too.

@blacktm let me know what you think of that rough plan and @kachok whether that sounds like a meaningful path forward

kachok commented 10 years ago

@sankarravi Yep, it is a huge lift to "merge" it back. We have improvements on generating CCDA coming in and I think we could roll C32 support as well.

Biggest question we have is feedback on changed JSON data model for patient data. Any feedback/opinion on that? Here are our docs - https://github.com/amida-tech/blue-button/blob/master/docs/model.md

sankarravi commented 10 years ago

Ah, I hadn't looked at that yet. I do think we should attempt to preserve the current API (or a superset of it) wherever possible, and only introduce changes when it seems like they are clear improvements (rather than just introducing changes because it's convenient to keep Josh's CCDA-to-JSON code).

So one example of change that seems beneficial: breaking given[] into first and middle[] (a teammate who was trying to interact with BB JSON was very confused about how to get a first name out). Example changes I'm less sure about: changing problem's date_range: { start: ..., end: ... } to just date: [] and changing phone: { work: ..., home: ..., mobile: ... } to phones: [{ number: ..., type: ...}]

sankarravi commented 10 years ago

Ok @kachok I just expanded the test suite to include more sample CCDAs / exercise a larger range of the library's functions, so I think we should have a pretty good basis for judging regression now.

sankarravi commented 10 years ago

@kachok Anything I can do to make this project more realistic / help it along?

kachok commented 10 years ago

@sankarravi hey, we are back to the idea of merging now that our library is stable. Any ideas how we should proceed? Biggest question is difference in data models (ours is follows CCDA somewhat closely).

Otherwise here is what we have right now over at https://github.com/amida-tech/blue-button

CCDA, C32 (VA), CDA (older version) parsing
CMS Blue Button format (text based) parsing
Sensing all of the above formats + some more like VA ascii BB text file
JSON Schema for data model and validator
CCDA generation from our JSON model

We don't support browser (only node.js/server side) but it is feasible to implement this support back. And we are still using libxmljs.

sankarravi commented 10 years ago

I think we generally ought to move away from rather than toward the CCDA model :). The spec is certainly a mess, and I think we should strive to make our JSON as comprehensible as possible without knowledge of spec.

The current implementation has a long way to go to meet that standard, but the implication is that I wouldn't want to change the models just to get closer to the CCDA spec.

That said, I would love to get your work merged back into master, if for no other reason than that it'd be nice for the health of the project to have more contributors using mainline in prod.

One big question I have re: the xpath/libxmljs approach: do you have existing support for / good ideas for supporting checking multiple places for a given piece of JSON? I'm finding that to be incredibly important for real-world usage of this tool, and the DOM-like implementation makes it really easy. See, e.g., https://github.com/blue-button/bluebutton.js/blob/master/lib/parsers/c32/medications.js#L48 where we check three different places for medication names. It's a sufficiently big deal to me that Elation would have to stay off mainline if we ever adopted a mainline strategy that made it difficult to do.

Beyond that question, I think the path forward is to run the unit tests to see if there are differences beyond field names. Once those are sorted out, we can build a list of model disagreements and try to agree on the most useful model possible. I'm certainly happy to change our BB code for any model changes that we think make the library legitimately more useful / it can give us a good chance to think about the usability of the JSON.

kachok commented 10 years ago

I agree on making model comprehensible, but also disagree on having direct relationship to CCDA. I think there should be 1-1 relationships to CCDA as most used format to convert data to/from (since we believe that Blue Button adoption/MU2 rules/overall market goes toward CCDA as exchange format for patient health data).

With all of these in mind I don't think we are far off model wise. Here is link to our full data model http://developers.amida-tech.com/document_model.html in JSON Schema v4 with examples for most sections. I think our biggest difference is handling of dates and codes.

We have following structure for dates (includes point/ranges/etc and date+precision):

            "date_time": {
                "point": {
                    "date": "2000-03-23T14:30:00Z",
                    "precision": "minute"
                }
            }

or

        "date_time": {
            "low": {
                "date": "2008-01-03T00:00:00Z",
                "precision": "day"
            },
            "high": {
                "date": "2008-01-03T00:00:00Z",
                "precision": "day"
            }
        }

and for coded entries we always wrap them so it is easier to deal with them (and include translation):

        // in this case it wrapped in "product" element, it usually whatever name that makes sense in the context
        "product": {
            "name": "Proventil HFA",
            "code": "219483",
            "code_system_name": "RXNORM",
            "translations": [
                {
                    "name": "Proventil 0.09 MG/ACTUAT inhalant solution",
                    "code": "573621",
                    "code_system_name": "RXNORM"
                }
            ]
        }

We can't run test you have on our lib, mostly because our data models are different, so we need to reconcile dates/codes handling first and then we can merge the rest. It will be very usefull to actually know which sections people use downstream (e.g. I think some are more important then others for ElationEMR, Prime and Amida to name few current users of the library).

re: the xpath/libxmljs approach this approach is pretty flexible, we have "cleanup" steps that can take data from variety of elements and push it in proper element depending on format/values present/etc... so extracting medication name from one of three possible objects will work same way (I think we have exactly the same logic in meds for either CCDA or C32 parser).

Also I recently played with browser support (using browserify), and figured that we can bring it back almost without much pain.

library structure I strongly feel that it will be of great help to split bb.js into multiple libraries modules, e.g.:

data model with validation (e.g. JSON Schema based) so everything can be measured against this as "JSON BB specification"
CCDA/C32/etc XML-HL7 based data formats parsers
other parsers (1 library per parser), e.g. CMS Blue Button text file parser
CCDA generation out of JSON (other generators can be implemented later if needed as separate modules)

With data model as specification with independent set of test cases, everyone can build libraries against it and we have more independence in terms of parsers/generators. Anyone can build one for their specific needs/quirks as long as we all agree on common data model.

sankarravi commented 10 years ago

Data Model I think using the CCDA's rough structure is a good idea, like matching our sections to their sections and having the same number of list items in those sections as their list. However, I definitely don't think we should be bound by the naming choices or field decision of the CCDA. The format is a mess to understand, and we should smooth over its rough edges over time.

One example I feel strongly about: it is totally absurd to have an immunization record that actually records a lack of immunization administration just by sticking in a negationInd="true" flag. I think it's a much nicer user experience to interact with separate immunization and immunization_decline sections then to have to check a "this is opposite land" flag, so it's hard to make a mistake. A second example: the given name array, instead of "first" vs "middle"; the latter would be a lot nicer.

But yes, you're right that in practice the existing library is not too far from CCDA, so there's not a huge difference to work through yet. Eventually we'll have to sort out where to move the library.

I haven't looked through your whole schema yet, but re: the things you pulled out:

I think the date format seems reasonable, except that I think it's a very bad idea to represent day-level precision like "2008-01-03T00:00:00Z," because if you naively consume it, it gets converted to local time by the JS engine, and it's very easy to end up outputting that as "01/02/2008," depending on timezone. This is solved by outputting day-level precision dates as pure date strings, like "01/03/2008" (see https://github.com/blue-button/bluebutton.js/commit/65236a2d37efaf42453c8409afc71283eb8549dd). It's a solution that requires much less vigilance. After making that change, I found no need to distinguish between more than dates vs. date-times, but if you think minute vs. second vs ms precision is useful, I'm ok with storing dates with a separate precision field.
Wrapping coded entries seems great to me.

I will try to do a more careful audit soon, but yeah, these sound like resolvable things.

XPath Cool, will check out your meds code to get an idea what that looks like. I'll probably also need to make sure the test suite covers all of these edge cases... sometimes it's hard to commit a test to prove the change because my samples contain PHI, and I haven't bothered to make PHI-free copies. But I'll try to go back and do that, since a rewrite is high regression risk, obviously.

Library structure Also sounds reasonable to me

kachok commented 10 years ago

So how should we proceed?

Should we do conference call (invite all past and current contributors) sometime next week, define path forward (specifically on data model) and vote? I'm happy to organize, compile agenda and set conf bridge, etc.

blacktm commented 10 years ago

@kachok Thinking about our conversation last week with Matt, I'm going to reboot the Blue Button open source community generally – I'll send an email out later today with details. We'll be much more efficient and effective if we do some basic re-org and streamline our communications to drive consensus and results on stuff like this.

If you want to start outlining an agenda, that would be great. The JSON model is a good first order of business, I think. After I send out this note to everyone, we'll determine our contributors and set up a time for a call.

kachok commented 10 years ago

:+1:

thetylerhayes commented 9 years ago

@kachok You all ever get on a call?

Just re-visiting this because funny enough I just ran into the same issue you mentioned: C32 files using an originalText child element to contain the name of a medication, for example, rather than displayName attribute on the code tag:

// Pre-C32 CCDs put the problem name in this "originalText" field, and some vendors
// continue doing this with their C32, even though it's not technically correct
if (!name) {
  el = entry.template('2.16.840.1.113883.10.20.1.28').tag('originalText');
  name = el.val();
}

sankarravi commented 9 years ago

Going to close this but happy to reopen it if a conference call comes back in the plans @kachok ?

blue-button / bluebutton.js

Merging our bb.js fork back to mothership #105