Enterprise-CMCS / eAPD

CMS (Centers for Medicare and Medicaid Services) eAPD - Modernizing the APD experience
https://eapd.cms.gov
Other
58 stars 25 forks source link

Understand how to keep track of changes #1693

Closed mgwalker closed 5 years ago

mgwalker commented 5 years ago

This is just a sub-component of #1692, so think of it more as a note-keeping issue to keep the main issue from getting cluttered.

Automatic saves

Should the app automatically save? If so, when?

Initial proposals

Tracking changes - backend

As changes are made, how should the backend determine when to create a new version? How does it track what changed between versions?

Initial proposals

Tracking changes - frontend

Haven't thought about this one much yet...

mgwalker commented 5 years ago

Replace the APD relational model with a document model

We have 19 tables to handle the various parts of the APD, largely because of the many-to-one nature of the various APD properties (e.g., many contractors to one activity, many activities to one APD, etc.). However, we never address those various properties independently - we only fetch the entire APD and we only update the entire APD (an oversimplification, but effectively true). As a result, our data model is more complex than we actually need - and that comes with a development and maintenance cost.

Adding the ability to comment on individual sections of an APD would add more complexity. We would either need distinct comment tables for each section, or a single comments table that all of the others can reference. We would also need to update all of the section tables to include at least one reference into their associated comment table, and possibly multiple references (e.g., if we want field-level commenting, we would need one reference per field). The data relationship gets more complex. Repeat the entire process in order to be able to assign sections to users for their review.

Finally, tracking changes over time in a relational data model is often done with "history" tables - structurally equivalent to the live tables, but where "old" values are written before the "new" values are put into the live table. There's also the option of a single "changes" table where you record the old value along with what table and column it came from. Both of these changes are cumbersome, and while they are technically capturing changes over time, they do not do so in a way that makes it easy to build an APD from a past state.

All of these things are true because we're using a relational data model to describe the APD. A relational model is ideal when you have multiple types of things that are related together, and you want to be able to interact with and changes those various things independently. Because we treat the entire APD as a single unit, the relational model doesn't make sense.

The primary alternative is the document model (not to be confused with the document object model; thanks for the unique names, computer science!). In this approach, a "document" is treated as a single entity and is fetched and manipulated as a whole. Fortunately for us, PostgreSQL - the database engine we're using - has built-in document support in the form of JSON. (And anecdotally, it appears to perform as well or better than many of the databases that only support document models.)

What it gets us
Alternative

In any case, supporting comments and assignment will require work on the data model. If we choose to stick with the relational model, that will mean creating more tables, building the relationships between them, updating our models in code to describe the right relationships, creating new models in code to describe the comments (etc.), and I'm not even sure how we'd write the endpoints for adding comments (maybe something like POST /comment/apd/:apdID/activity/:activityID/contractorResource/:contractorResourceID; it's very explicit, but quite gnarly and would put a pretty significant burden on the frontend to build proper URLs).

Level of effort

Switching to a document model is not a quick effort. There are a lot of moving pieces to consider. Here's my initial estimate of how the work might be sequenced:


@jeromeleecms @NAretakis I'll plan on getting some time during the working session to pitch to the team and/or answer any questions so you can decide which route you'd prefer (or if you want us to pause on this altogether for now while we settle the roadmap).

hbillings commented 5 years ago

@mgwalker Thank you so much for such an excellent write-up! I think this switch might also mean a couple other things:

mgwalker commented 5 years ago

I wonder if we undertake this, should we also harmonize the frontend and backend document model? Right now we have serialization/apd.js and serialization/activity.js in the frontend to convert back and forth between the two models. The document model generated by the API is based on the relational model backing it up. E.g., the API document model has this:

{
  "keyPersonnel": [{
    "name": "Person's name",
    "costs": [
      { "year": 2019, "cost": 123 },
      { "year": 2020, "cost": 456 }
    ]
   }]
}

The corresponding frontend model is this:

{
  "keyPersonnel": [{
    "name": "Person's name",
    "costs": {
      "2019": 123,
      "2020": 456
    }
   }]
}

Having the same model on both sides would make computing changes easier, and also make it much easier to ship changes from one user to another, which would be pretty slick.

jeromeleecms commented 5 years ago

Tracking some of our discussion before I forget that I ALREADY asked this:

1) Does moving to this model affect our ability to perform analytics? 2) Does this model affect our ability to separately pull / merge separate activities? 3) I'm not seeing a lot of cons in the write up -- is there something we should be aware of?

mgwalker commented 5 years ago
  1. It shouldn't affect our ability to do analytics on existing data, and could end up creating new data for analytics. It might change how we to do the analytics (instead of a typical SQL query, you might need a JSON query, or a SQL query that feeds JSON into R or Matlab or something like that).

  2. No. Currently we address a specific activity by its ID in the activities table. Under this model, we'd address a specific activity by its JSON Pointer path. We could still copy those around between APDs. As far as merging... I'm not sure what that means, but it sounds hard either way. 😆

  3. I think the biggest con is the risk of losing data somewhere along the way as we do the migration. Especially for pilot states that are entering their real APDs right now - we don't want them to have to do that all over. We can mitigate some of the risk by moving in small steps and doing lots of tests after each one to make sure everything is still on track.

jeromeleecms commented 5 years ago

For (2) I was thinking that merging/pulling would be similar to have a 2 APDs that laid out 10 activities each and eventually merging them into a single APD Update (pull in activities 1,3,4 from one and 5,8,9). This is not super common now, but since we are being more activity specific, I'm thinking that it may be something people want in the future -- i.e. not having to update two separate APDs when the other activities don't need anything.

Also, from a operational/oversight perspective, I might want to know: what are all the activities we've approved for State A? Which ones are currently active? etc.

mgwalker commented 5 years ago

Second question first: we can definitely do that!

The first one has lots of implications. I wouldn't feel comfortable saying we could or couldn't do that without a more in-depth conversation about what all that would involve. My instinct is that copying activities from 2 APDs into a single APD would probably be pretty straightforward with the document model and complex with the relational model, but there's a lot of user interaction and expectation to consider, I think.

mgwalker commented 5 years ago

Closing because we're going to proceed with the document model, and that's how we expect to track changes.