Understand how to keep track of changes

This is just a sub-component of #1692, so think of it more as a note-keeping issue to keep the main issue from getting cluttered.

Automatic saves

Should the app automatically save? If so, when?

Initial proposals

When the user adds a new list item or removes a list item (e.g., adds an activity, removes a contractor resource, etc.)

This one is useful because it will mean that saying "list item #3" means the same thing for both the local APD and the APD on the server. That would allow us to track differences based on items' position in their list (which should be valid, because array ordering is consistent).
After some amount of time with no edits. E.g., if the user hasn't changed any content for 5 minutes, just save that sucker.
Definitely don't want to save on every keypress (yet, anyway... :smiling_imp:); I'm thinking specifically of the case where a user replaces a bunch of text and then decides they want to undo all that and keep the original text. If we haven't saved yet, they can just reload the page.
- ~What if we do a save every time a field loses focuses, but we only save that single field?~ Never mind, that doesn't help with the previous point. But it might make it a lot easier to keep track of what has changed between versions. We might also be able to keep versions of individual fields, so maybe we could have a way to revert each field to previous versions?

Tracking changes - backend

As changes are made, how should the backend determine when to create a new version? How does it track what changed between versions?

Initial proposals

Drastically overhaul the data model, replacing the current relational APD model with a document model. (E.g., store the entire APD as a single JSONB blob.)
- Images in rich text areas would probably need to be extracted into separate image uploads instead of being embedded as base64 data URLs, just to make the JSON blob smaller - this would be a good thing to do anyway.
- This would let us address individual fields with JSON Pointers (RFC 6901) instead of entry IDs.
- We could add comments as new properties directly to the document. We would probably want a convention for comments.
- We could still describe the schema of an APD with JSON Schema, which would also allow us to do schema validation (not to be confused with data validation).
Use JSON Patch (RFC 6902) to track diffs
- The saved diff would be the set of actions to take the current APD back to the previous version, not the other way around. The saved complete APD should probably be the current one? (The alternative is that the saved complete APD is the original empty one, and when fetching current, we'd have to apply all the diffs.)
Create a version whenever there's a save by a new user. (E.g., if Alice saved a few minutes ago and now Bob is saving, create a version from the current APD before persisting Bob's save.)
Create a version after 30 minutes (or whatever) with no saves.
A version would consist of the user that made the changes in it, a timestamp of the final change in the version, and a list of diffs that define what changed. (Want a list of diffs because a version might consist of multiple saves. Trying to merge multiple diffs would be nightmarish. Imagine the case where the 2nd contractor resource was deleted, and then the new 2nd contractor resource – previously the 3rd – was edited. What would that merged diff look like?)

Tracking changes - frontend

Haven't thought about this one much yet...

Replace the APD relational model with a document model

We have 19 tables to handle the various parts of the APD, largely because of the many-to-one nature of the various APD properties (e.g., many contractors to one activity, many activities to one APD, etc.). However, we never address those various properties independently - we only fetch the entire APD and we only update the entire APD (an oversimplification, but effectively true). As a result, our data model is more complex than we actually need - and that comes with a development and maintenance cost.

Adding the ability to comment on individual sections of an APD would add more complexity. We would either need distinct comment tables for each section, or a single comments table that all of the others can reference. We would also need to update all of the section tables to include at least one reference into their associated comment table, and possibly multiple references (e.g., if we want field-level commenting, we would need one reference per field). The data relationship gets more complex. Repeat the entire process in order to be able to assign sections to users for their review.

Finally, tracking changes over time in a relational data model is often done with "history" tables - structurally equivalent to the live tables, but where "old" values are written before the "new" values are put into the live table. There's also the option of a single "changes" table where you record the old value along with what table and column it came from. Both of these changes are cumbersome, and while they are technically capturing changes over time, they do not do so in a way that makes it easy to build an APD from a past state.

All of these things are true because we're using a relational data model to describe the APD. A relational model is ideal when you have multiple types of things that are related together, and you want to be able to interact with and changes those various things independently. Because we treat the entire APD as a single unit, the relational model doesn't make sense.

The primary alternative is the document model (not to be confused with the document object model; thanks for the unique names, computer science!). In this approach, a "document" is treated as a single entity and is fetched and manipulated as a whole. Fortunately for us, PostgreSQL - the database engine we're using - has built-in document support in the form of JSON. (And anecdotally, it appears to perform as well or better than many of the databases that only support document models.)

What it gets us

JSON Patch (RFC 6902) is a standard format for describing changes to a JSON document. Switching to a JSON document model would enable us to use this standard for change tracking over time. We could also switch to it for saving changes from the frontend which would be a nice consistency win.

For example, we can update the name of the third contractor in the first activity:
```
{
 "op": "update",
 "path": "#/activities/0/contractor/2/name",
 "value": "New Name"
}
```

Because a document model isn't tied to a database schema (though schemas can still be enforced in code with JSON Schema), it's much more flexible, so we can add APD metadata very easily - such as adding comments to each individal textfield.

{
 "id": 3,
 "name": "MP-2019-07-25-HITECH-APD",
 "activities": [
   {
     "alternatives": "",
     ...

...might become...

{
 "id": 3,
 "name": "MP-2019-07-25-HITECH-APD",
 "activities": [
   {
     "alternatives": {
       "comments": [
         {
           "author": "user id",
           "timestamp": "Tuesday",
           "value": "A comment"
         }
       ],
       "value": ""
     },
     ...

gives us language for highly-specific validation messages from the API Using the same JSON Pointer (RFC 6901) standard as JSON Patch, we can be very explicit about which field(s) are failing validation
```
{
 "path": "#/activities/0/schedule/2/endDate",
 "value": "End date is after the activity has finished"
}
```
(To be clear, switching to a document model neither gives us validation nor does it give us very-specific validation. It just provides us a data format for being specific.)

Alternative

In any case, supporting comments and assignment will require work on the data model. If we choose to stick with the relational model, that will mean creating more tables, building the relationships between them, updating our models in code to describe the right relationships, creating new models in code to describe the comments (etc.), and I'm not even sure how we'd write the endpoints for adding comments (maybe something like POST /comment/apd/:apdID/activity/:activityID/contractorResource/:contractorResourceID; it's very explicit, but quite gnarly and would put a pretty significant burden on the frontend to build proper URLs).

Level of effort

Switching to a document model is not a quick effort. There are a lot of moving pieces to consider. Here's my initial estimate of how the work might be sequenced:

add a data migration to add the document column to the apds table, and convert existing APD data into that document
update the POST /apds and PUT /apds/:apdID endpoints to also update the document; leave existing functionality intact

These two steps would be merged into master as a single pull request. They set the groundwork for everything that's to come after. At this stage, we have two copies of every APD stored in the database - one in the relational model and one in the document model. Everything still uses the relational model as the source of truth.
update the GET /apds and GET /apds/:apdID endpoints to return the APD from the document model

This would be the next pull request. We would want thorough tests before merging to make sure the frontend still works as expected, especially around being able to save changes. At this poit, the document model becomes the source of truth.
add a new PATCH /apds/:apdID endpoint that accepts a JSON Patch object describing how to change the document; only updates the document model, not the relational model

This is setting the stage for the frontend to be able to switch over, but we want to get this in sooner rather than later so we can start testing it and make sure it works as expected. This endpoint should be disabled in production because it could cause data corruption in later steps if it's buggy here.

This endpoint needs to do some light validation of the model (not necessarily the data, though eventually that too) to ensure that arbitrary fields aren't added to the database.
update the frontend to capture changes as JSON Patches in addition to the dirty tracking we have now

This can begin at any time, including in parallel with the first step. We want it in production for a bit before switching to it exclusively, though, so we can test it out and make sure it's consistently generating the correct patch data.
enable the PATCH /apds/:apdID endpoint in production
update the frontend to use the PATCH /apds/:apdID endpoint instead of the PUT endpoint

Everyone's using the document model at this point. Nearly there!
remove the PUT /apds/:apdID endpoint
update the POST /apds endpoint to only create the document model instead of also the relational model
add a data migration to remove the 18 additional APD tables
add a data migration to remove the extraneous id fields in the APD sub-properties

These can all be a single pull request, or a few. In the last step, we basically clean up the document model to remove the remnants of the relational model. We cannot remove these extranous ids until the frontend is updating with patches - prior to that, it relies on the id fields of the sub-properties to do partial APD updates.
recommend a tech debt spike to remove Bookshelf.js ORM and switch the remaining code models to use knex.js directly (Bookshelf.js is a layer on top of knex and adds a fair bit of complexity that we don't need if we're not using relational models)

@jeromeleecms @NAretakis I'll plan on getting some time during the working session to pitch to the team and/or answer any questions so you can decide which route you'd prefer (or if you want us to pause on this altogether for now while we settle the roadmap).

@mgwalker Thank you so much for such an excellent write-up! I think this switch might also mean a couple other things:

a lower barrier to entry for vendors developing new features, because the data model is more flexible and less complex
ability to have multiple users editing fields simultaneously, without losing data (enabled by JSON Patch)
autosave and form validation become easier problems to solve
cleaner update workflow: cloning an APD for update (but leaving behind last year's comments) becomes more feasible

I wonder if we undertake this, should we also harmonize the frontend and backend document model? Right now we have serialization/apd.js and serialization/activity.js in the frontend to convert back and forth between the two models. The document model generated by the API is based on the relational model backing it up. E.g., the API document model has this:

{
  "keyPersonnel": [{
    "name": "Person's name",
    "costs": [
      { "year": 2019, "cost": 123 },
      { "year": 2020, "cost": 456 }
    ]
   }]
}

The corresponding frontend model is this:

{
  "keyPersonnel": [{
    "name": "Person's name",
    "costs": {
      "2019": 123,
      "2020": 456
    }
   }]
}

Having the same model on both sides would make computing changes easier, and also make it much easier to ship changes from one user to another, which would be pretty slick.

Tracking some of our discussion before I forget that I ALREADY asked this:

1) Does moving to this model affect our ability to perform analytics? 2) Does this model affect our ability to separately pull / merge separate activities? 3) I'm not seeing a lot of cons in the write up -- is there something we should be aware of?

It shouldn't affect our ability to do analytics on existing data, and could end up creating new data for analytics. It might change how we to do the analytics (instead of a typical SQL query, you might need a JSON query, or a SQL query that feeds JSON into R or Matlab or something like that).
No. Currently we address a specific activity by its ID in the activities table. Under this model, we'd address a specific activity by its JSON Pointer path. We could still copy those around between APDs. As far as merging... I'm not sure what that means, but it sounds hard either way. 😆
I think the biggest con is the risk of losing data somewhere along the way as we do the migration. Especially for pilot states that are entering their real APDs right now - we don't want them to have to do that all over. We can mitigate some of the risk by moving in small steps and doing lots of tests after each one to make sure everything is still on track.

For (2) I was thinking that merging/pulling would be similar to have a 2 APDs that laid out 10 activities each and eventually merging them into a single APD Update (pull in activities 1,3,4 from one and 5,8,9). This is not super common now, but since we are being more activity specific, I'm thinking that it may be something people want in the future -- i.e. not having to update two separate APDs when the other activities don't need anything.

Also, from a operational/oversight perspective, I might want to know: what are all the activities we've approved for State A? Which ones are currently active? etc.

Second question first: we can definitely do that!

The first one has lots of implications. I wouldn't feel comfortable saying we could or couldn't do that without a more in-depth conversation about what all that would involve. My instinct is that copying activities from 2 APDs into a single APD would probably be pretty straightforward with the document model and complex with the relational model, but there's a lot of user interaction and expectation to consider, I think.

Closing because we're going to proceed with the document model, and that's how we expect to track changes.

Enterprise-CMCS / eAPD