[spec] eFolder Express v2 API

amprokop commented 6 years ago

Context

V1 of the eFolder API was somewhat heroically developed by @aroltsch and @mdbenjam. We got it out the door, and it works for Reader. But eFolder was originally developed from front to back as a "Download All" button for VBMS. We've since grafted a few things onto it: the ability to download separate documents instead of a ZIP file, the ability to fetch documents from VVA as well as VBMS, and an API.

There are several areas of the API we could do better with:

It was designed for the Reader UI, but is not compatible with the eFolder Express UI.
- We should move to a place where the UI uses as much as the same code as the API as possible, to cut this knot
StaleObjectErrors! They are really confusing and hard to handle. We should isolate their handling as much as possible, instead of adding bunch of tortured retry logic :)
Reduce load on VBMS/VVA by ensuring we're only downloading each set of documents once, not once for each user
Caller arguably has to pass too much info (save_documents=true)?
It violates RESTful principles

Overview

We have been considering a refactor of eFolder Internals, documented here and here. We should couple those goals to development of v2 of the eFolder API, so we can make breaking changes without fear and roll out incrementally behind a simple feature flag.

Goals

Design an API that can be used not just by Reader, but by Certification, Hearings Status, Dispatch, and even the eFolder Express UI if we like
Clear a path so we can later deprecate the eFolder Express UI if we want to.
Pay down technical debt and reduce complexity
Get closer to a truly RESTful, standards-based API
Pave a path to moving the Reader documents data into eFolder, where it probably belongs in the long term

eFolder v2 API routes

GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX”

Returns a JSON list of documents associated with a file number.

More details

without the force_sync parameter set to true:
- starts async job to fetch VBMS manifest if the manifest is expired
- starts async job to fetch VVA manifest if the manifest is expired
- will return whatever documents list we have
- exposes date & time VBMS, VVA documents list was fetched at
with the force_sync parameter set to true:
- starts sync job to fetch VBMS manifest
- starts sync job to fetch VVA manifest
- waits until those calls return to respond
- even on error, responds with the document list we currently have (with some notification that there was an error??)

Use cases

The Reader documents list page could consume this endpoint without passing the force_sync parameter
Hearings Prep?
Dispatch?

POST /api/v2/fetch_documents -H “File Number: XXXXXX”

Starts fetching and saving all the documents from our document sources. Returns an ID of the fetch attempt so the caller can check on its status later.

More details

starts asynchronous job to fetch VBMS manifest
starts asynchronous job to fetch VVA manifest
starts asynchronous job to save all documents in S3 in separate files
starts asynchronous job to save a ZIP file in S3 of everything
returns id of fetch attempt

Use cases

eFolder UI
Reader can have a “Refresh Documents” button, if we like?
RetrieveDocumentsForReaderJob hits this endpoint

GET /api/v2/fetch_documents_status/:id

Returns the status of our attempt to fetch documents from document sources, save them to S3, and make them available via our API.

More details

returns status of VBMS fetch attempt
returns status of VVA fetch attempt
returns status of job to save all documents in S3 in separate files
returns status of job to save a ZIP file in S3 of everything

Use cases

eFolder UI
Reader’s “Refresh Documents” button?

GET /api/v2/documents/:id -H “File Number: XXXXXX”

Returns a document file.

More details

returns 404 if document not found
does not trigger any jobs

PUT /api/v2/documents/:id -H “File Number: XXXXXX”

Updates some parameters of a document.

More details

TBD—whatever parameters we would like to update. Parameters we can update in VBMS can be updated in VBMS. Annotations? Comments? Tags? Document type?

Use cases

Reader, obviously

GET /api/v2/documents -H “File Number: XXXXXX”

Returns ZIP of all documents.

More details

returns 404 if we don’t have the zip
does not trigger any jobs

Use cases

eX UI
Reader "Download all"

Associated Work

https://github.com/department-of-veterans-affairs/caseflow-efolder/issues/670

Open Questions

[spec needed] How do we move Reader comments/annotations out of Caseflow and into the eFolder DB?
[spec needed] Can we reduce load on the rails apps by fetching documents/ZIPs from S3 directly or through an nginx proxy?

mdbenjam commented 6 years ago

I think there are a lot of good things with this API design. One of the central problems I dealt with when designing the first one is a tension between REST and revealing implementation details. As a consumer of the API, I don't really want to have to think about starting remote jobs, or the order in which I call the API. Ideally, I want to be able to hit one endpoint for the list of documents and another to download the documents. Of course this has lead us to do things in a non-RESTful manner by modifying the DB on GET requests. Our justification of this approach was that since it's just caching files, it's not really THAT un-RESTful, the route is still idempotent.

With this new approach, which I think has a lot of merits, we put more burden on consumers of the API to understand how to call it. And calling the API becomes more verbose. Now that might be reasonable tradeoff, but I think we should evaluate it as such.

Some specific thoughts on the routes.

For GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX” do we really need the force_sync parameter? I thought you wanted to get away from doing things synchronously. Also since this is a GET that launches jobs, it's not RESTful

For POST /api/v2/fetch_documents -H “File Number: XXXXXX” I think there should be a flag for whether to start the zip file job. This is an expensive thing to run, and something we don't always need.

For GET /api/v2/fetch_documents_status/:id could this be named GET /api/v2/fetch_documents/:id? Do we need another path specifically for status? Not sure about the standards here on polling endpoints.

For PUT /api/v2/documents/:id -H “File Number: XXXXXX” We don't need to pass the file number. The doc id is sufficient. We should also get the doc from VBMS even if it's not already cached in S3.

PUT /api/v2/documents/:id -H “File Number: XXXXXX” is more of a long term goal. We'll probably need to migrate to v2 of the VBMS api before we can modify things in VBMS. But still will be useful then.

sunil-sadasivan commented 6 years ago

Thanks for spec-ing this out @amprokop! This feels like a solid refactor and proposal.

I'd be curious to learn more about the consumers of this API and their requirements.

So far we have today:

UI
Reader
RetrieveDocumentsForReaderJob

I'd recommend consulting with @Chingujo on future plans / iteration of the UI to understand how we see eFolder evolving and if that brings considerations to shape the direction of the v2 API.

Some specific thoughts more on the plan:

I'm always hesitant for refactors of existing and heavily relied upon services. I'd imagine a key challenge here might be to refactor models to accomplish a cleaner API feel and codebase while ensuring no regressions to v1. We can perhaps discuss the merits of a refactor within the existing codebase or taking key pieces of caseflow-efolder and starting a new API service to accomplish this.
This is probably obvious, but if not... Once we have an API design pattern that works well for us, I'd recommend breaking this down into a implementation plan with deployable and validating phases.

Chingujo commented 6 years ago

Thanks @sunil-sadasivan! and the API looks exciting in making eFolder express flexible and more integratabtle across Caseflow. Currently specing out a long term vision for eFolder express. In terms of shorter term UI changes, I don't see drastic changes unless in response to production bugs, simple UX improvements, and content changes (amid some open questions still left to be answered).

Planning on a sync with @mdbenjam on Monday to see how all this fits in to ex's future =D

anyakhvost commented 6 years ago

@amprokop Thank you for writing this up. Some thoughts and questions on the endpoints:

GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX”

I do see the point of having force_sync flag, however, we will be exposing implementation details to the api consumer. What can we do to avoid the flag? Maybe we can have two different endpoints?
I know we are trying to be truly RESTful but I don’t think it is going to be possible and that’s okay (https://www.mobomo.com/2010/04/rest-isnt-what-you-think-it-is/).
I’d like to suggest to re-name GET /api/v2/documents_list to GET /api/v2/documents
You are saying when force_sync is false, it will return whatever documents list we have. It will return even an outdated list or an empty list? We need to think about it more.

POST /api/v2/fetch_documents -H “File Number: XXXXXX”

Are you suggesting that this endpoint will fetch the VBMS/VVA manifest again or it will look at our DB for the list of document_ids?
When you say, it will start asynchronous job to save a ZIP file in S3 of everything, are you implying it will download all files from s3 to the local server and zip them and then upload the zip file to s3?
I don’t like that we use the word fetch everywhere. I’d like to suggest to use simpler names. For example, POST /api/v2/documents.

GET /api/v2/fetch_documents_status/:id

Are we going to persist statuses?
Thinking about this more. In eX we need to know when a specific document failed so we can display it to the user.

GET /api/v2/documents/:id -H “File Number: XXXXXX” I agree with Mark we can omit a file number here.

GET /api/v2/documents -H “File Number: XXXXXX” It can have the same name as #1, it will just respond to a different format.

respond_to do |format|
   format.zip do
      …
   end
end

lowellrex commented 6 years ago

Kudos @amprokop, this document looks great!

One question, could we have the entire API be asynchronous (i.e. each API request returns an ID and the client uses that ID to poll some other endpoint(s) to see if the request is ready to be returned)? I don't know how well this jibes with the RESTful way of doing things, but it seems like having the entire API be asynchronous so that we can avoid exposing implementation details to the client and to more gracefully account for some API calls taking longer sometimes (like when efolder isn't able to use its cache).

anyakhvost commented 6 years ago

Please reference https://github.com/department-of-veterans-affairs/caseflow-efolder/blob/master/docs/v2/endpoints.md

department-of-veterans-affairs / caseflow-efolder