department-of-veterans-affairs / caseflow-efolder

Tool for bulk download of efolder claim files
15 stars 8 forks source link

[spec] eFolder Express v2 API #672

Closed amprokop closed 6 years ago

amprokop commented 6 years ago

Context

V1 of the eFolder API was somewhat heroically developed by @aroltsch and @mdbenjam. We got it out the door, and it works for Reader. But eFolder was originally developed from front to back as a "Download All" button for VBMS. We've since grafted a few things onto it: the ability to download separate documents instead of a ZIP file, the ability to fetch documents from VVA as well as VBMS, and an API.

There are several areas of the API we could do better with:

Overview

We have been considering a refactor of eFolder Internals, documented here and here. We should couple those goals to development of v2 of the eFolder API, so we can make breaking changes without fear and roll out incrementally behind a simple feature flag.

Goals

eFolder v2 API routes

GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX”

Returns a JSON list of documents associated with a file number.

More details

Use cases

POST /api/v2/fetch_documents -H “File Number: XXXXXX”

Starts fetching and saving all the documents from our document sources. Returns an ID of the fetch attempt so the caller can check on its status later.

More details

Use cases

GET /api/v2/fetch_documents_status/:id

Returns the status of our attempt to fetch documents from document sources, save them to S3, and make them available via our API.

More details

Use cases

GET /api/v2/documents/:id -H “File Number: XXXXXX”

Returns a document file.

More details

PUT /api/v2/documents/:id -H “File Number: XXXXXX”

Updates some parameters of a document.

More details

TBD—whatever parameters we would like to update. Parameters we can update in VBMS can be updated in VBMS. Annotations? Comments? Tags? Document type?

Use cases

GET /api/v2/documents -H “File Number: XXXXXX”

Returns ZIP of all documents.

More details

Use cases

Associated Work

https://github.com/department-of-veterans-affairs/caseflow-efolder/issues/670

Open Questions

mdbenjam commented 6 years ago

I think there are a lot of good things with this API design. One of the central problems I dealt with when designing the first one is a tension between REST and revealing implementation details. As a consumer of the API, I don't really want to have to think about starting remote jobs, or the order in which I call the API. Ideally, I want to be able to hit one endpoint for the list of documents and another to download the documents. Of course this has lead us to do things in a non-RESTful manner by modifying the DB on GET requests. Our justification of this approach was that since it's just caching files, it's not really THAT un-RESTful, the route is still idempotent.

With this new approach, which I think has a lot of merits, we put more burden on consumers of the API to understand how to call it. And calling the API becomes more verbose. Now that might be reasonable tradeoff, but I think we should evaluate it as such.

Some specific thoughts on the routes.

For GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX” do we really need the force_sync parameter? I thought you wanted to get away from doing things synchronously. Also since this is a GET that launches jobs, it's not RESTful

For POST /api/v2/fetch_documents -H “File Number: XXXXXX” I think there should be a flag for whether to start the zip file job. This is an expensive thing to run, and something we don't always need.

For GET /api/v2/fetch_documents_status/:id could this be named GET /api/v2/fetch_documents/:id? Do we need another path specifically for status? Not sure about the standards here on polling endpoints.

For PUT /api/v2/documents/:id -H “File Number: XXXXXX” We don't need to pass the file number. The doc id is sufficient. We should also get the doc from VBMS even if it's not already cached in S3.

PUT /api/v2/documents/:id -H “File Number: XXXXXX” is more of a long term goal. We'll probably need to migrate to v2 of the VBMS api before we can modify things in VBMS. But still will be useful then.

sunil-sadasivan commented 6 years ago

Thanks for spec-ing this out @amprokop! This feels like a solid refactor and proposal.

I'd be curious to learn more about the consumers of this API and their requirements.

So far we have today:

I'd recommend consulting with @Chingujo on future plans / iteration of the UI to understand how we see eFolder evolving and if that brings considerations to shape the direction of the v2 API.

Some specific thoughts more on the plan:

Chingujo commented 6 years ago

Thanks @sunil-sadasivan! and the API looks exciting in making eFolder express flexible and more integratabtle across Caseflow. Currently specing out a long term vision for eFolder express. In terms of shorter term UI changes, I don't see drastic changes unless in response to production bugs, simple UX improvements, and content changes (amid some open questions still left to be answered).

Planning on a sync with @mdbenjam on Monday to see how all this fits in to ex's future =D

anyakhvost commented 6 years ago

@amprokop Thank you for writing this up. Some thoughts and questions on the endpoints:

GET /api/v2/documents_list (?force_sync=true|false) -H “File Number: XXXXXX”

  1. I do see the point of having force_sync flag, however, we will be exposing implementation details to the api consumer. What can we do to avoid the flag? Maybe we can have two different endpoints?
  2. I know we are trying to be truly RESTful but I don’t think it is going to be possible and that’s okay (https://www.mobomo.com/2010/04/rest-isnt-what-you-think-it-is/).
  3. I’d like to suggest to re-name GET /api/v2/documents_list to GET /api/v2/documents
  4. You are saying when force_sync is false, it will return whatever documents list we have. It will return even an outdated list or an empty list? We need to think about it more.

POST /api/v2/fetch_documents -H “File Number: XXXXXX”

  1. Are you suggesting that this endpoint will fetch the VBMS/VVA manifest again or it will look at our DB for the list of document_ids?
  2. When you say, it will start asynchronous job to save a ZIP file in S3 of everything, are you implying it will download all files from s3 to the local server and zip them and then upload the zip file to s3?
  3. I don’t like that we use the word fetch everywhere. I’d like to suggest to use simpler names. For example, POST /api/v2/documents.

GET /api/v2/fetch_documents_status/:id

  1. Are we going to persist statuses?
  2. Thinking about this more. In eX we need to know when a specific document failed so we can display it to the user.

GET /api/v2/documents/:id -H “File Number: XXXXXX” I agree with Mark we can omit a file number here.

GET /api/v2/documents -H “File Number: XXXXXX” It can have the same name as #1, it will just respond to a different format.

respond_to do |format|
   format.zip do
      …
   end
end
lowellrex commented 6 years ago

Kudos @amprokop, this document looks great!

One question, could we have the entire API be asynchronous (i.e. each API request returns an ID and the client uses that ID to poll some other endpoint(s) to see if the request is ready to be returned)? I don't know how well this jibes with the RESTful way of doing things, but it seems like having the entire API be asynchronous so that we can avoid exposing implementation details to the client and to more gracefully account for some API calls taking longer sometimes (like when efolder isn't able to use its cache).

anyakhvost commented 6 years ago

Please reference https://github.com/department-of-veterans-affairs/caseflow-efolder/blob/master/docs/v2/endpoints.md