Closed jmarin closed 8 years ago
Below are some thoughts on the structure of different endpoints, I encourage comments at any level of detail. I was parsimonious in the proposed responses, but in general attempted to allow responses to be flexible/accommodate change.
For example, the /edits/rows
endpoint returns an object with a single rows
key rather than just the rows array because we may want to return metadata on every request (timestamp, etc)).
We may want to create separate issues for each endpoint for discussion, but for now, enjoy this lengthy comment.
/institutions/<id>
{
id: <id>,
name: <instituion name>,
submissions: [
{
id: <incremental submission id>,
status: <current status>,
timestamp: <timestamp>
}
]
}
<id>
A unique id generated for the institution in question (when/where this is generated tbd)
<incremental submission id>
unique id per institution, incrementing with resubmissions
Submissions as an array: there are several options here:
currentSubmission
Do we/should we provide a currentSubmission key so consumers can more easily get the latest submission (which is usually what people will be interacting with). Also, guaranteeing the order of submissions (eg sorting either descending or ascending on id) would provide just as much ease of use
Guarantees on incremental submission id: should we use a counting service/CRDT? In practice, there shouldn't really be an issue, as new submissions will mostly be infrequent and spaced apart
Lack of year info: I think we should have different endpoints for each year, if not every year, cf.gov/hmda/2018/...
, so including it in the response would be redunant (though I have the id in the response which is also redundant by this logic)
Edit Counts? The UI has been mocking up some components that assume the the submission object will have edit counts. This isn't really necessary imo, as the status will give all the actionable info (resubmit/verify/sign, etc) and a click through to the page where one can view edits will display edit counts
Should /submissions/<id>
be its own endpoint? I don't think so, but I could see an argument for it
/institutions
{
institutions: [
{institution object 1},
...,
{institution object n}
]
}
Returns all the institutions a user is authorized to file for (usually 1, but more for vendors and in some cases holding/parent companies) Could alternatively return an object keyed by the institution objects' ids, which would be a similiar pattern to option 2 above for how the submissions object could be structured
/edits/rows?institution=<institution id>&submission=<submission id>&page=<page>
{
rows: [
{
row: <row>,
edits: [
{
id: <edit id>,
type: <edit type>,
message: <edit failure message>,
verification: <verification if quality>
},
...
]
},
...
]
}
<row>
is the LAR in question
<edit id>
is the weird edit id, eg Q044
This can be used to either link to the edit text or embed the edit text in the row without having to send it over the wire for each row. This also applies to classification of edits (eg HOEPA edits, loan amount, etc). A map of edit categories can be loaded separately if needed
<edit type>
is syntactical/validity/quality (macro cannot be displayed by row)
<verification
will only be present for quality edits, the key will not be part of the syntactical and validity responses. These probably shouldn't be separated in different endpoints because the purpose of the /rows/ endpoint is to collect the different edit type of endpoint together and not to require data reorganization for api consumers
Possible valueSubmitted
field. Left out because may not be needed if the <edit failure message>
is explicit enough. I'd prefer to rely on the message, since the edit check code has more knowledge about the context of the submittedValue and can couch it properly. Also, I imagine valueSubmitted
and the message
would be very similar most of the time, eg. "Submitted 2 in the foo field when it must be bar or baz."
/edits/ids?type=<syntactical|validity|quality|macro>&institution=<institution id>&submission=<submission id>&page=<page>
For syntactical/validity
{
type: <edit type>,
ids: [
{
id: <edit id>,
rows: [
{
row: <row>,
message: <edit failure message>
},
...
]
},
...
]
}
For quality:
{
type: <edit type>,
ids: [
{
id: <edit id>,
rows: [
{
row: <row>,
message: <edit failure message>,
verification: <verification>
},
...
]
},
...
]
}
For macro:
{
type: <edit type>,
ids: [
{
id: <edit id>,
message: <edit failure message>,
verification: <verification>
},
...
]
}
The different edit types not returned in a single call because it often makes sense to query them separately (as they have different signatures)
/progress?institution=<institution id>&submission=<submission id>
{
status: <current status>,
editCounts: {
syntactical: <count>,
validity: <count>,
quality: <count>,
macro: <count>,
}
}
Returns the status and editCounts for a submission of an institution. This endpoint should support long-polling
/upload
The LAR file, encoded via multipart/form-data
HTTP 201
Upload progress can be handled client-side, so this post can be fairly simple
/verify?institution=<institution id>
{
id: <edit id>,
rows: <an array of 1-n rows to verify with the verification text>,
verification: <verification text>
}
HTTP 201
Always/only works on the latest submission The institution COULD be part of the post data, but as listed it is more in line with the function of the GET endpoints
/sign?institution=<institution id>
none/TBD auth stuff
HTTP 201
Actual form of the signature/final submission is TDB, but whatever it may be it will need an endpoint similar to the above
@wpears, thanks for putting this together. Lots of good stuff here. It's given me lots of ideas. For now, I'll start with URL patterns.
I propose (almost) all resources are based on which institution (/institutions
) it falls under. Users will be associated to institutions, and it will make authorization much simpler if we can simply check all requests against that root of the URL.
I also feel like the resources in general could be more hierarchical. For similar reasons, it'd be much easier to define rules at each hierarchical level rather than on a per-endpoint basis as we'd have to do with a flattened model. What are you thoughts on something like:
/institutions
GET
- List of all institutions available to a given account.POST
- We could use this resource for adding new FIs to the system. This would not be exposed for public use./institutions/12345678
GET
- Detailed attributes for a given FI.PUT
/PATCH
- Again, we could update individual FIs here, but would not be available to the public....and if we want to make it even friendlier, we could incorporate a sluggified version of the FIs name with the id.
/institutions/12345678-bank-of-bankerton
/institutions/12345678/users
GET
- List of all users (and perhaps their status) for a given FI.POST
- Associate a given user with an FI.Note: There is likely a separate URL structure related to users rooted at /users
. More thoughts on this coming soon...
/institutions/12345678/filings
GET
- List of each filing periods with the status of each.Note: "Filing" may not be the right term. Open to something better.
/institutions/12345678/filings/2018/q1/
GET
- List all submissions for a given filing period. I threw in the q1
since we'll need to support quarterly filings for large FIs sometime soon. If this feels too granular, we could have a .../2018-q1/
, or something similar.POST
- This seems like the logical place for file upload to go./institutions/12345678/filings/2018/q1/1
GET
- Details for an individual submission./institutions/12345678/filings/2018/q1/1/progress
GET
- Long-poll-y type endpoint that returns the progress of each edit type for a given submission./institutions/12345678/filings/2018/q1/1/edits
GET
- List of all edits for a given submission. POST
(PUT
?) - If we want to support bulk update of multiple edits at once, this seems like a logical place./institutions/12345678/filings/2018/q1/1/edits/1234
GET
- Details for a given edit.PUT
/PATCH
- Update a given edit, most likely providing justification on quality and macro edits.POST
to /institutions/12345678/filings/2018/q1/1/edits/1234/verify
/institutions/12345678/filings/2018/q1/1/sign
POST
- Once all edits are complete, "sign" a given submission.PUT
/DELETE
- We may need a way to "unsign" a given submission.Good stuff Hans.
Originally I had the structure a bit more hierarchical but changed to more stuff in the querystring because I thought it make the endpoints clearer at a glance (/edits/row?...
is clearly most concerned with row-aggregated edits). That said, I am completely okay with the hierarchical approach, especially if it makes defining certain rules easier.
Some other thoughts:
filings
could just be year
if the quarter is in its own path separator (ie year/2018/q1/...
). Makes less sense if the quarter is incorporated in the name (2018-q1
). Also, it almost seems quarter should have it's own collection descriptor in the URI (/year/2018/quarter/q1
). This seems a shame because the quarters will always be the same and there are only four, but if the quarter is going in the path of the URI, I think it may be necessary. Also, we'll have to make sure that the quarter part of the URI can be ignored by FIs that don't file quarterly, eg
/institutions/12345678/filings/2018/quarter/q1/1/sign
and /institutions/12345678/filings/2018/1/sign
are both valid. The quarter could also go in the querystring ^^;/institutions/12345678/filings/2018/rows
and /institutions/12345678/filings/2018/rows/1
. Is it clear that this URI points to edits (of all types) at a given row/LAR? Should we prefer /institutions/12345678/filings/2018/lars
?/edits/
/rows
paradigm as they seem like separate structures. If we don't need this, though, then no worries.@wpears @hkeeler I'm not a big fan of the quarter in the URL, given that it only covers some institutions, and it's far into the future (i.e. not what we are building right now, and not for 2018 either). The quarter can be derived from the response if it includes a timestamp too, or as a query string as @wpears is suggesting.
I have a question about the URL for row-based edits query ,/institutions/12345678/filings/2018/rows/1
. What is the identifier for the row (last digit)? The loan id? What is the identifier after "/institutions"?. Are we certain this FI identifier is unique?
The row identifier is line number for the given submission, which I think would be fairly useful for FIs (oh line 34 has a syntax edit.. I'll look at line 34).
I think the institution identifier is something WE generate so we can guarantee it is unique. When I was talking about "creating" institutions in our system, this is what I meant. When somewhen registers to file on behalf of an FI, we create a unique ID to represent that institution and then associate that unique, internal identifier to whatever the institution is they claim to be filing for. Outside of guaranteeing uniqueness, this also allows us a way to differentiate people if, eg, two employees both start filing for the same institution (which wouldn't be possible if we just associate a user to some concatenation of institution/panel data).
@wpears I don't think line number is a good identifier. How can we guarantee that every row is kept in the same place on every file submission? Makes it hard to compare possible errors for the same loan across different submissions. Also, the backend is asynchronous and cannot guarantee ordering of processing. There are ways to keep line numbers around but it is cumbersome as things may not be processed serially as they appear on the file.
The second question gets into panel generation and identity management, we can of course generate our own unique id, but we will have to maintain it and it will be different than anything else that everyone else uses. So I'm curious to see if we can define how institutions are identified today, and use that. Keeping in mind that all of this changes when we get LEI
On the first point, there is an edit for the loan id to be unique within a submission (S040), so that can be used as an identifier instead of the row number
Yeah, I'd thought of the cross-submission problem for rows, but didn't know if that type of comparison would be run (but it makes sense). Using the loan id field sounds better though, so let's move forward with that.
I think this means the rows
endpoint I listed in my response to Hans, if we're using loan id, makes more sense as lars
... so /institutions/12345678/year/2018/lars/1234
).
Also for the /edits/
endpoint the structure will be something like (note loan id):
{
type: <edit type>,
ids: [
{
id: <edit id>,
lars: [
{
id: <loan id>,
message: <edit failure message>
},
...
]
},
...
]
}
re: institution identifier, I think making our own identifier makes the most sense particularly if we are allowing self-service. AFAIK, there isn't a universal, non-public identifier we can use to prove a bank is who they say they are if we've never encountered them before (filing off panel, etc). Of course, we could use public bank info if we also require some key/sessionID that gets sent to banks where we have a known contact, but that essentially rules out self-service. By creating our own identifier that is associated with public, unique bank info, we can allow self-service and also have an easy way to manage conflicts (these two users are filing for the same FI, we need to call them).
Just as a recap to help me put this all together, and to add some of my thoughts, suggestions, and questions.
This comment ignores responses for now, I just want to make sure we all agree on the API endpoints. I think, to avoid clutter in this issue, as we agree on an endpoint we should create a new issue and have that issue focus on the response.
GET /institutions
GET /institutions/{id}
POST /institutions/{id}
GET /institutions/{id}/users
GET /institutions/{id}/users/{id}
GET /institutions/{id}/progress
(long polling)
GET /institutions/{id}/filings/latest
(see below)GET /institutions/{id}/filings
?year={year}
would list the history for a given year
?year=all
would provide the full history?quarter={quarter}
would list the history for a given quarter (?year={year}
would be required)POST /institutions/{id}/filings
?year={year}
would allow re-submission of data?quarter={quarter}
would allow re-submission of data for a given quarter (?year={year}
would be required)GET /institutions/{id}/filings/{id}
/latest
would always give us the current/most recent filing detailsGET /institutions/{id}/filings/{id}/edits
n
edits from each type (think paging)?type={type}
would list only the edits of a specific type?page={page}
would allow for paging based on the n
default we choose?per_page={number}
would allow for changing the n
default we choose
?per_page={number}
GET /institutions/{id}/filings/{id}/edits/{id}
, the above call should provide the details?POST /institutions/{id}/filings/{id}/edits/{id}
GET
?
GET /institutions/{id}/filings/{id}/edits
, would be enough?GET /institutions/{id}/filings/{id}/irs
POST /institutions/{id}/filings/{id}/sign
During discussion yesterday I think having the ability to have a /year
breakdown of these endpoints is also good.
But I do think the most logical breakdown starts with our institutions. While the date/time part of all this is also important, starting with institutions seems to allow our app, and our API endpoints, the ability to focus on the current year (and later the quarter) filings. Current filings seem to be the most important part of this and will be the bulk of the work, I think, for every year. Using query string parameters for the year and quarter still gives us the ability to allow for re-submissions and the ability to pull past data.
This pattern also seems to prevent our endpoint paths from becoming too unwieldy.
Thoughts? Did I miss some?
@awolfe76 Thanks for the summary, I think this helps. I do have some questions about specific endpoints, but they are more questions due to unclear (unknown) data models at this point (i.e. we are centering everything on institutions, but most of the backend work to this point has been on edits in isolation). I think what makes sense is to divide these into individual tickets (or maybe grouped as appropriate) and start implementing them, or at least drive the discussion deeper where it is needed. Overall though, I think this is a good point of departure.
So this differs from how the API is implemented at https://github.com/cfpb/hmda-platform-ui/pull/99 and I think needs to take in to account some of the year vs. institution stuff we spoke about in standup (which I'll have coming in another PR on the platform-ui).
So before we split these things out, I'd say we should probably wait for the next two PRs in the platform-ui repo which should help prove out some of the ideas here.
See discussion and possible endpoints here