Open portante opened 1 year ago
I noodled a bit on this quite a while back, and decided that for the APIs we had, duplicating code in order to get a slightly more efficient HEAD
just wasn't justifiable.
I think that the one exception might be /datasets/inventory
where we could pull the size, type (+ cache headers, E-tag, etc) out of the cache manager and generate the headers without manifesting the response stream. (Though I'm not sure how efficiently Flask's defaults will handle that.)
I really don't want to duplicate the code to generate "just headers" unless there's a clear advantage over the default "truncated get
" mechanism; the duplication adds its own set of problems.
I decided to defer even worrying about this until we had a clearly delineated example. However the easiest way to enable it would probably be to drop the "abstract" _head
and have _dispatch
check on HEAD
. If getattr(self, "_head")
then call it, otherwise call _get
and drop the response data
more or less as Flask
does by default. That allows us to override the behavior where we really want but avoid duplicating code otherwise.
I noodled a bit on this quite a while back, and decided that for the APIs we had, duplicating code in order to get a slightly more efficient
HEAD
just wasn't justifiable.
Certainly duplicating code is a bad thing in general, so I understand the sentiment, but I think it is a bit misplaced.
This assumes that we really do want to allow a HEAD
operation for every GET
. It also means that without deliberately choosing to allow it, current or future APIs which are expensive, or increasingly become expensive, become an attack vector (deliberate or inadvertantly).
Further, looking through the existing APIs (enumerated below), I don't see enough APIs that return header values to warrant keeping this default of automatic HEAD
for every GET
.
In fact, I think we can survive just fine without a HEAD
operation, except for the inventory (I'd like to use a HEAD
for the inventory API to verify a dataset is removed on delete in the functional tests, and it would also be useful to use it for a CLI command to pull a tar ball down for detailed review up close).
Let's look at the APIs available answering to the GET
method today:
/api/v1/endpoints
HEAD
will not provide any useful behavior other than checking that
the GET
endpoint existsHEAD
could be avoided entirely, or if one wants to provide it as a liveness test, a separate HEAD
implementation could just return, without creating duplicate code/api/v1/user
HEAD
will not provide any useful behavior ...HEAD
could be avoided entirely, or .../api/v1/datasets/daterange
/api/v1/datasets/inventory/<string:dataset>
/api/v1/datasets/inventory/<string:dataset>
/api/v1/datasets/inventory/<string:dataset>/<path:target>
HEAD
and GET
API interfaces where the sendfile
call is not invoked/api/v1/datasets/list
/api/v1/datasets/metadata/<string:dataset>
/api/v1/server/audit
/api/v1/server/configuration
/api/v1/server/configuration/
/api/v1/server/configuration/<string:key>
Given this list, all of these APIs that would support HEAD
without doing anything by just dropping the response do not DO anything to make that even worth while for the server to respond to in normal operation.
There has to be a good reason to provide a behavior for methods of APIs.
Aside from the inventory APIs, if we can come up with a good reason for those to exist, great, I'd be all for it. But we need to default to not using ANY server resources unless it is truly warranted.
For reference, HEAD
support in data sets inventory, see PR #3138, https://github.com/distributed-system-analysis/pbench/pull/3138/commits/ae3c766c56c9c25a615af36887513fddad89681b.
There has to be a good reason to provide a behavior for methods of APIs.
@portante, which problem are you trying to solve?
The previous arrangement (as I understand it...) provided implicit support for HEAD
requests to any endpoint which provided a GET
request, without requiring any additional implementation. This seems like a Good Thing™ to me, especially since it didn't preclude our overriding the implicit method with an explicit one.
However, you indicated that the implicit HEAD
method could become an attack vector. Given that a request to the implicit HEAD
method would invoke the same code as one to the GET
method, I don't see how suppressing the HEAD
method prevents an attack, since the attacker could presumably just as easily exploit the cost of the GET
method (possibly to greater effect, since that method also has to return the full payload).
I understand that the implicit HEAD
methods don't provide much value in the bulk of the cases, but they also don't cost much, as far as I can see, so I'm not understanding why you want to add code to prevent their use (or am I misunderstanding things?).
PR #3122 broke support for APIs using the
HEAD
method. Below is the discussion context from that PR held post merge.Does this work? This worries me, and seems like we no longer have
HEAD
implementations for ourGET
APIs. Normally, Flask implements aHEAD
as aGET
and discards the response payload (returning only headers). I don't think it does that if thehead
method is implemented, which you're now doing anywhere, so I think aHEAD
ought to now go through to the fallback (not implemented) "abstract"_head
.While I'm not against the idea of being able to implement a specialized
_head
I don't think this way will work.Or at least, I know that a
HEAD /api/v1/datasets/list
worked before, and I'm not confident that it'll work with these changes. (In fact, I don't see how it could.)_Originally posted by @dbutenhof in https://github.com/distributed-system-analysis/pbench/pull/3122#discussion_r1054934150_