Open AlexanderKaraberov opened 6 years ago
I'm -0.5 on the proposal.
Concerns:
1) Overall throughput decrease if a database doesn't have any of these new functions, just from having to traverse your new code path on every read, 2) Lack of security and sandboxing in Erlang-based functions, which is why we ship with the Erlang query server disabled by default
I know that point 2 above is why this feature will never land at Cloudant and would likely never land at anyone running a public/SaaS CouchDB offering. A potential alternative would be #1554 's approach.
The only way to handling point 1 above is to actually have proper load testing & metrics showing the impact. This is something Cloudant devs have done with their major functionality changes in PRs over the past couple of years. Do you have any data points to share?
The alternative to this was already proposed by the committer list and is in the roadmap is the per-document permissions: the _access
proposal. See https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
This approach already has a bunch of code behind it as well, and would vastly improve on the current db-per-user access model. With _access
would we even need validate_doc_read
functions? I don't think so.
There exist approaches to filter outgoing data streams very fast without intervening CouchDB codebase – see ie https://github.com/ermouth/covercouch – however I doubt it’s useful for 2.x.
Overview
Context
This is not really a feature request because we do have this functionality implemented in our CouchDB 2.x fork. However I consider this feature to be a good fit for upstream repository and I decided to open a discussional issue first instead. Let me know if it is better to directly submit a PR with this functionality.
Background and strategic fit
CouchDB already has support for validate_doc_update functions which if defined in the ddoc can be used to prevent invalid or unauthorized document update requests from being stored. In other words document-level write security. I'm proposing a patch which adds the same type of security but for all document reads. This functionality have been used in our production cluster and proved to be useful. There are a lot of valid use case scenarios akin to "Some particular user may only read documents he created" and so on. This also may alleviate overhead caused by alternative solutions such as per-user DBs.
Implementation details
First things first. This feature can lay waste to read performance if
validate_doc_read
functions will be implemented in JS due to notorious external query server and serialisation/deserialisation and standard I/O overhead. Thereby I propose to impose a limitation on these functions namely they have to be implemented only in Erlang and marked as advanced functionality. Perhaps they have to be available solely by means of some additional config such asvalidate_doc_read = true
. Therefore this will not in any case affect default CouchDB setup but might be useful for advanced users. When implemented as Erlang ones potential overhead of these functions is negligible but benefits are perceptible. In a nutshell implementation extendsload_validation_funs
to also find, load and parsevalidate_doc_read
functions from design documents and then make appropriate calls in the end ofmake_doc()
. Unauthorised exception will be thrown and status code 403 returned from thechttpd
handler when some invariants invalidate_doc_read
logic are violated.Conclusions
I've decided to create this issue instead of directly sending a PR because at first I would like to gather feedback from core Apache and Cloudant maintainers regarding the usefulness and expediency of this feature. In case of a more or less positive one I would be glad to submit a PR which will shed more light on the technical part (which is pretty straightforward) as well as open a discussion about potential improvements.