apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.26k stars 1.03k forks source link

[Feature Proposal] Pull Request to add support for Erlang-only validate_doc_read functions #1724

Open AlexanderKaraberov opened 6 years ago

AlexanderKaraberov commented 6 years ago

Overview

Context

This is not really a feature request because we do have this functionality implemented in our CouchDB 2.x fork. However I consider this feature to be a good fit for upstream repository and I decided to open a discussional issue first instead. Let me know if it is better to directly submit a PR with this functionality.

Background and strategic fit

CouchDB already has support for validate_doc_update functions which if defined in the ddoc can be used to prevent invalid or unauthorized document update requests from being stored. In other words document-level write security. I'm proposing a patch which adds the same type of security but for all document reads. This functionality have been used in our production cluster and proved to be useful. There are a lot of valid use case scenarios akin to "Some particular user may only read documents he created" and so on. This also may alleviate overhead caused by alternative solutions such as per-user DBs.

Implementation details

First things first. This feature can lay waste to read performance if validate_doc_read functions will be implemented in JS due to notorious external query server and serialisation/deserialisation and standard I/O overhead. Thereby I propose to impose a limitation on these functions namely they have to be implemented only in Erlang and marked as advanced functionality. Perhaps they have to be available solely by means of some additional config such as validate_doc_read = true. Therefore this will not in any case affect default CouchDB setup but might be useful for advanced users. When implemented as Erlang ones potential overhead of these functions is negligible but benefits are perceptible. In a nutshell implementation extends load_validation_funs to also find, load and parse validate_doc_read functions from design documents and then make appropriate calls in the end of make_doc(). Unauthorised exception will be thrown and status code 403 returned from the chttpd handler when some invariants in validate_doc_read logic are violated.

Conclusions

I've decided to create this issue instead of directly sending a PR because at first I would like to gather feedback from core Apache and Cloudant maintainers regarding the usefulness and expediency of this feature. In case of a more or less positive one I would be glad to submit a PR which will shed more light on the technical part (which is pretty straightforward) as well as open a discussion about potential improvements.

wohali commented 6 years ago

I'm -0.5 on the proposal.

Concerns:

1) Overall throughput decrease if a database doesn't have any of these new functions, just from having to traverse your new code path on every read, 2) Lack of security and sandboxing in Erlang-based functions, which is why we ship with the Erlang query server disabled by default

I know that point 2 above is why this feature will never land at Cloudant and would likely never land at anyone running a public/SaaS CouchDB offering. A potential alternative would be #1554 's approach.

The only way to handling point 1 above is to actually have proper load testing & metrics showing the impact. This is something Cloudant devs have done with their major functionality changes in PRs over the past couple of years. Do you have any data points to share?

The alternative to this was already proposed by the committer list and is in the roadmap is the per-document permissions: the _access proposal. See https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E This approach already has a bunch of code behind it as well, and would vastly improve on the current db-per-user access model. With _access would we even need validate_doc_read functions? I don't think so.

ermouth commented 6 years ago

There exist approaches to filter outgoing data streams very fast without intervening CouchDB codebase – see ie https://github.com/ermouth/covercouch – however I doubt it’s useful for 2.x.