apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

proposed 1.x deprecations #1534

Closed wohali closed 4 years ago

wohali commented 6 years ago

The following have been proposed for deprecation in a future release of CouchDB:

The joke about removing attachments has been removed...that's not happening :)

johs commented 6 years ago

The items on this wish list add up to removing the ability to use CouchDB as "vanilla Couch", as @nolanlawson called it in this blog post from 2013 and discussion from 2014: "I’m even more pro-Couch than when I wrote this article. It may be true that ultimately your webapp will grow in complexity to the point where you’re forced to put a proxy server in front of CouchDB to smooth out its rougher edges, but I think you’ll still be amazed how far you can get with vanilla Couch."

Design documents and couch_httpd_proxy used to be the onboarding feature; the reason why is very well described in Nolan's blog post.

AlexanderKaraberov commented 6 years ago

Hi @wohali I wonder what are the proposed alternatives to:

VDUs? Update functions?

Embedded directly into the couch core Erlang functions? I'm asking because in our production environment we are using Erlang validate_doc_update functions heavily customised to our use case, and have even implemented custom logic in our fork of CouchDB to support validate_doc_read functions. So I'm interested in where the VDU stuff is heading.

Also,

View changes (never worked right)

I saw in the related dev-mailing list discussion that the plan was to remove them in order to implement mrview's seq and keyseq indexes in a proper way, and include this functionality back into the project afterwards. Is this true?

wohali commented 6 years ago

Don't ask me, I'm just the transcriber of the content. The proposals came from @chewbranca and @davisp. But yes, @AlexanderKaraberov I believe that's the plan for view changes, IF (and that's a big IF) it can be made to work correctly.

I can't speak for anyone other than myself, but I'd be surprised if VDUs and update functions would be removed unless something useful can replace them, like declarative Mango-style replacements. Further see #1532, #1533 which could serve as the basis for a replacement as well.

janl commented 6 years ago

Design documents and couch_httpd_proxy used to be the onboarding feature; the reason why is very well described in Nolan's blog post.

We’ve been over this a couple of times now. This is all on the way out unless someone comes and takes responsibility for it initially and going forward. That means writing and actively maintaining Erlang code. That nobody since 2012 or so stepped up to do this is enough of an indicator that these features can go, especially with some of them already partially broken in 2.x.

ermouth commented 6 years ago

That nobody since 2012 or so stepped up to do this

Oh, please. I personally tried to, and PMC reception was subzero cold. No one wants 15 PMC members directly hostile to couchapps telling them what to do, or saying ‘oh, we r sorry, we have only 8 months until the new release so the feature list is frozen’. Taking into account that about half of the PMC now consists of people known only for adding +1 under regular initiatives to amputate/deprecate/slow down this or that, and for having no real projects with Couch, I have no doubt no one will volunteer.

Also take into account that I personally know 6 companies that have custom builds of CouchDB with QS-related features and fixes, and all say they never contribute fixes/features to this repo because of the PMC's toxic attitude to couchappers (let me skip ML quotations).

So maybe the issue is not a lack of QS maintainers? Maybe the issue is the PMC carefully crafting Stockholm syndrome in those who could (or even tried to) contribute?

nolanlawson commented 6 years ago

I assume this blog post is the one being referenced? I don't really want to wade into the couchapps debate (certainly not with a blog post I wrote in 2013 😛), but FWIW many of these same features have been discussed getting deprecated in PouchDB and pouchdb-server for years. @daleharvey can probably provide a more up-to-date perspective on that.

wohali commented 6 years ago

@ermouth This is a formal code of conduct warning. You need to assume good faith in this discussion, and I don't believe you are doing so. You are just continuing to bring more anger and rancor to the community. I recognize you are angry and want to fight for something you believe in, but if you continue in this accusatory and belligerent manner, I will remove you from this discussion.

The CouchDB contributors and PMC members you mention that "just +1" have a lot of commits to the project behind the scenes - especially if you look at couchdb-fauxton (Garren, Antonio, and Alexis are very active contributors) and the Cloudant CouchDB repo (Mike Rhodes, Adam Kocoloski, Robert Newson, Ilya Khlopotov, Nick Vatamaniuc, and Paul Davis, to name just a few) from which a lot of important PRs come. It also includes non-Cloudant people like Dave Cottlehuber and Andy Wenk, whose contributions in terms of organisation, coordination and release management are very welcome. These very people - the people who are doing the work on CouchDB and contributing regularly - are the ones who get to decide what happens on the project. That's how Apache works.

We have to make a pragmatic, practical decision about how to proceed with the resources we have, the resources we attract and the direction we're looking to go in. Couchapp functionality is no longer the focus of CouchDB, for all the reasons previously stated.

You want to contribute to couchapps still? Great - either submit a PR with your proposed changes and improvements, or stop complaining. Those are the choices I am giving you. I am personally asking you - not anyone else in this thread or any other company, and as nicely as I can - to submit your PRs in good faith and I will personally look at them in good faith.

As an aside, if you know of 6 companies that refuse to contribute code here, then however much effort I put into healing wounds and mending differences, it's their choice whether they contribute or not - and as our VP points out, they haven't for many years now. The AL2.0 means they are more than welcome to maintain their own fork, as long as they do not call it Couch-anything or Apache-anything.

natevw commented 6 years ago

As someone who fell in love with Couch 0.x/1.x in large part because of the CouchApp dream, I say:

Yes, please. Backfill those rabbit holes!

Removing _show/_list/_update, all the rewrites/proxy stuff, and probably even validate_doc_update (and filtered replication??) will make it clear how the present CouchDB intends to be used: below an application layer, i.e. as a database.

I missed the unspoken transition and also got hurt holding out for improvements in an area which the maintainers were no longer interested in. Officially deprecating and eventually breaking all the novel/experimental use cases — which are still wandering around "undead" in the form of old blog posts and early guides — should help focus future adoption imo.

johs commented 6 years ago

I share @ermouth's frustration that his work on the rewrite function was not included in 1.7. The requirement for active contribution from @janl has been very clear, and was met with efforts from @ermouth and @giowild in the past; the promise that was expected to follow the requirement is no longer credible.

The establishment of the couchapp mailing list led to a peak in interest, but with no proposals being accepted this came to a halt. The list is no longer on the web site, another of those undead items. 1.7 and 2.0 could have been good for couch apps, but with one missing rewrite functions and the other missing couch_httpd_proxy, we already use a patched version of 1.6.2.

@natevw said it in a funny way, "Backfill those rabbit holes!", since the quickly-breeding onboarding of CouchDB is what I always saw as the win-win. Getting hurt holding out for improvements in an area in which the maintainers are no longer interested is the problem that we as users must take responsibility for.

A fork is the only option left.

daleharvey commented 6 years ago

I don't even think a fork is needed. With PouchDB I got lucky in having the hindsight of seeing how CouchApps played out: PouchDB core got to start out as what CouchDB is moving towards, a solid database, and that makes it easier for people to write things like pouchdb-server or rxdb, which provide more application-level constructs.

rnewson commented 6 years ago

A gentle reminder from me that the opening comment on this issue is proposals for things to remove. We should use the issue to discuss which things we will actually remove, when we'd remove them, and what, if anything, would replace them.

rnewson commented 6 years ago

For my part, here's my take on each item;

We should definitely remove;

We should probably remove;

We should not remove;

ermouth commented 6 years ago

@wohali

recognize you are angry and want to fight for something you believe in

You want to contribute to couchapps still?

There is a kind of misunderstanding. We use couchapps, but not in the way you think. We have Couchbox, which completely replaces the couchapp-related QS fns, is able to run legacy code, and is much faster and richer in features (but much heavier). So we do not need _list, _rewrite or _update for couchapps.

But there are other scenarios, probably less visible, where the built-in QS stuff seems to be extremely valuable.

Update fns are the only way to send arbitrary data to Couch without reading first. It’s an invaluable feature for, e.g., networks of sensors. Couch acts as an aggregator here, and the ability to run Couch on a Pi makes it possible to keep aggregators very lean, set them up in place, and avoid an app layer completely (which is obviously important for lean devices).
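As a rough illustration of the sensor scenario, an update function can accept a raw request body and build a new document without the client reading anything first. This is only a sketch: the `(doc, req)` signature, `req.body`, `req.uuid` and the `[doc, response]` return value follow CouchDB's documented update-function interface, while the field names (`type`, `raw`, `received_at`) and the design document name `_design/sensors` are made up for the example.

```javascript
// Hypothetical update function for a sensor aggregator (field names are assumptions).
const ingest = function (doc, req) {
  // `doc` is null when the request does not target an existing document.
  if (!doc) {
    doc = { _id: req.uuid, type: "reading" };
  }
  doc.raw = req.body;                         // payload stored as the device sent it
  doc.received_at = new Date().toISOString(); // timestamp assigned on the server
  return [doc, "stored " + doc._id + "\n"];
};

// Design documents store functions as strings.
const designDoc = {
  _id: "_design/sensors",
  updates: { ingest: ingest.toString() }
};
```

With such a design document in place, a device would POST its payload to `/{db}/_design/sensors/_update/ingest` and never needs to fetch a revision first.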

Lists are also valuable for scenarios that aggregate data from a network of sensors. You can perform remote data lookups on whatever basis without fetching the real data, and without serious bucket stalls like you have with views. Mango queries seem to provide an alternative mechanism for this, but they are less flexible (though much faster).
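For readers who have not used lists, here is a minimal sketch of one that streams view rows as CSV. `start`, `send` and `getRow` are the helpers CouchDB's query server provides to list functions; the versions defined below are local stand-ins so the sketch can run outside CouchDB, and the row data and the function name `csv` are invented.

```javascript
// Local stand-ins for the query-server helpers (assumptions, for running outside CouchDB).
const output = [];
const pendingRows = [
  { key: "s1", value: 21.5 },
  { key: "s2", value: 19 }
];
const start = (resp) => { output.push("[" + resp.headers["Content-Type"] + "]\n"); };
const send = (chunk) => { output.push(chunk); };
const getRow = () => pendingRows.shift() || null;

// Hypothetical list function: turns view rows into CSV, one row at a time.
const csv = function (head, req) {
  start({ headers: { "Content-Type": "text/csv" } });
  send("sensor,value\n");
  let row;
  while ((row = getRow())) {
    send(row.key + "," + row.value + "\n");
  }
};

csv({ total_rows: 2, offset: 0 }, {});
console.log(output.join(""));
```

Inside CouchDB only the body of `csv` would live in the design document, and the client would receive the transformed output instead of the raw view rows.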

Rewrites (only as functions, surely) are just great. The ability to re-wire the API remotely has a lot of applications aside from couchapps. This approach to providing an API is also easily testable without deployment. Functions are easy to play with, unlike sets of nginx or haproxy rules.

All those features just work. They could be faster, and some improvements are low-hanging fruit, but they already work in an acceptable way. They only need very minor repairs from time to time.

So I can’t accept your “submit PRs or stop complaining”. There are not many things I personally want to improve. However, I think I can help with fixing (or at least nailing down) QS-related bugs. I hope that may help to preserve those features a little bit longer.
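A small sketch of what "rewrites as functions" can look like, and why they are testable without deployment. It assumes the function receives a request object with `method` and a `path` array and returns an object with `path`, `query`, `code` or `body`, as described for function-based rewrites in CouchDB 2.x; the exact shape should be checked against the documentation for the version in use. The route names and targets are invented.

```javascript
// Hypothetical rewrite function: routes and targets are assumptions for illustration.
const rewrite = function (req) {
  const last = req.path[req.path.length - 1];
  if (req.method === "GET" && last === "latest") {
    // Serve the newest reading from an assumed view.
    return { path: "_view/by_time", query: { descending: "true", limit: "1" } };
  }
  if (req.method === "POST" && last === "readings") {
    // Hand the request to an assumed update function.
    return { path: "_update/ingest" };
  }
  return { code: 404, body: "no such route\n" };
};

// Because it is a plain function, it can be exercised locally before it is stored.
console.log(rewrite({ method: "GET", path: ["db", "_design", "api", "_rewrite", "latest"] }));
```

To deploy it, the function would be stored as a string in the `rewrites` field of a design document.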

wohali commented 6 years ago

@johs The improved rewrites not being included in 1.7 was an oversight in the rush to get it out, not an intentional omission. Sorry about that :(

@ermouth Thanks for the clarifications and specific points. Your assistance on the repro case for #1544 is greatly appreciated; more of this support going forward and the occasional PR is all I'm asking for. And while the passion is helpful, the anger is not. Thanks for responding civilly.

We've already clarified that update functions won't go away without a suitable replacement. I suspect that the JS engine replacement (see below) will still support update and VDU functions, but I don't see evidence of it yet. The question is more along the lines of: can we do better in a declarative, Mango-like fashion? There are other proposals to provide similar/useful functionality (#1532, #1533, #1498) if the operator decides to run without a JS engine (#1513), plus I suspect there are thoughts about doing Mango-based update functionality to filter fields, as well as apply database schema enforcement (type and format checking) in a declarative fashion for a NoJS mode. Could your embedded RasPi update function be written declaratively? I assume you're adding info such as incoming IP address, central timestamp, etc. which I could see being done automatically quite easily. This is a use case we can accelerate without the use of JS readily; we just need to agree on an API and how to write it down in a JSON blob.
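To make the "declarative" idea concrete, here is a sketch of type checking driven by a JSON description instead of hand-written JS. The `schema` format and `makeValidator` are entirely hypothetical and are not an existing CouchDB or Mango feature; only the `(newDoc, oldDoc, userCtx, secObj)` signature and the `throw { forbidden: ... }` convention come from today's validate_doc_update functions.

```javascript
// Hypothetical declarative description of a document (not a real CouchDB format).
const schema = {
  type: { type: "string", required: true },
  value: { type: "number", required: true },
  received_at: { type: "string", required: false }
};

// Builds a function with the same shape as a validate_doc_update function.
const makeValidator = (rules) => function (newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) return; // deletions carry no fields to check
  for (const [field, rule] of Object.entries(rules)) {
    const present = Object.prototype.hasOwnProperty.call(newDoc, field);
    if (!present) {
      if (rule.required) throw { forbidden: field + " is required" };
      continue;
    }
    if (typeof newDoc[field] !== rule.type) {
      throw { forbidden: field + " must be a " + rule.type };
    }
  }
};

const validate = makeValidator(schema);
validate({ type: "reading", value: 21.5 }, null, {}, {}); // accepted, no exception
```

A NoJS mode would presumably interpret such a description natively; the JS interpreter here only stands in for that.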

The big problem with the query server implementation is the age of JS 1.8.5, and the relative difficulty of upgrading the JS engine without touching anything else, plus addressing the long-standing issue of poor query server performance, especially at scale. @davisp is the one working on bringing the new JS engine into CouchDB via a NIF over in https://github.com/cloudant-labs/erlang-chakracore . So far, the big place he's stuck is implementing lists. From what I know of Paul's current thinking, one possibility would be a full refactor of the query server protocol/implementation and use external processes again to solve this problem, but more efficiently. This is a big task and could hold up the entire effort. Another is to just drop lists going forward, preserving the better performance provided by the NIF binding for the core JS functionality. Perhaps you can assist, if you know NIFs and C/C++ well... Alternately, what would Mango need added to it to give you sufficiently improved list functionality to fully replace JS lists?

Overall, rewrites are far less often used than either of the other two features, vhosts even less, and proxies even less than that. Cloudant doesn't support JS as rewrites, and I could see Couch going the same way for a NoJS mode at the very least - with static or templated/regex rewrites over in Mango. But I think there's some concern about having rewrites at all in what is increasingly envisioned as a pure database. I can't think of another database engine that allows you to rewrite its API.... This is where we'll have to agree to disagree on the priority of a feature, but again, I'm just a single vote.

giowild commented 6 years ago

Thanks @johs and @ermouth for mention.

There is kind of misunderstanding. We use couchapps, but not in the way you think. All those features just work. They might be faster, and some improvements are low hanging fruits, but they already work in acceptable way. They only need very minor repair time to time.

I strongly agree with ermouth's words. I think there is a substantial lack of comprehension of what these features can already give out of the box, as they are now, and how they can be used to achieve what today is thought impossible to achieve with couchdb without using a proxy.

I already tried to explain in the past on the ML (unfortunately in vain) how vhosts and rewrites can be used as a sort of firewall and router to implement fine-grained access control and routing (to other design docs' rewrites, shows, lists, updates and so on) even before documents are read from the database. This opens up endless possibilities for app developers. To give one example: the long-requested "document-level ACL" is already possible, without security flaws, simply by correctly using all of the rewrites, shows, lists, views and updates features. This is in place, for example, to deliver the smileupps website, cloud panel and billing!!! Other big pluses are the possibility of performing a hot-swap when upgrading a website from version 1.0 to 2.0 (by updating the first routing document in the chain), or the possibility of working on a draft version of a website, different from what the user is seeing (by simply using a vhost pointing to a different design doc/rewrites in the same database).

These are features app developers want, and these are only a few examples of what can ALREADY be achieved with existing couchdb features!

After that, adding only a few minor enhancements can give tremendous benefits to app developers, improving overall ease of use, flexibility and security. We tried to discuss them on the ML in vain, then we ended up implementing them in our own CouchDB fork.

I'm referring to the possibility of pushing more details into rewrite documents, such as ACL details. What I described above becomes trivial this way:

[
   {"to":"ABC","from":"XYZ","logged":"true"},
   {"to":"ABC","from":"XYZ","user":"joe"},
   {"to":"ABC","from":"XYZ","role":"author"}
]
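One way such rules might be evaluated, sketched under assumptions: `logged` is taken to mean "the user is authenticated", `user` an exact user name, and `role` a role the user must hold. The rule fields come from the snippet above; the matching semantics, `allowed` and `resolve` are guesses for illustration, not the behaviour of the fork mentioned. `userCtx` mirrors CouchDB's user context object with `name` and `roles`.

```javascript
const rules = [
  { to: "ABC", from: "XYZ", logged: "true" },
  { to: "ABC", from: "XYZ", user: "joe" },
  { to: "ABC", from: "XYZ", role: "author" }
];

// Assumed semantics: every condition present on a rule must hold.
const allowed = (rule, userCtx) => {
  if (rule.logged === "true" && !userCtx.name) return false;
  if (rule.user && rule.user !== userCtx.name) return false;
  if (rule.role && !userCtx.roles.includes(rule.role)) return false;
  return true;
};

// First matching rule wins; null means the request is not routed at all.
const resolve = (path, userCtx) => {
  const rule = rules.find((r) => r.from === path && allowed(r, userCtx));
  return rule ? rule.to : null;
};

console.log(resolve("XYZ", { name: "joe", roles: [] }));
console.log(resolve("XYZ", { name: null, roles: [] }));
```

Under these assumptions an anonymous request never reaches the target path, which is the "firewall before reading documents" effect described above.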

It is sad to say that what is missing in CouchDB today, is not a specific feature, but the VISION and WILL of DIFFERENTIATING from other database products.

@joan said:

I can't think of another database engine that allows you to rewrite its API

... and this IS AN ADVANTAGE for CouchDB!!! :-) These are exactly the kind of features where couchdb CAN and SHOULD differentiate, otherwise it will forever remain another NoSQL database, differentiating only for the fact that its API is native and not a plugin.

I would like a database meant FOR the APP DEVELOPER and not for a db or system administrator. If no app developers will ever write apps with couchdb, no db or system administrators will be ever required to maintain its databases!

IMHO app developers tend to value the following features in this order:

  1. expressiveness, flexibility and ease of use (impacting on development time) + app-level features (authentication, authorization, website+app inside database, in-browser development, etc.)
  2. security: no information must leak from the database (rewrites are great for this)
  3. performance
  4. system-level features (clustering is here :-)

This is my thinking on the argument: first features need to be built, in a secure way, then they can and SHOULD be optimized for performance. Since performance can always be achieved by improving hardware, better performance through software optimization, will always come after points 1 and 2. Point 4 is nice-to-have but not mandatory for onboarding and will always come after points 1-3.

In the end, FWIW this is my invaluable set of features:

wohali commented 6 years ago

@giowild I understand you disagree, and I appreciate your comment here, but I'm unconvinced by your argument. I especially want to thank you for being civil in your comment.

The fundamental difference of opinion here hasn't changed. The UNIX philosophy applies to CouchDB as to any other program: do one thing, and do it well. The developers and PMC of CouchDB are increasingly making the product a better database, not a better one-stop app server + database. Our key differentiation in the marketplace isn't app serving, it's replication, with the HTTP DB API/JSON interface as a close second, and scaling horizontally as the third most important feature. Apps and their related features were an afterthought, and while people like you have made it work for them, it was an experiment that didn't pay off writ large, and has become less compelling with each passing year. While we don't have live telemetry from every install that backs these claims, we have data from Cloudant customers over 6 years and from the mailing lists that do back it up.

Even on a lowly Raspberry Pi 3, you have 4 cores now; running an app server in any other language alongside CouchDB on this humblest of platforms gives you better app server functionality and performance by leaps and bounds than we can ever hope to achieve in CouchDB's core. The same goes for a proxy server like HAproxy or Nginx when it comes to proxying, vhosts and rewrites.

As to your list of features, _show and _list are arguably useful for data export in different formats from CouchDB, and update functions/VDUs are also very important (see #1554 for a possible evolution). I can see the value in them. While I know you don't want to see them go, I'm sure you'd agree that the others are not valuable in a database-only development approach, as they can be filled by an app server, a reverse proxy, or whatever other process you wish to run alongside our database.

Thanks again for the discussion. Again, I am only one vote in this deprecation discussion.

ermouth commented 6 years ago

Could your embedded RasPi update function be written declaratively?

Very unlikely, devices do not post json. For now both _update and _rewrite as a fn are able to chop up requests having binary body, and then decide. I have no doubts any DSL making this kind of work will be either useless for other scenarios, or too complex if it wants to cover everything. DSLs are not a substitute for yet unknown tasks. Any DSL is domain-specific, or it’s a cumbersome mess.

The UNIX philosophy applies to CouchDB as to any other program

Indeed; however, ‘do one thing’ is, looking at most modern software that is really user- and dev-friendly, a bit unpopular from a UX point of view. This rule was good in CLI-only times, and those times are gone. For now, standing by this rule mostly increases transaction costs: the approach of loosely stitching things together is good for scripts, but often induces heavy deployment and impedance alignment costs for continuous systems.

Also let me remind another Unix rule that still stands, the rule of diversity: Make programs flexible, allowing them to be used in ways other than those their developers intended.

The same goes for a proxy server like HAproxy or Nginx when it comes to proxying, vhosts and rewrites

I have nothing to say about vhosts and proxying, we never used them seriously as those settings are not replicatable, but as for rewrites...

What about deployment? If you have hundreds of nodes which are not intended to be members of a cluster, updating external proxy rules is a serious pain, with an immense number of uncomfortable (or just dangerous) subtleties. When API wiring is delivered through the regular data flow, deployment is just one click because it uses the most powerful feature of Couch: replication. Utils like nginx-sync do not even come close.
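A sketch of what "deployment by replication" can amount to: one replication request per node that pulls only the design document carrying the API wiring. `POST /_replicate` with `source`, `target` and `doc_ids` is CouchDB's documented replication API; the host names, the database name `api` and the design document id are invented, and the requests are only built here, not sent.

```javascript
// Hypothetical hub and node URLs.
const hub = "http://hub.example:5984";
const nodes = ["http://node-a.example:5984", "http://node-b.example:5984"];

// Describes a one-shot replication that copies a single design document to a node.
const replicationRequest = (node) => ({
  url: node + "/_replicate",
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    source: hub + "/api",
    target: node + "/api",
    doc_ids: ["_design/api"]
  })
});

const requests = nodes.map(replicationRequest);
console.log(requests.map((r) => r.url));
```

Each entry could then be sent with any HTTP client, for example `fetch(r.url, { method: r.method, headers: r.headers, body: r.body })`, with credentials added as the setup requires.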

So far, the big place he's stuck is implementing lists

It would be nice to have more info; no one can help without a description of the problem.

Would removing or simplifying the .info obj ease the task? It was done for rewrites as a fn, and this trim, aside from other effects, gives a good perf improvement compared to other QS stuff. Actually, CouchDB 1.6.1 patched with JS rewrites and called through a rewrite JS function is still in most cases a bit faster than naked single-node 2.x.

Alternately, what would Mango need added to it to give you sufficiently improved list functionality to fully replace JS lists?

Full replacement is impossible, or you will make the new DSL just another programming language. However, if you narrow the scope to the simplest cases and JSON only, joins and reducers are the obvious answer. While the former is more or less clear and implementable, the latter just does not fit well into the DSL concept except in the simplest cases.

So, summing up: I have nothing to say about non-replicatable features or features proven to be buggy and unreliable. But I think most features using benefits of syncing code with data flow better be kept.

johs commented 6 years ago

We have never been closer to the core of the "couchapp" discussion than with @wohali discussing differentiation in DB land:

I can't think of another database engine that allows you to rewrite its API....

.. and @giowild discussing onboarding

If no app developers will ever write apps with couchdb, no db or system administrators will be ever required to maintain its databases!

.. and @ermouth discussing syncing code with data

I think most features using benefits of syncing code with data flow better be kept.

.. not forgetting the prototyping feature that @nolanlawson described so well back in 2014:

ultimately your webapp will grow in complexity to the point where you’re forced to put a proxy server in front of CouchDB to smooth out its rougher edges, but I think you’ll still be amazed how far you can get with vanilla Couch.

CouchDB has offered truly unique CI/CD productivity on a single platform.

If the work on https://github.com/cloudant-labs/erlang-chakracore is successful, and JavaScript developers can still host their server-side functions in CouchDB and client-side apps as attachments maintained in couch/pouch mini-IDEs like http://ddoc.me/, I think the most interesting question is: what is the disadvantage of preserving the experimental space of design documents?

If this opportunity space -- especially related to onboarding -- can be preserved at low cost, then it would be very unwise to remove it.

PS thanks for the gentle reminder and title edit by @rnewson

[wohali: Edited my handle, I am not @-joan, I am @wohali...]

wohali commented 6 years ago

Speaking for myself, the disadvantage is that these features have poor functionality as compared to what you can do with an app server alongside or in front of CouchDB. They mislead people into thinking CouchDB is a fully-fledged application server environment. People waste time building things in this environment, discover all of the inherent limitations, and have to rewrite everything in a standalone app server. They then complain about the lack of functionality in something that has stagnated for many years.

Bringing chakracore into CouchDB is not going to allow you to npm install modules into a ddoc so you can leverage the entire JS ecosystem inside a CouchApp. The sandbox I expect to come with the new engine will continue to enforce the same limitations that exist in the current query server implementation.

To put a finer point on the standpoint of the entire development team, the documentation's single remaining reference to CouchApps has a good summary that I wrote in coordination with them:

Note: Previously, the functionality provided by CouchDB’s design documents, in combination with document attachments, was referred to as “CouchApps.” The general principle was that entire web applications could be hosted in CouchDB, without need for an additional application server.

Use of CouchDB as a combined standalone database and application server is no longer recommended. There are significant limitations to a pure CouchDB web server application stack, including but not limited to: fully-fledged fine-grained security, robust templating and scaffolding, complete developer tooling, and most importantly, a thriving ecosystem of developers, modules and frameworks to choose from.

The developers of CouchDB believe that web developers should pick “the right tool for the right job”. Use CouchDB as your database layer, in conjunction with any number of other server-side web application frameworks, such as the entire Node.JS ecosystem, Python’s Django and Flask, PHP’s Drupal, Java’s Apache Struts, and more.

jdmintz commented 5 years ago

I shared this in #fdb channel in CouchDB slack, but posting here as well https://docs.google.com/document/d/1vvqgMR5U1X8yIJj7ctse_D31VVNXb9NnMIJKg8bDBl8/edit.

Was keeping track of some of the deprecations and limitations in CouchDB 3.0 + 4.0

wohali commented 5 years ago

Hi @jdmintz , thanks for joining our community. I assume you're with IBM/Cloudant? We've never had any interaction with you in the CouchDB community through our official communication channels (Slack is not official).

I want to note for observers of this ticket that decisions on what features are and aren't being kept from CouchDB 1.x are made by the committer community and CouchDB PMC, not by IBM/Cloudant. While we're really happy to have IBM's support in building CouchDB 4.0, the ultimate decision is reached by our community, not by any corporation.

Further, those decisions are reached through discussions on our official mailing list, dev@couchdb.apache.org. There are also formal requirements for feature deprecation set out in our project bylaws. Posting plans by a single party here is insufficient.

I would encourage you to join our developer mailing list if you are proposing deprecations over and above what's already been discussed by the community in this topic or on our list.

jdmintz commented 5 years ago

Apologies if it came across as decision making. That was not my intent.

I was just seeking to summarize things scattered across ML posts, RFC drafts, and earlier comments in this ticket.

ermouth commented 5 years ago

I want to note for observers of this ticket that decisions on what features are and aren't being kept from CouchDB 1.x are made by the committer community and CouchDB PMC, not by IBM/Cloudant.

I want to note for observers that this statement is at least imprecise. Consider counting +1 replies under deprecation [VOTE] postings at MLs. About 3/4 of votes are from IBM/Cloudant employees.

rnewson commented 5 years ago

only PMC votes are binding.

johs commented 5 years ago

Hi @wohali , A summarization of outgoing features would have been nice, especially with respect to 3.0

In his August report @janl said...

Current planning includes both a CouchDB 3.0 and a CouchDB 4.0 milestone. 3.0 will include the best version of the current, mostly Erlang-based project, with many new features contributed by various project partners (but notably IBM).

... which made me hopeful that the "couch app" features would survive into version 3.0 together with the spider monkey update, ref https://github.com/apache/couchdb/issues/1875

wohali commented 5 years ago

@ermouth First off, this is not the place for this discussion.

Secondly, the committer base and PMC is quite diverse - here is a full list.

@johs That decision has not been taken yet, and the two decisions you mention are indeed separate.

rnewson commented 5 years ago

correction, committer votes are binding for some things, PMC votes for other things.

ermouth commented 5 years ago

@wohali, it’s not a discussion, I just state that +1 votes under deprecations are made mostly by PMC members who are IBM/Cloudant employees. This fact is easily accountable, which raises obvious questions to you personally, as an ASF board member responsible for ethics.

wohali commented 5 years ago

@ermouth Point taken. Yes, I am well aware of this concern, and am one of the louder voices regarding "corporate capture" of projects at the Foundation level - which I think everyone on the CouchDB project knows as well.

But this is not the place to discuss this point. If you have anything further on this topic, please take it to the mailing list, not this issue - it does not belong here.

wohali commented 4 years ago

Closing this ticket in favour of the following: