gcarranza / couchdb-python

Automatically exported from code.google.com/p/couchdb-python

Up-to-date view server #146

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This is a port of the JavaScript view server for CouchDB 0.10+.
Support for shows, lists, filters and validate_doc_update functions for 
Pythonic views is included.

Since there is currently no couchdb-python API for working with these 
functions, I didn't write any tests, but it passes all the tutorial examples.

Original issue reported on code.google.com by kxepal on 16 Aug 2010 at 2:19

GoogleCodeExporter commented 9 years ago
Did you write this yourself? Is it a port of the JavaScript view server, or did 
you find some documentation on the view protocol?

I definitely want an updated view server, but I prefer that we only add 
something for which we have good tests, so that we can prevent it from going 
stale again. Would you be in a position to write tests for this? Some design 
notes on the protocol would likely also be helpful.

If we decide to take this, I also have some code/style nits.

Original comment by djc.ochtman on 16 Aug 2010 at 2:37

GoogleCodeExporter commented 9 years ago
It's the JavaScript view server, ported by me. I was guided by one goal: to 
make the view server as compatible as possible with the JS one, because that 
one is always up to date and is released first. The closer the Pythonic view 
server stays to it, the easier it will be to maintain.

Shame on me: for some reason I thought the view server had to be tested via 
the couchdb-python API - just have a look at tests/view.py. I'll write the 
tests by tomorrow.

Original comment by kxepal on 16 Aug 2010 at 2:59

GoogleCodeExporter commented 9 years ago
I ran into a few small problems while passing the official view server tests, 
but now everything is fine.

I'll add better exception handling and crash tests later - I really don't like 
this forest of try..except, but I couldn't come up with anything better for 
now - I need some time to think about it. So this is just a checkpoint(:

In addition to the JavaScript view server features, I've implemented two 
behaviors:
- sealed documents: a change made to a document in one map function is not 
visible to the other map functions within a single view.
- any Python exception will crash the view server, while the JavaScript view 
server crashes only on fatal errors - I don't know which ones Python has.
Everything else seems to be the same.
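
The sealed-document behavior can be approximated by handing each map function 
its own copy of the document. The following is only a sketch in Python 3, not 
the code attached to this issue; `run_map_functions` is a hypothetical helper, 
and deep-copying is just one simple way to get the effect:

```python
import copy


def run_map_functions(map_functions, doc):
    """Run every map function against its own private copy of ``doc``.

    Hypothetical helper: copying is one simple way to "seal" a document,
    so that a mutation made by one map function is invisible to the next.
    """
    results = []
    for fun in map_functions:
        sealed = copy.deepcopy(doc)
        results.append([[key, value] for key, value in fun(sealed)])
    return results


def mutating_map(doc):
    doc["type"] = "changed"  # only affects this function's copy
    yield doc["_id"], doc["type"]


def plain_map(doc):
    yield doc["_id"], doc["type"]


original = {"_id": "a", "type": "post"}
emitted = run_map_functions([mutating_map, plain_map], original)
```

Here `plain_map` still sees the original `type`, and `original` itself is left 
untouched.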

Original comment by kxepal on 19 Aug 2010 at 1:38

GoogleCodeExporter commented 9 years ago
Oops, forgot to remove the debug handler from the view server.

Original comment by kxepal on 19 Aug 2010 at 1:42

GoogleCodeExporter commented 9 years ago
OK, here is the updated view server. Major changes:
- Supports any CouchDB server since version 0.9. By default the view server 
runs in compatibility mode for the latest CouchDB version; run it with the 
--couchdb-version option, e.g. view.py --couchdb-version=0.10, to make it work 
with a 0.10 CouchDB server.
- Tests are included for each supported version - all of them were ported from 
the JavaScript view server.
- An assertion error within a validate_doc_update function, unlike any other 
Python exception, doesn't count as fatal and is wrapped as a Forbidden error.
- The "Reduce output must shrink more rapidly" error can now occur.
- More verbose debug logging.

I also had to add a function-versioning decorator to split functions' behavior 
by CouchDB version. I think it will be useful in the future for keeping legacy 
support without serious code rewriting.
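
A function-versioning decorator of this kind might look roughly like the 
following. This is only a sketch in Python 3 under assumptions: `Versioned`, 
`since` and the two reply shapes are hypothetical illustrations, not the code 
attached to this issue.

```python
class Versioned:
    """Dispatch a call to the newest implementation that is not newer
    than the requested server version (hypothetical helper)."""

    def __init__(self):
        self._implementations = []  # list of (min_version, function)

    def since(self, *min_version):
        """Register the decorated function for servers >= min_version."""
        def register(fun):
            self._implementations.append((min_version, fun))
            self._implementations.sort(key=lambda item: item[0])
            return fun
        return register

    def __call__(self, target_version, *args, **kwargs):
        chosen = None
        for min_version, fun in self._implementations:
            if min_version <= tuple(target_version):
                chosen = fun  # later entries are newer, so keep the last match
        if chosen is None:
            raise NotImplementedError(
                "no implementation for version %r" % (target_version,))
        return chosen(*args, **kwargs)


format_error = Versioned()


@format_error.since(0, 9, 0)
def error_as_object(error, reason):
    # Illustrative placeholder reply shape for older servers.
    return {"error": error, "reason": reason}


@format_error.since(0, 11, 0)
def error_as_list(error, reason):
    # Illustrative placeholder reply shape for newer servers.
    return ["error", error, reason]


old_reply = format_error((0, 10, 0), "oops", "bad input")
new_reply = format_error((1, 0, 2), "oops", "bad input")
```

Version tuples compare element by element, so supporting a new server release 
only means registering one more implementation.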

Original comment by kxepal on 3 Sep 2010 at 11:31

GoogleCodeExporter commented 9 years ago
Changes:
- fixed compatibility for python-2.4+
- added more verbose debug output

I've removed the old attachments because they are no longer relevant.

Original comment by kxepal on 13 Sep 2010 at 12:09

GoogleCodeExporter commented 9 years ago
I'm going to need to find a solid chunk of time to review this, which might 
take a while, but I'd really like to take this for the next release...

Original comment by djc.ochtman on 19 Sep 2010 at 1:16

GoogleCodeExporter commented 9 years ago
Issue 140 has been merged into this issue.

Original comment by djc.ochtman on 22 Dec 2010 at 9:13

GoogleCodeExporter commented 9 years ago
I've fixed issue #163 in this view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=e1448890d2223e0321f96459a53c1757dc5b9662
(I didn't see any reason to attach all the files again for several small 
changes.)

Otherwise it's ready for 1.0.2, since the main change in the JS view server 
was sealing documents for map functions, and that feature is already done.

I will port sofa and tapirwiki to this view server within the next two weeks. 
That will be a good challenge for it, and more fixes may come if I've missed 
something.

Original comment by kxepal on 12 Feb 2011 at 7:08

GoogleCodeExporter commented 9 years ago
I've found one missing thing: a require function, for building modular 
applications within a design document. But there are some questions about it:
- should it work as in the JavaScript view server, i.e. wrap some arbitrary 
code and return the exported values? Or would it be better to implement a 
Python-like import?
- should it support eggs? I think it should, but I have no idea how to import 
eggs inline without saving them to disk. This could be a problem for 
application hosters.
Any other behavior suggested?

Original comment by kxepal on 22 Mar 2011 at 11:06

GoogleCodeExporter commented 9 years ago
What I'm doing to handle some efficiency issues with import:

def fun():
  from datetime import datetime
  ... other imports ...
  def fun(doc):
    ...
    delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
    ...
  return fun
fun = fun()

What really ought to happen is that the view server should go through each 
variable in the exec's locals and check the __module__. If __module__ exists, 
but is None, and the variable points to a callable, then use that as the 
map/reduce func, and error out if more than one is found. This would be 
backwards compatible with existing view functions, but would make it so 
closures are not necessary.

I think the eggs deployment issue has been tackled many times before. Importing 
eggs inline would obliterate responsiveness. Does couch time you out if you 
take too long? I'd be concerned that it would/should. If an application hoster 
supports python, then sooner or later they'd need to come up with a solution to 
handle 3rd party software since, frankly, the power of using python as a view 
server doesn't just lie in "it looks nice" and "it has yield".

To meet couch's same-code-same-result requirements (no side effects), we would 
maybe have to mark imports with python and module version strings, and push 
that back into couch. For example, 'import mythirdpartyegg' would then append 
'#mythirdpartyegg py cpython-2.6.5 mod 3.11r7112' to the end of the eval string, 
one per unique detected module. Any time you do any module upgrades, you can 
just delete the version-marking comments out of the view func manually, and 
couch will regenerate it.

However, most modules *tend* to be stable enough API-wise that this isn't a 
problem. If there were any behavior altering bugs/changes to the code, an 
administrator could achieve this manually.

Original comment by extempor...@gmail.com on 22 Mar 2011 at 3:30

GoogleCodeExporter commented 9 years ago
A few questions, if I may:
1) What is the reason for such a wrapper, as opposed to:
def fun(doc):
  ...
  delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
  ...
from datetime import datetime

It's not very Pythonic to place imports at the bottom, but datetime will only 
be imported once. However, this style could cause another problem: views must 
not have any state, nor any dependency on a source that could change later.

2) A hoster could provide 3rd-party modules, but it couldn't provide every 
version of each template engine, for example. Maybe you need trunk jinja2 with 
your own patches, who knows? So the idea of a fully portable Pythonic couchapp 
would fail.

3) Are preprocessor statements really a good idea? I saw them in couchapp, but 
they were only used for declarations, not within the design document or the 
view server.

As an intermediate result, here is an implementation of the require function 
as it works in the JavaScript view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=0b6625db473e74a83df7e9a339899a3c318f7b80

I still need to finish some details, so attachments with the new version of 
the view server will come later. Sorry, Dirkjan, it seems you'll have to 
review it once again, but I'll include documentation for each vital function 
and more tests to make the process easier(:

Original comment by kxepal on 23 Mar 2011 at 8:45

GoogleCodeExporter commented 9 years ago
You *could* put imports after the inner func that needs them, and python's 
scoping rules would resolve variable lookups, but I agree, it's not Pythonic. 
Closures themselves are not very Pythonic, either. Anyways, why not put imports 
first, as per my suggestion?

The whole reason for using a closure is to avoid the performance penalty of 
repeating the imports for each document. If a function needed k imports and 
there are n documents in a couch database, then, compared to the closure 
technique shown above, there'd be n*k redundant imports taking place, which is 
very slow (python doesn't re-import the module, but there is overhead involved, 
which can be significant).

See: 
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead

I disagree with you on the severity of Couch's "no side-effects" requirement 
for view/show/list functions. It's a matter of practicality, not a matter of 
theory. Yes, Couch says that the same document passed to the same code must 
produce the same output, no matter how many times it's executed. If the 
document doesn't change, then the result doesn't change.

However, this mandate is only for data correctness. If the module you're 
importing in a view function gets upgraded, and its behavior changes, but your 
view function stays the same (so couch doesn't regenerate the view), then all 
that happens is that your view will be incorrect. Couch won't break (couch 
won't know, mind, or even care).

Also, using module imports doesn't count as "side effects" at all. Aside from the 
random module, almost all modules (including 3rd party ones) are stateless in 
their behavior. For example, it'd be fine to describe a shape as a list of 
points in a couch document, and then use PIL to draw that to a png image in a 
show function, or to use couch to store server access logs, and then use 
pychart from a list function to generate an svg rendered line graph of server 
traffic.

Furthermore, the above "same code, same doc, same result" requirement does not 
apply across all time. For example, I could define a view function that 
imported some module, and then change the behavior of the module. All I'd have 
to do to fix the consistency of the view would be to add a single space to the end 
of the view function, save the design document to couch, and then remove that 
space, and save the design document again. The code is *exactly* the same as 
before, yet we side-stepped the upgrade-changes-behavior issue, and caused 
couch to regenerate its views to reflect the new behavior.  And this process 
can be automated (alternately, you could delete the views from the design doc, 
do a view cleanup, and then reupload the original design document, which would 
require only one view regeneration instead of two). As long as you regenerate 
your views when needed, the "could be changed later" issue isn't a problem at 
all.

In general, hosting vendors are not ever going to support all versions of all 
potential packages. They're either going to support only a handful of popular 
modules (probably Django templating) and force long release cycles, or they'll 
provide you with a few megabytes of space to upload your own modules in your 
own private module path, or they'll not allow the use of any kind of 3rd party 
modules (in which case you might as well use Javascript).

Original comment by extempor...@gmail.com on 24 Mar 2011 at 6:27

GoogleCodeExporter commented 9 years ago
First of all, before I forget: thank you for the detailed reply(:

About imports:
Allowing them at the top of the design function, as the PEP tells us, looks 
fine to me too, and it is much more intuitive behavior.
However, only map functions are cached: reduces/shows/lists/updates and the 
others are recompiled on each call, so all these import optimization tricks 
are not as useful as they could be.
I suppose it would also be better to extend the set of pre-imported packages 
with the most popular and useful ones, which would probably always be imported 
anyway, such as: time, datetime, re, hashlib, math, random, itertools and 
others. But that would be a very implicit feature unless you read the docs, 
and I shouldn't be the only one deciding what goes on this list.

No side effects is a requirement for views only, AFAIK, because the index is 
based on the view result only, while shows/lists are just a way to present 
data in a nicer form. There is one more reason to keep views as stable and as 
free from side effects as possible: if you have tens of millions of documents, 
the last thing you want to do is rebuild the view index, because that would 
take hours. Yes, the trick with a secondary server and swapping the view index 
is a nice idea, but you still lose those hours and your service will be down 
for some time.
However, the 1.1.x branch added a feature to require modules stored under 
views/lib from map functions.

Unfortunately for CouchDB view servers, a hoster wouldn't provide space for 
your own modules, because that would require an additional interface, a 
monitor to reload the module set in real time without a forced restart of the 
view server, and... this solution kills portable Pythonic couchapps. 
JavaScript couchapps are awesome because you just have to type "couchapp push" 
and that's all - it works!

So, what will the resolution be?

Original comment by kxepal on 24 Mar 2011 at 7:40

GoogleCodeExporter commented 9 years ago
Can you link some documentation on that 1.1.x feature? That's something I'd be 
*very* interested in learning about.

Show and list functions are supposed to be side effect free too. That way, they 
can be cached by couch (though I'm unsure if couch itself actually does that). 
I'm pretty sure couch does proper Etag handling of the show/list results, so if 
you're expecting that you can generate a new result each time someone accesses 
a doc via show, or a set of documents via list, know that couch will *tell* the 
client/browser to use what it already has if none of the pertinent documents in 
the database have changed.

Check out:
* http://guide.couchdb.org/draft/show.html#constraints
* http://guide.couchdb.org/draft/transforming.html#example (see the first 
"lightbulb" blockquote in that section).

Hah, you're right about the map caching. I forgot that some of the others don't 
cache! Hmmmmm. We could do our *own* caching. I'm not sure if that's considered 
bad behavior or not, but I don't see how it makes a difference, and really, I 
think the fact that they send the reduce function *every time* a reduce 
computation is needed is a bad choice in protocol design -- it's simpler, yes, 
but they could have just added 'load' and 'unload' commands for functions, so 
that you can do one-time compilation.

What we can do is cache the reduce/list/show functions they give us and run the 
computation. Next time they pass us a function, we do a string compare on the 
new code for that function to the string of code we originally received for 
that named list/show/reduce function. If it's the same as before, then our 
compilation step becomes a no-op, and we just use what we already had. If the 
function is different, then we assume that the design doc has been updated, and 
we recompile. This way, we can do things like use closures for those 
performance gains. Couch's own rules and reasons for side-effect free functions 
are what gives us the right to do this kind of caching.
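
The string-compare caching described above could be sketched as follows. This 
is a hedged Python 3 illustration, not the view server's actual code: 
`FunctionCache` and `compile_function` are hypothetical names, and the sketch 
assumes each source string defines exactly one function.

```python
import inspect


def compile_function(source):
    """Exec ``source`` and return the single function it leaves behind."""
    namespace = {}
    exec(source, namespace)
    functions = [obj for obj in namespace.values() if inspect.isfunction(obj)]
    if len(functions) != 1:
        raise ValueError(
            "expected exactly one function, found %d" % len(functions))
    return functions[0]


class FunctionCache:
    """Remember compiled functions by name; recompile only on new source."""

    def __init__(self, compiler=compile_function):
        self._compiler = compiler
        self._entries = {}  # name -> (source, compiled function)
        self.compilations = 0

    def get(self, name, source):
        entry = self._entries.get(name)
        if entry is not None and entry[0] == source:
            return entry[1]  # unchanged source: compilation is a no-op
        fun = self._compiler(source)
        self.compilations += 1
        self._entries[name] = (source, fun)
        return fun


SUM_SOURCE = "def fun(keys, values, rereduce):\n    return sum(values)\n"

cache = FunctionCache()
first = cache.get("by_total", SUM_SOURCE)
second = cache.get("by_total", SUM_SOURCE)  # served from the cache
```

Because the cached object survives between calls, a closure-style function 
pays for its imports once per source change instead of once per request.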

Moving on... well random probably shouldn't be imported (or at least not used 
by any couch stuff), since by its very nature, it'll produce different results 
every time.

As for downtime, in many cases, couch can service requests while an index is 
being rebuilt. Also, you can easily replicate to another 
database/server/whatever (secondary server), rebuild it there, and temporarily 
make that the primary database you're serving from while you rebuild the index on 
your real primary. That sounds complicated, but as we all know, in couch that 
takes less thought than it took me to write about it just now. Also, that's all 
assuming that couch's index hot-rebuild doesn't cover your use case. 
Hot-swapping couch databases and even couch server instances, or adding 
redundancies and failovers is a fact of life with couch -- sure, there are 
plenty of us running just one couch instance for a given application, but it's 
so painless to temporarily add another copy ad-hoc, and tear it back down when 
you no longer need it. Unlike with other systems, you don't even really need 
to plan ahead when you do it.

You've got a good point. I'm not sure what the resolution would be. Clearly 
python wins over javascript for couchdb not because of its pretty and concise 
syntax, since view/list/show functions are about the same size in either 
language if you aren't allowed to import anything. Python would win out because 
of its standard library (which is API-stable enough for couch), and because of 
its 3rd party modules.

In any case, behavior-changing module upgrades could only be handled by 
rebuilding the index. Even though the code that couch can see (the code stored 
in the design doc) hasn't changed, the code it links to has. So you simply have 
to treat it in same way as if you changed a line of code in the map func 
itself, and there's no way around that.

Just like with retooling your own view/list/show function code, you have to 
strike a balance between the time you need to spend rebuilding an index, and 
the benefits you get from changing the code. After all, you can always choose 
to *not* upgrade your python or module to a new version, and just because there 
is a new version, doesn't mean you need it.

Original comment by extempor...@gmail.com on 26 Mar 2011 at 6:04

GoogleCodeExporter commented 9 years ago
The only documentation I've seen is the source code(: 
https://github.com/apache/couchdb/commit/7665e449cdfff1e660ed2bbac3de4507cb063a18#share/server/state.js
AFAIK, this command is passed automatically if the ddoc has a views/lib/... 
path set, but I'm not sure. Looking at the test case, though, it could be 
interpreted another way.

Caching shows/lists/other ddoc subcommands may be possible, but this cache 
would be reset on each design document update. Reduce functions couldn't be 
cached without comparing source code. However, this trick wouldn't work with 
0.10.0.
There is a ["reset"] command to clean up the map function cache and drop all 
your configuration: mime types, reduce_limit etc. However, again, it's 
system-wide and not available from the outside.
I need some time for experiments to understand all the benefits and flaws of 
such caching. If it hasn't been implemented for the JavaScript view server, 
there must be some reason, right? The first one I see: if you update a 
3rd-party package on the system, your cached bytecode wouldn't be updated - 
the design document hasn't changed! - and you'll have a lot of fun in that 
case(: It could be recompiled after such a failure, but tests are still needed.

I have also found a case that breaks the idea of imports at the top of the 
function:
>>> import datetime
>>> from itertools import groupby
>>> def test(doc):
...     yield doc['_id'], 'passed'
The resulting namespace would always be:
{'datetime': <module 'datetime' from '/usr/lib/python2.4/lib-dynload/datetime.so'>,
 'groupby': <type 'itertools.groupby'>,
 'test': <function test at 0x7f3395044938>}
So the function that the iterator finds will be groupby. That's the wrong one, 
but because it yields two-value tuples, it gets far enough to generate a very 
strange error:
TypeError: <generator object at 0x7fcebfa58908> is not JSON serializable
This totally crashes the view server. You'll have to spend a lot of time with 
the --debug option enabled to understand why, and currently even that would 
not help in this case without additional logging. And if generators were JSON 
serializable, you'd even get a wrong result without any warning. Still not 
very explicit or relaxing behavior ):
Binding by names? Not an option.

The random module shouldn't be used for views, for sure, but it could be 
useful in lists to randomize output.

The idea of swapping temp/production databases is nice too, if the temporary 
couch instance could serve as the production one for a while... but I suppose 
this interesting discussion is not for this issue(;

On the rest I agree with you - we have to find the best point of balance. Hard 
optimizations and tricks are part of a high-load environment. PyPy, a faster 
json module etc. could also be used. Our task is to create tools that work, 
and work well, but also leave some room for heavy optimization with some 
trade-offs.

Original comment by kxepal on 26 Mar 2011 at 8:09

GoogleCodeExporter commented 9 years ago
Yeah, I'll have to look into that 'require' thing. On first glance, it looks 
like couchjs is doing a request to the design doc for the dependencies.

Right, as said in a previous post, in order for the above-the-function option 
to work, without the use of a closure, you (the view server function compiler) 
would have to check every key in the locals dictionary that exec generates to 
see if it has a __module__ attribute, and if that attribute has the value of 
None. The only backwards compatible requirement we need is that there is only 
one callable object (usually a function) that has __module__ set to None (since 
non-imported local functions/classes will have __module__ of None).

>>> code = """
... from datetime import datetime
... 
... class OldStyleClass:
...     pass
... 
... class NewStyleClass(object):
...     pass
... 
... y = 17
... 
... def test():
...     return 5
... """
>>> locals = {}
>>> exec code in {}, locals
>>> locals
{'y': 17, 'test': <function test at 0x7f7e25f10320>, 'NewStyleClass': <class 'NewStyleClass'>, 'OldStyleClass': <class __builtin__.OldStyleClass at 0x7f7e25f196b0>, 'datetime': <type 'datetime.datetime'>}
>>> for key in locals:
...   if callable(locals[key]):
...     if locals[key].__module__:
...       print key, "is *not* a candidate, since it's imported from", locals[key].__module__
...     else:
...       print key, "is a candidate (hopefully the only one, or we'll have to error out)"
...   else:
...       print key, "isn't even callable, so we don't care about it"
... 
y isn't even callable, so we don't care about it
test is a candidate (hopefully the only one, or we'll have to error out)
NewStyleClass is *not* a candidate, since it's imported from __builtin__
OldStyleClass is *not* a candidate, since it's imported from __builtin__
datetime is *not* a candidate, since it's imported from datetime

Huh, so apparently class definitions inside of an exec will be associated with 
the __builtin__ module, so we'd have to check for that, as well. But in 
general, it's easy to do a backwards-compatible check for non-imported 
callables.
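
For what it's worth, a Python 3 variation on this check could look like the 
sketch below. It is an assumption-laden illustration, not couchpy's code: 
instead of testing for a __module__ of None or __builtin__, it sets __name__ 
in the exec globals to a made-up sentinel (`USER_MODULE`), so anything defined 
by the submitted source carries that name, and it only considers plain 
functions. `find_entry_point` is a hypothetical helper.

```python
import inspect

USER_MODULE = "__couchpy_user_code__"  # hypothetical sentinel module name


def find_entry_point(source):
    """Exec ``source`` and return the one function it defines itself.

    Imported callables keep their own ``__module__`` (for example
    'itertools'), so only functions defined by ``source`` match.
    """
    namespace = {"__name__": USER_MODULE}
    exec(source, namespace)
    candidates = [
        obj for obj in namespace.values()
        if inspect.isfunction(obj) and obj.__module__ == USER_MODULE
    ]
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one local function, found %d" % len(candidates))
    return candidates[0]


SOURCE = """
import datetime
from itertools import groupby

def test(doc):
    yield doc['_id'], 'passed'
"""

entry = find_entry_point(SOURCE)
rows = list(entry({"_id": "doc-1"}))
```

With this check the imported `groupby` is never mistaken for the map function, 
and a source that defines two local functions is rejected instead of silently 
picking one.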

Oh, perhaps the answer to the module distribution problem is to put a custom 
import mechanism that checks for those modules as attachments to the 
view/list/show functions design doc before checking the normal on-disk module 
path. Couch's _changes API would need to be monitored for design doc changes by 
couchpy too, so that couchpy can know when it needs to reload modules. If this 
were achievable, you could bundle your modules in the design doc itself 
(regular zip files and eggs could be supported).

The best way to get a good system in place for this is not to work around 
Couch's API, but instead to work directly with the Apache Couch community to 
support everything we're talking about, since none of it violates the 
side-effect-free requirements of couch if dependency checking can be moved into 
couch. This wouldn't mean that couch would have to understand any programming 
language, but would be able to handle changes to certain special design doc 
keys. For example, couch could *hypothetically* do:

{
  "_id": "_design/app",
  "lib": {
    "calc": "def something_statistical(a,b,c,d): return (a,b,c,d)",
    "image": "#some-package v1.3.2",
    "chart": "#other-package v4.1.7"
  },
  "depends": {
    "views.test": ["lib/calc"],
    "shows.graph": ["lib/chart"],
    "lib.chart": ["lib/image", "_attachments/something_local.egg"]
  },
  "views": {
    "test": {
      "map": "def fun(doc): yield doc['_id'], calc.something_statistical(*[doc.get(k) for k in 'abcd'])"
    }
  }
}

Once again, this doesn't exist in couch, but if it were implemented, couch 
would only need to know how to interpret the "depends" key. If a string in 
"lib" changes (couch doesn't need to know or care what the contents of that 
string mean), then everything that depends on it needs to get updated, just 
like it reindexes views when the view function strings are changed. In the case 
of list or shows, this would mean setting a new Etag that invalidates 
client-cached versions of the previous show/list results. Couch would also need 
to send the dependency to the view server when it's needed, in the form of some 
kind of addlib command. couchpy itself could ignore the # version stubs, since 
those would just be there to provide an easy upgrade path for libraries. Or it 
could compare the version shown there to the version of the module it imports, 
and update the design doc if a new version is found on the module path. 
Dependencies starting with "_attachments" could be handled specially by couch.

Original comment by extempor...@gmail.com on 26 Mar 2011 at 11:26

GoogleCodeExporter commented 9 years ago
By the way, you're right that use of random is side-effect-free. Just keep in 
mind that couch's Etag/caching semantics will make it so that, if an HTTP 
client does proper caching, it'll do a conditional GET request for the 
show/list the next time you ask for it, and unless one of the documents the 
list/show depends on has changed, couch will tell that client that the resource 
has not been updated.

Therefore, your random lists will only look random once per depended-upon 
document update. This is on a client-by-client basis, though. If you have your 
own caching proxy in the middle, or something like couchbase starts having its 
own response cache, then everybody will see the same random results on each 
request, until the next time one of the pertinent documents is updated. This is 
another "good thing" that couch provides, because even though it might hurt you 
5% of the time, it really helps with scalability and responsiveness 95% of the 
time.

Original comment by extempor...@gmail.com on 26 Mar 2011 at 11:33

GoogleCodeExporter commented 9 years ago
> Yeah, I'll have to look into that 'require' thing. On first glance, it 
> looks like couchjs is doing a request to the design doc for the dependencies.
It doesn't, but it has access to it via a closure. The design doc is simply 
passed to the compile function as the second argument.

Your example could pass and work as "expected", but that's just one case. 
There are others that wouldn't work as "expected". What's needed is simply a 
stable entry point.
Maybe some kind of decorator would be a solution, such as:
>>> import datetime
>>> def helper(item):
...     ...
>>> @main
... def mapfun(doc):
...     ...
But would that be good, explicit and clean? It looks the same as a predefined 
function with a special name. I suppose there isn't much need for complex code 
blocks. One node - one function. Libs will take care of the rest with exported 
statements, as they were designed to do, plus eggs as libs to store more 
complex packages.

> Couch's _changes API would need to be monitored for design doc changes by 
> couchpy too, so that couchpy can know when it needs to reload modules. 
> If this were achievable, you could bundle your modules in the design doc 
> itself (regular zip files and eggs could be supported).
It isn't needed: if the design document has changed, a command is passed to 
refresh it in the local cache.

Also note that attachments are a separate entity that is just bound to the 
document, but isn't passed along with it. So to get an attachment from a 
show/list you'd have to make a plain HTTP request - madness!(:

> For example, couch could *hypothetically* do: ...
Too complex a solution: instead of just creating a function, you have to 
create it and set up all the required dependencies to make it work correctly. 
The require function currently does the same thing - just invoke it and 
extract the needed exported statement.
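
To illustrate the idea only, here is a rough Python 3 sketch of a 
CommonJS-style require over a design document. It is not the implementation 
linked earlier in this issue: `make_require`, the `exports` dict convention 
and the sample ddoc are all assumptions made for the example.

```python
def make_require(ddoc):
    """Build a ``require`` function bound to one design document."""
    loaded = {}  # module path -> exports, so each module body runs once

    def require(path):
        if path in loaded:
            return loaded[path]
        # Walk the design document along the slash-separated path.
        node = ddoc
        for part in path.split("/"):
            if not isinstance(node, dict) or part not in node:
                raise LookupError("no module at %r" % path)
            node = node[part]
        if not isinstance(node, str):
            raise LookupError("%r is not a source string" % path)
        exports = {}
        loaded[path] = exports
        # The module sees ``exports`` and can itself call ``require``.
        exec(node, {"require": require, "exports": exports})
        return exports

    return require


ddoc = {
    "_id": "_design/app",
    "lib": {
        "calc": "def double(x):\n    return 2 * x\nexports['double'] = double\n",
    },
}

require = make_require(ddoc)
double = require("lib/calc")["double"]
```

A design function would then call `require("lib/calc")` and pick the exported 
names it needs, with no separate dependency declaration.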

> By the way, you're right that use of random is side-effect-free. Just keep 
> in mind that couch's Etag/caching semantics will make it so that, if an HTTP
> client does proper caching, it'll do a conditional GET request for the 
> show/list the next time you ask for it, and unless one of the documents the 
> list/show depends on has changed, couch will tell that client that the 
> resource has not been updated.
In a show/list function you can set your own headers and disable caching via 
the Expires header. It has higher priority than the Etag one. Actually, Etag 
only __may__ be used for caching purposes.
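
As a sketch only (Python 3), such a show function might look like this. It 
assumes the Python view server accepts the same response object, with 
`headers` and `body` keys, that CouchDB documents for JavaScript show 
functions; the function name and header values are illustrative, and how 
couch combines them with its own Etag is not shown here.

```python
import random


def show_random_greeting(doc, req):
    """Hypothetical show function that asks clients not to reuse its output."""
    greeting = random.choice(["hello", "hi", "hey"])
    return {
        "headers": {
            "Content-Type": "text/plain",
            # Ask caches to revalidate, and mark the response as already
            # expired; an invalid date such as "0" is read as "in the past".
            "Cache-Control": "no-cache",
            "Expires": "0",
        },
        "body": "%s, %s" % (greeting, doc["_id"]),
    }


response = show_random_greeting({"_id": "doc-1"}, {})
```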

Original comment by kxepal on 26 Mar 2011 at 1:08

GoogleCodeExporter commented 9 years ago
Good idea! As you indicated, if you have a single callable, then it'll work as 
expected. If you have more than one callable, you must designate it with @main.

Well the main point is, that your idea provides a mechanism for the programmer 
to be as expressive and succinct as they need.

...
def helper(item):
...
@main
def mapfun(doc):
...

is the equivalent of:

def mapfun():
   ...
   def helper(item):
   ...
   def mapfun(doc):
   ...
   return mapfun
mapfun = mapfun()

Only difference is that the decorator is a *lot* easier.
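
A rough Python 3 sketch of how the view server side of @main could work is 
below. Everything in it is an assumption for illustration: the `main` marker 
attribute, the `compile_design_function` helper and the sentinel module name 
are made up, not taken from couchpy.

```python
import inspect

DESIGN_MODULE = "__design_function__"  # hypothetical sentinel module name


def main(fun):
    """Mark ``fun`` as the entry point of a design function."""
    fun._is_entry_point = True
    return fun


def compile_design_function(source):
    """Return the @main function, or the only local function if unmarked."""
    namespace = {"main": main, "__name__": DESIGN_MODULE}
    exec(source, namespace)
    functions = [
        obj for obj in namespace.values()
        if inspect.isfunction(obj) and obj.__module__ == DESIGN_MODULE
    ]
    marked = [f for f in functions if getattr(f, "_is_entry_point", False)]
    if len(marked) == 1:
        return marked[0]
    if not marked and len(functions) == 1:
        return functions[0]  # backwards compatible: a single plain function
    raise ValueError("ambiguous entry point: mark exactly one with @main")


SOURCE = """
def helper(item):
    return item.upper()

@main
def mapfun(doc):
    yield doc['_id'], helper(doc['name'])
"""

mapfun = compile_design_function(SOURCE)
rows = list(mapfun({"_id": "1", "name": "ab"}))
```

Existing single-function sources keep working unchanged, and the decorator is 
only needed once helpers appear next to the map function.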

The point about my dependency solution is that it lets couch handle recursive 
dependencies with respect to index rebuilding and Etag handling, so that couch 
can make sure that all data is consistent. I agree, it's complex, and there's 
bound to be a better way (I don't care for my solution either -- it's an 
initial suggestion). I just know that if we have to manage recursive (or even 
non-recursive) dependencies ourselves, then it won't work. Sooner or later, 
we'd end up with a badly inconsistent database, with bugs that are really hard 
to notice.

Dependencies are necessary because in the typical couch application (at least 
all of the ones I've done), there is a lot of duplicate code, and that 
duplicate code makes the application much much harder to maintain.

Do you really want to override Etag handling as done by couch? Put it this way, 
Etag is the absolute best caching mechanism available to you, but it's also 
*very* complex to get it right. Enterprise-grade server software often fails to 
handle it usefully (Apache uses inode numbers, which does not allow you to 
cluster while still keeping out-of-the-box caching), and many high-end websites 
with big budgets never manage to implement it, instead using expires headers, 
or spending money on a secondary server to deal with application inefficiencies.

Couch manages Etags perfectly, so even though it's easy to add nodes to scale 
couch, Etags work in the opposite direction, making it so you have much less of 
a need to scale. If you have something that's completely dynamic, and there's 
nothing in couch's architecture that tells you that you must use idempotent 
show/list functions, then by all means, send a no-cache header. But if you have 
something that is barely dynamic (like you include a random hash with the 
output, just for the heck of it, or you want to add a string saying 'response 
generated in 0.0013 seconds'), then you probably want to rethink what you're 
trying to do, since you're sacrificing a lot to gain so little.
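
As a rough illustration of why a stable Etag saves work, here is a minimal 
sketch of the conditional GET check a server performs. The `make_etag` and 
`respond` helpers are hypothetical, and hashing the body is only a stand-in: 
couch computes its own Etags and this is not how it does so.

```python
import hashlib


def make_etag(body):
    """Derive a validator from the response bytes (stand-in, not couch's method)."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated validators, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's copy is still current: send no body at all.
            return 304, headers, b""
    return 200, headers, body
```

If the output includes anything that changes on every request (a timing 
string, a random hash), the validator never matches and every request falls 
through to the full 200 response.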

Original comment by extempor...@gmail.com on 26 Mar 2011 at 2:08

GoogleCodeExporter commented 9 years ago
> Only difference is that the decorator is a *lot* easier.
Easier? Maybe. Implicit? For sure. You always have to keep this @main 
decorator in mind. However, I see we've come back to the current, original 
state: a single function which creates an inner context (;

> Do you really want to override Etag handling as done by couch. 
I don't mean to override it; I only answered how to work around Etag-based 
caching.
About Etag: 
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19
A little below it you can find the Expires header.

Original comment by kxepal on 26 Mar 2011 at 2:27

GoogleCodeExporter commented 9 years ago
First, sorry for the "wall of text" that subscribers have received from us; we 
probably should have created a separate topic on the groups, but now it's too 
late. And a special sorry to the one who unstarred this issue: he won't receive 
a notification about the new version of the view server that I would like to 
attach for testing.

Short changelog:

! remove dependency from versioning decorator

! fix Mime class and show functions with provides/response_with methods - they 
were just totally broken.

! fix Python exception encoding for CouchDB versions < 0.11.0

! correct filters for versions >= 0.11.1. There is no more userctx argument, 
beware!

+ add missing ddoc cache (thanks for this discussion)

+ add support for add_lib command. CouchDB version >= 1.1.x required

+ add support for views command. Currently available for trunk version

+ add support for secobj for validate_doc_update commands. Requires CouchDB >= 
0.11.1. However, this argument is left optional because it isn't mentioned 
in most examples.

+ add require function with the same behavior as its JavaScript counterpart

+ add docstrings for most valuable methods with descriptions and examples

~ add support for 0.8.0 version - that was too easy (:
~ allow imports in design functions (see notes below)

~ _log function has been replaced by logging handler

~ correct error message for wrongly defined design functions

~ code cleanup, reorganisation, formatting fixes

~ more tests added and passed (47 total, 5 failed for 0.9.0 because I couldn't 
reproduce valid behavior - does anyone have Windows binaries of 0.9.x?)

Something about imports:

http://mail.python.org/pipermail/python-list/2007-September/507450.html

I really hadn't known about this behavior(: So any imports at the top level are 
useless unless they are explicitly passed to the target function as arguments 
or through a decorator. However, I've allowed their usage for performance 
reasons.
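
One way this behavior arises can be sketched as follows, assuming the source 
is executed with separate globals and locals dictionaries (an assumption for 
illustration; the `compile_*` helpers and `fun` are hypothetical names, not 
the view server's own code). The import is bound in the locals dictionary, but 
the function looks names up in the globals dictionary when it is called.

```python
SOURCE = '''
import math

def fun(x):
    return math.sqrt(x)
'''

SOURCE_WITH_DEFAULT = '''
import math

def fun(x, math=math):
    return math.sqrt(x)
'''


def compile_separate(source):
    """exec with distinct globals and locals: `math` is bound in locals only."""
    globals_, locals_ = {}, {}
    exec(source, globals_, locals_)
    return locals_["fun"]


def compile_shared(source):
    """exec with one namespace: `math` is visible as a global of `fun`."""
    namespace = {}
    exec(source, namespace)
    return namespace["fun"]
```

Calling `compile_separate(SOURCE)(4)` raises NameError for `math`, while 
`compile_separate(SOURCE_WITH_DEFAULT)(4)` and `compile_shared(SOURCE)(4)` 
both work, because the default argument captures the module at definition 
time and the shared namespace makes it a real global.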

A more detailed history of changes is available in the viewserver branch:
https://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver

My next questions:
1. Should I split view.py into a view package (probably better to name it the 
viewserver package), since the code is growing and Sphinx autodocumentation 
support is missing?
2. Should I add preimported modules? I've settled on these: base64, 
calendar, datetime, math, random, re, time - they are quite common, useful and 
available in all supported versions.
3. Should I add eggs support via an --egg-cache parameter that specifies the 
storage folder? Eggs could be stored as base64-encoded strings, not as 
attachments, because attachments are not available from the view server.

Original comment by kxepal on 2 Apr 2011 at 6:00

Attachments:

GoogleCodeExporter commented 9 years ago
OK, I'll answer those questions myself(:

> Should I split view.py into view package(propbably better name it viewserver 
package) due to a code growing and missing support of sphinx autodocumentation?
Yes, I should, because working with 2K of deeply nested code with massive 
cross-function dependencies is not easy, and the missing Sphinx autodoc support 
makes me sad.

> Should I add preimported modules? I've stoped at next ones: base64, calendar, 
datetime, math, random, re, time - they are quite common, useful and available 
in all supported versions.
No, I shouldn't, because I can't decide what a developer needs for a given 
project, even if those modules fit most tasks. Instead, I've created 
something like a QueryServer constructor, which can be used to
create your own QueryServer with your own behavior without patching 
couchdb-python code. Pretty nice solution, right?(; See the `construct_server` 
function in `couchdb.server.__init__.py` for how the default query server is 
defined.

> Should I add eggs support via --egg-cache parameter where storage folder 
would be specified? Eggs could be stored as base64 encoded strings, not as 
attachments due to they are not available from view server.
Yes, I should, because this feature offers too much to be ignored. 
However, it's optional and must be enabled explicitly for security and 
compatibility reasons. To store eggs within design documents you should encode 
the egg as a base64 string. See the documentation for examples.
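
The encoding step itself can be sketched like this. It is only an illustration 
of the base64 round trip: the helper names and the in-memory demo archive are 
hypothetical, and where exactly the string goes inside the design document is 
described in the documentation, not here.

```python
import base64
import io
import zipfile


def build_demo_egg():
    """Build a tiny zip archive in memory (a zipped egg is a zip file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as egg:
        egg.writestr("demo.py", "def double(x):\n    return x * 2\n")
    return buf.getvalue()


def egg_to_text(egg_bytes):
    """Encode raw egg bytes as an ASCII string that fits into JSON."""
    return base64.b64encode(egg_bytes).decode("ascii")


def text_to_egg(text):
    """Decode the stored string back into the original egg bytes."""
    return base64.b64decode(text.encode("ascii"))
```

The resulting string is plain ASCII, so it can be stored as an ordinary JSON 
string value and decoded back to the same bytes on the query server side.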

So, the query server was totally refactored from a single module into a full 
package, and here are the changes in the new version:
+ add support for eggs as modules.
+ add option to control GET requests to update functions.
+ add query server constructor: define your own context, error handlers, 
commands (if you have your own CouchDB fork or live on the very latest nightly 
builds) and more.
+ add query server documentation article.
+ add own logging channel for each part of query server.
~ update "Writing views in Python" documentation article.
~ fix doc strings to make them more sphinx friendly.
~ fix for require circular references (COUCHDB-1075)
- remove debug decorator, because now you can implement it on your own if you'd 
like.

Tested on Python 2.4-2.7 and PyPy 1.5. All changes are still available at 
http://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver

And that's all, I think(: Could someone review the documentation articles (my 
English is poor) and the code, to decide whether anything needs to change? 
Any ideas? Criticism? Thanks(:

Original comment by kxepal on 9 May 2011 at 4:25

Attachments:

GoogleCodeExporter commented 9 years ago
Tested on Android 2.3.4 on a Google Nexus One using the Py4A application. To 
share my happiness, do the following:
1. copy the couchdb package folder to 
/sdcard/com.googlecode.pythonforandroid/extras/python (query server imports are 
absolute and use the couchdb package as root)
2. create a file on the sdcard, for example /sdcard/couchpy, and place the 
following code into it:
PYTHONPATH=/data/data/com.googlecode.pythonforandroid/files/python/lib/python2.6/lib-dynload
PYTHONPATH=${PYTHONPATH}:/mnt/sdcard/com.googlecode.pythonforandroid/extras/python
export PYTHONPATH
export PYTHONHOME=/data/data/com.googlecode.pythonforandroid/files/python
export LD_LIBRARY_PATH=/data/data/com.googlecode.pythonforandroid/files/python/lib
/data/data/com.googlecode.pythonforandroid/files/python/bin/python /mnt/sdcard/com.googlecode.pythonforandroid/extras/python/couchdb/view.py --couchdb-version=1.0.0
3. add the following line to the query_servers section of the CouchDB 
configuration:
python = sh -e /sdcard/couchpy
4. ...
5. now you can use pythonic design documents on Android(:

Original comment by kxepal on 22 May 2011 at 4:38

GoogleCodeExporter commented 9 years ago
It's a good thing to review your own code after some time has passed. This 
update includes a lot of fixes and even some new features:

global:
- removed global state and cross module references (WOO-HOO!)
- rewritten QueryServer api
- added SimpleQueryServer as high level abstraction on top of QS internals
- added MockQueryServer to help write unittests
- fix docstring and typos.
- query server logs are more useful now in debug mode
- update documentation with android paragraph and how to customize query server
- place TODO references to actual CouchDB issues: COUCHDB-729, COUCHDB-282, 
COUCHDB-1261, COUCHDB-898. I could fix them locally, but this would create more 
differences between the original JS server and the Python one.
- add more than 170 test cases

compiler:
- fix crash on compilation of source code with Windows newlines
- fix double crash if function compilation failed
- fix crash on malformed base64-encoded egg
- fix crash on egg cache usage
- code refactoring

stream:
- abstraction from JSON module exception type on decode/encode operations

render:
- fix COUCHDB-1272
- code refactoring

validate:
- prevent query server crash in validate_doc_update on Python exceptions

views:
- reduce_output_overflow error now will be raised properly
- small refactoring
- document seal now works better with copy.deepcopy()

design functions:
- send(), start(), provides(), register_type() available only for show and list 
functions
- get_row() available only for list functions
- log() function is not proxy of logging.info anymore

All tests passed for:
- Python 2.4 to 2.7
- PyPy 1.5 and 1.6 
- Android 2.3.4 with Python for Android version 5 against CouchDB-1.0, 
android-0.1 and MobileFuton 1.7

Please, could someone review the docstrings and Sphinx docs? I'm sure the 
documentation text is far from being in good shape /:

Original comment by kxepal on 13 Sep 2011 at 10:44

Attachments:

GoogleCodeExporter commented 9 years ago
I took a look at this, but I'm having some trouble getting the tests running. 
In particular, this bit doesn't seem to work, independent of the view server 
used:

djc@enrai couchdb-python $ python
Python 2.7.2 (default, Oct 24 2011, 10:16:20) 
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> pipe = subprocess.Popen(['/usr/bin/python2.7', 'couchdb/view.py'], shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>> pipe.stdin.write('["reset"]\n')
>>> pipe.stdout.readline()
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> 

Meanwhile, this works just fine from the command-line:

djc@enrai couchdb-python $ python couchdb/view.py
["reset"]
true

I looked at the queryserver.zip from comment 25 and the files from comment 6. 
It seems to me that the former is too large and complex to take into 
couchdb-python at this time. The stuff from comment 6 is much simpler, but 
I couldn't run the test suite due to the above issue.

Finally, the code in comment 6 does a bunch of stuff to stay compatible with 
all of 0.9, 0.10 and 0.11+. I would propose that any new view server code we 
take for our next release be limited to supporting 0.11+; that's 
already quite old at this point.

Original comment by djc.ochtman on 12 Feb 2012 at 11:49

GoogleCodeExporter commented 9 years ago
Hi, Dirkjan!

Thanks for the first review(: Actually, I never run it as a subprocess myself, 
but as couchdb/tests/testutil.py::QueryServer shows, it can be run as one; 
just remove shell=True from Popen.
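
The reason is that on POSIX, when Popen gets a list together with shell=True, 
only the first item is the command and the rest become arguments of the shell 
itself, so the interpreter starts without the script and presumably just waits 
on stdin. A minimal sketch of the line protocol without shell=True follows; 
`FAKE_SERVER` is a hypothetical stand-in that answers `true` to every line, so 
the example does not depend on couchdb/view.py being present.

```python
import json
import subprocess
import sys

# Stand-in for the real view server: replies "true" to each line it reads.
FAKE_SERVER = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('true\\n')\n"
    "    sys.stdout.flush()\n"
)


def ask(command):
    """Send one JSON command to a fresh server process and return its reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", FAKE_SERVER],  # a list, and no shell=True
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    try:
        proc.stdin.write(json.dumps(command) + "\n")
        proc.stdin.flush()  # without the flush the child may never see the line
        return json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
```

With the real server, the argument list would be the interpreter followed by 
the path to the view server script, as in the session above minus shell=True.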

There is a huge difference between comment 6 and comment 25. It's not in code 
size; it's in bugs, code complexity, documentation, tests, features, logging 
and how easily you can extend it without breaking things. Support for 
old CouchDB releases is only a small part of it, just a few functions that 
could easily be removed. For example, I easily added multiprocessing support 
to qs#25 for map/reduce functions just by decorating the server/views.py 
functions, without touching their source code.

Also, you should look at comment 22 rather than comment 6. That was the last 
version of the all-in-one-file queryserver, but it is still buggy by design.

The main goal of qs#25 and all the code splitting was to simplify future 
support, make it easy to extend, and help with couchapp unit testing, because 
now you can run it without a subprocess. And that goal has been reached.

You may read the changelogs in this thread and in my clone at the viewserver 
branch; they are quite complete.

I admit that it's a rather big patch at about 250KB of code (removing 
docstrings could cut it in half, I'm sure), but I'd like to take on its 
support, because I use it for everyday tasks, I know each line of it, I'd like 
to help the couchdb-python project, and I do not want to create 
yet-another-python-queryserver-project. People know about couchdb-python, 
know about its viewserver, and expect it to work fine. Why not satisfy 
them?

Original comment by kxepal on 12 Feb 2012 at 12:40

GoogleCodeExporter commented 9 years ago
An updated Python query server is in the attachments. After almost a year of 
use in production, some small problems have been fixed:
- Occasional crash on chunk encoding for _list functions.
- View lib cleanup on reset command
- Handle single named MIME type params e.g. application/pdf;base64
- Reduced useless logging output to improve its readability.
- Fix COUCHDB-1330.
- Fix crash on malformed MIME type.
- Couchapp modules no longer need to be wrapped in some scope: just write 
regular Python code for them. For example, where you previously needed:
{{{
  import datetime

  def foo(datetime=datetime):
      return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')

  exports['foo'] = foo
}}}
Now you can remove any such proxy hacks and get simple, expected code behavior:
{{{
  import datetime

  def foo():
      return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')

  exports['foo'] = foo
}}}

This change doesn't affect other ddoc functions: shows, lists, views, etc.

Original comment by kxepal on 3 Aug 2012 at 4:54

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by djc.ochtman on 21 Sep 2012 at 8:33

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub. Please continue discussion here:

https://github.com/djc/couchdb-python/issues/146

Original comment by djc.ochtman on 15 Jul 2014 at 7:19