Open GoogleCodeExporter opened 9 years ago
Did you write this yourself? Is it a port of the JavaScript view server, or did
you find some documentation on the view protocol?
I definitely want an updated view server, but I prefer that we only add
something for which we have good tests, so that we can prevent it from going
stale again. Would you be in a position to write tests for this? Some design
notes on the protocol would likely also be helpful.
If we decide to take this, I also have some code/style nits.
Original comment by djc.ochtman
on 16 Aug 2010 at 2:37
It's the JavaScript view server, ported by me. I had only one goal - to make
the view server as compatible as possible with the JS one, because that one is
always up to date and is released first. The closer the Python view server
stays to it, the easier it will be to support.
Shame on me, for some unknown reason I thought the view server had to be
tested via the couchdb-python API - just have a look at tests/view.py. I'll
write the tests by tomorrow.
Original comment by kxepal
on 16 Aug 2010 at 2:59
I ran into a few small problems while passing the official view server tests,
but everything is fine now.
I'll do better exception handling + crash tests later - I really don't like
this forest of try..except, but I couldn't come up with anything better yet - I
need some time to think about it. So this is just a checkpoint(:
In addition to the JavaScript view server features I've implemented two
behaviors:
- sealed documents: changing a document in a map function has no effect on the
other map functions within a single view.
- any Python exception will crash the view server, while the JavaScript view
server crashes only on fatal errors - I don't know which Python exceptions
would count as fatal.
Everything else seems the same.
Original comment by kxepal
on 19 Aug 2010 at 1:38
Oops, I forgot to remove the debug handler from the view server.
Original comment by kxepal
on 19 Aug 2010 at 1:42
OK, here is the updated view server. Major changes:
- Support for any CouchDB server since version 0.9. By default the view server
runs in compatibility mode for the latest CouchDB server version; run it with
the --couchdb-version option, e.g. view.py --couchdb-version=0.10, to make it
work well with a 0.10 CouchDB server.
- Tests are included for each supported version - all of them have been ported
from the JavaScript view server.
- Unlike any other Python exception, an assertion error within a
validate_doc_update function doesn't count as Fatal and is wrapped as a
Forbidden error
- The "Reduce output must shrink more rapidly" error may now occur
- More verbose debug logging
I also had to add a function-versioning decorator to split function behavior
per CouchDB version. I think it will be useful in the future for keeping legacy
support without serious code rewriting.
Original comment by kxepal
on 3 Sep 2010 at 11:31
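The assertion handling described in the changelog above could look roughly like
the sketch below. It is an illustration only, not the code from the attached
view server: run_validate is a hypothetical helper, and the 1 and
{"forbidden": ...} return values are an assumption about what the view server
hands back to CouchDB.

```python
def run_validate(func, newdoc, olddoc, userctx):
    """Call a validate_doc_update function and map the outcome to a reply.

    Hypothetical helper: an AssertionError becomes a "forbidden" reply,
    any other exception propagates and would be treated as fatal.
    """
    try:
        func(newdoc, olddoc, userctx)
    except AssertionError as err:
        return {"forbidden": str(err)}
    return 1


def validate(newdoc, olddoc, userctx):
    # Example design function: documents must carry an author field.
    assert "author" in newdoc, "author is required"
```

With this shape a design function can reject a document with a plain assert
instead of raising a view-server-specific exception.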
Changes:
- fixed compatibility for python-2.4+
- added more verbose debug output
I've removed the old attachments because they are no longer current.
Original comment by kxepal
on 13 Sep 2010 at 12:09
I'm going to need to find a solid chunk of time to review this, which might
take a while, but I'd really like to take this for the next release...
Original comment by djc.ochtman
on 19 Sep 2010 at 1:16
Issue 140 has been merged into this issue.
Original comment by djc.ochtman
on 22 Dec 2010 at 9:13
I've fixed issue #163 in this view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=e1448890d2223e0321f96459a53c1757dc5b9662
(I saw no reason to attach all the files once again for several small changes).
Otherwise it's ready for 1.0.2, since the main change in the JS view server was
sealing documents for map functions, and that feature is already done.
I will port sofa and tapirwiki to this view server within the next two weeks;
that will be a good challenge for it, and more fixes may come if I've missed
something.
Original comment by kxepal
on 12 Feb 2011 at 7:08
I've found one thing missing - a require function, to allow modular
applications within a design document. But there are some questions about it:
- should it work as in the JavaScript view server: wrap some arbitrary code and
return the exported values? Or would it be better to implement a Python-like
import?
- should it support eggs? I think it should, but I have no idea how to import
eggs inline without saving them on disk. This could be a problem for
application hosters.
Any other behavior suggested?
Original comment by kxepal
on 22 Mar 2011 at 11:06
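The first option in the comment above (mirroring the JavaScript require) could
be sketched as follows. This is only an illustration under assumptions: the
slash-separated path lookup and the exports dict are borrowed from the
JavaScript convention, and the ddoc layout and names are hypothetical.

```python
def require(path, ddoc):
    """Resolve a slash-separated path inside a design document (a dict),
    execute the source string stored there and return what it exported."""
    node = ddoc
    for part in path.split("/"):
        node = node[part]
    # The module source fills this dict, much like `exports` in JavaScript.
    namespace = {"exports": {}}
    exec(node, namespace)
    return namespace["exports"]


# Hypothetical design document with one library module.
ddoc = {
    "lib": {
        "utils": "def double(x):\n    return 2 * x\n\nexports['double'] = double\n",
    }
}

utils = require("lib/utils", ddoc)
```

A Python-like import would instead have to hook into the import machinery,
which is a bigger design decision than this sketch covers.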
What I'm doing to handle some efficiency issues with import:
    def fun():
        import datetime
        ... other imports ...
        def fun(doc):
            ...
            delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
            ...
        return fun
    fun = fun()
What really ought to happen is that the view server should go through each
variable in the exec's locals and check the __module__. If __module__ exists,
but is None, and the variable points to a callable, then use that as the
map/reduce func, and error out if more than one is found. This would be
backwards compatible with existing view functions, but would make it so
closures are not necessary.
I think the eggs deployment issue has been tackled many times before. Importing
eggs inline would obliterate responsiveness. Does couch time you out if you
take too long? I'd be concerned that it would/should. If an application hoster
supports python, then sooner or later they'd need to come up with a solution to
handle 3rd party software since, frankly, the power of using python as a view
server doesn't just lie in "it looks nice" and "it has yield".
To meet couch's same-code-same-result requirements (no side effects), we would
maybe have to mark imports with python and module version strings, and push
that back into couch. For example, 'import mythirdpartyegg' would then append
'#mythirdpartyegg py cpython-2.6.5 mod 3.11r7112' to the end of the eval string,
one per unique detected module. Any time you do any module upgrades, you can
just delete the version-marking comments out of the view func manually, and
couch will regenerate it.
However, most modules *tend* to be stable enough API-wise that this isn't a
problem. If there were any behavior altering bugs/changes to the code, an
administrator could achieve this manually.
Original comment by extempor...@gmail.com
on 22 Mar 2011 at 3:30
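A runnable variant of the closure technique from the comment above, as a
sketch. The field names 'start' and 'end' and the date format are hypothetical,
chosen only to make the example complete; it uses
"from datetime import datetime" so that datetime.strptime resolves.

```python
def fun():
    # Imports run once, when the outer function is called at compile time,
    # not once per document.
    from datetime import datetime

    def fun(doc):
        # 'start' and 'end' are hypothetical fields holding YYYY-MM-DD dates.
        start = datetime.strptime(doc["start"], "%Y-%m-%d")
        end = datetime.strptime(doc["end"], "%Y-%m-%d")
        yield doc["_id"], (end - start).days

    return fun


# Rebind the name so the view server sees the inner, per-document function.
fun = fun()
```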
A few questions, if I may:
1) What is the reason for such a wrapper, as opposed to:
    def fun(doc):
        ...
        delta = (datetime.strptime(doc[...]) - datetime.strptime(doc[...])).days
        ...
    import datetime
It is not very Pythonic to place imports below, but the import of datetime
will be attempted only once. However, this style could produce another
problem: views must not have any state, nor depend on any source which could
be changed later.
2) A hoster could provide 3rd party modules, but it couldn't provide every
version of each template engine, for example. Maybe you need trunk jinja2 with
your own patches, who knows? So the idea of creating a fully portable Python
couchapp would fail.
3) Are preprocessor statements really a good idea? I have seen them in
couchapp, but they were used only for declarations, not within the design
document or the view server.
As an intermediate result, here is an implementation of the require function
that works as it does in the JavaScript view server:
http://code.google.com/r/kxepal-couchdb-python-featured/source/detail?r=0b6625db473e74a83df7e9a339899a3c318f7b80
I still need to finish some details, so attachments with the new version of
the view server will come later. Sorry, Dirkjan, it seems you'll have to review
it once again, but I'll include documentation for each vital function and more
tests to make the process easier(:
Original comment by kxepal
on 23 Mar 2011 at 8:45
You *could* put imports after the inner func that needs them, and python's
scoping rules would resolve variable lookups, but I agree, it's not Pythonic.
Closures themselves are not very Pythonic, either. Anyways, why not put imports
first, as per my suggestion?
The whole reason for using a closure is to avoid the performance penalty of
repeating the imports for each document. If a function needed k imports and
there are n documents in a couch database, then, compared to the closure
technique shown above, there'd be n*k redundant imports taking place, which is
very slow (python doesn't re-import the module, but there is overhead involved,
which can be significant).
See:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
I disagree with you on the severity of Couch's "no side-effects" requirement
for view/show/list functions. It's a matter of practicality, not a matter of
theory. Yes, Couch says that the same document passed to the same code must
produce the same output, no matter how many times it's executed. If the
document doesn't change, then the result doesn't change.
However, this mandate is only for data correctness. If the module you're
importing in a view function gets upgraded, and its behavior changes, but your
view function stays the same (so couch doesn't regenerate the view), then all
that happens is that your view will be incorrect. Couch won't break (couch
won't know, mind, or even care).
Also, using module imports don't count as "side effects" at all. Aside from the
random module, almost all modules (including 3rd party ones) are stateless in
their behavior. For example, it'd be fine to describe a shape as a list of
points in a couch document, and then use PIL to draw that to a png image in a
show function, or to use couch to store server access logs, and then use
pychart from a list function to generate an svg rendered line graph of server
traffic.
Furthermore, the above "same code, same doc, same result" requirement does not
apply across all time. For example, I could define a view function that
imported some module, and then change the behavior of the module. All I'd have
to do to fix the consistency of the view would be to add a single space to the end
of the view function, save the design document to couch, and then remove that
space, and save the design document again. The code is *exactly* the same as
before, yet we side-stepped the upgrade-changes-behavior issue, and caused
couch to regenerate its views to reflect the new behavior. And this process
can be automated (alternately, you could delete the views from the design doc,
do a view cleanup, and then reupload the original design document, which would
require only one view regeneration instead of two). As long as you regenerate
your views when needed, the "could be changed later" issue isn't a problem at
all.
In general, hosting vendors are not ever going to support all versions of all
potential packages. They're either going to support only a handful of popular
modules (probably Django templating) and force long release cycles, or they'll
provide you with a few megabytes of space to upload your own modules in your
own private module path, or they'll not allow the use of any kind of 3rd party
modules (in which case you might as well use Javascript).
Original comment by extempor...@gmail.com
on 24 Mar 2011 at 6:27
First, before I forget, thank you for the detailed reply(:
So, about imports:
Allowing them at the top of a design function, as the PEP tells us, seems fine
to me too, and it is much more intuitive behavior.
However, only map functions are cached: reduces/shows/lists/updates and the
others are recompiled on each call, so all these import optimization tricks
are not as useful as they could be.
I also suppose it would be better to extend the preimported packages with the
most popular and useful ones, which would probably always be imported anyway,
such as: time, datetime, re, hashlib, math, random, itertools and others. But
that would be a very implicit feature for anyone who hasn't read the docs, and
I shouldn't be the only one to decide what goes on this list.
No side effects is a requirement for views only, AFAIK, because the index is
based on the view result only, while shows/lists are just a way to show data
in a nicer form.
There is one more reason to keep views as stable and independent of side
effects as possible: if you have tens of millions of documents, the last thing
you would like to do is rebuild the view index, because this would take hours.
Yes, the trick with a secondary server and swapping in the view index is a nice
idea, but you still lose those hours and your service will be down for some
time.
However, the 1.1.x branch added a feature to require a module stored under
views/lib from map functions.
Unfortunately for CouchDB view servers, a hoster wouldn't provide space for
your own modules, because that would require some additional interface, a
monitor to reload the module set in real time without a forced restart of the
view server and... this solution kills portable Python couchapps. JavaScript
couchapps are awesome because you just have to type "couchapp push" and that's
all - it works!
So, what will the resolution be?
Original comment by kxepal
on 24 Mar 2011 at 7:40
Can you link some documentation on that 1.1.x feature? That's something I'd be
*very* interested in learning about.
Show and list functions are supposed to be side effect free too. That way, they
can be cached by couch (though I'm unsure if couch itself actually does that).
I'm pretty sure couch does proper Etag handling of the show/list results, so if
you're expecting that you can generate a new result each time someone accesses
a doc via show, or a set of documents via list, know that couch will *tell* the
client/browser to use what it already has if none of the pertinent documents in
the database have changed.
Check out:
* http://guide.couchdb.org/draft/show.html#constraints
* http://guide.couchdb.org/draft/transforming.html#example (see the first
"lightbulb" blockquote in that section).
Hah, you're right about the map caching. I forgot that some of the others don't
cache! Hmmmmm. We could do our *own* caching. I'm not sure if that's considered
bad behavior or not, but I don't see how it makes a difference, and really, I
think the fact that they send the reduce function *every time* a reduce
computation is needed is a bad choice in protocol design -- it's simpler, yes,
but they could have just added 'load' and 'unload' commands for functions, so
that you can do one-time compilation.
What we can do is cache the reduce/list/show functions they give us and run the
computation. Next time they pass us a function, we do a string compare on the
new code for that function to the string of code we originally received for
that named list/show/reduce function. If it's the same as before, then our
compilation step becomes a no-op, and we just use what we already had. If the
function is different, then we assume that the design doc has been updated, and
we recompile. This way, we can do things like use closures for those
performance gains. Couch's own rules and reasons for side-effect free functions
are what gives us the right to do this kind of caching.
Moving on... well random probably shouldn't be imported (or at least not used
by any couch stuff), since by its very nature, it'll produce different results
every time.
As for downtime, in many cases, couch can service requests while an index is
being rebuilt. Also, you can easily replicate to another
database/server/whatever (secondary server), rebuild it there, and temporarily
make that the primary database you're serving from while you rebuild the index on
your real primary. That sounds complicated, but as we all know, in couch that
takes less thought than it took me to write about it just now. Also, that's all
assuming that couch's index hot-rebuild doesn't cover your use case.
Hot-swapping couch databases and even couch server instances, or adding
redundancies and failovers is a fact of life with couch -- sure, there are
plenty of us running just one couch instance for a given application, but it's
so painless to temporarily add another copy ad-hoc, and tear it back down when
you don't need it anymore. Unlike with other systems, you don't even really need
to plan ahead when you do it.
You've got a good point. I'm not sure what the resolution would be. Clearly
python wins over javascript for couchdb not because of its pretty and concise
syntax, since view/list/show functions are about the same size in either
language if you aren't allowed to import anything. Python would win out because
of its standard library (which is API-stable enough for couch), and because of
its 3rd party modules.
In any case, behavior-changing module upgrades could only be handled by
rebuilding the index. Even though the code that couch can see (the code stored
in the design doc) hasn't changed, the code it links to has. So you simply have
to treat it in same way as if you changed a line of code in the map func
itself, and there's no way around that.
Just like with retooling your own view/list/show function code, you have to
strike a balance between the time you need to spend rebuilding an index, and
the benefits you get from changing the code. After all, you can always choose
to *not* upgrade your python or module to a new version, and just because there
is a new version, doesn't mean you need it.
Original comment by extempor...@gmail.com
on 26 Mar 2011 at 6:04
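The string-compare caching described in the comment above can be sketched as
follows. This is a minimal illustration, not the view server's code: the
function names are hypothetical, and it assumes every design function source
defines a callable named fun.

```python
# function name -> (source string, compiled function)
_cache = {}


def compile_source(source):
    """Execute a source string and return the function it defines.

    Assumption for this sketch: the function is always called 'fun'.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["fun"]


def get_function(name, source):
    """Return the compiled function for `name`, recompiling only when the
    source string differs from the one seen last time."""
    cached = _cache.get(name)
    if cached is not None and cached[0] == source:
        return cached[1]  # same code as before: compilation is a no-op
    func = compile_source(source)
    _cache[name] = (source, func)
    return func
```

Because the comparison is on the source string only, a behavior change in an
imported module goes unnoticed until the design function text itself changes.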
The only documentation I have seen is the source code(:
https://github.com/apache/couchdb/commit/7665e449cdfff1e660ed2bbac3de4507cb063a18#share/server/state.js
AFAIK, this command is passed automatically if the ddoc has a views/lib/...
path set, but I'm not sure. Looking at the test case, though, it could be
otherwise.
Caching shows/lists/other ddoc subcommands may be possible, but this cache
would be reset on each design document update. Reduce functions couldn't be
cached without comparing source code. However, this trick wouldn't work with
0.10.0.
There is a ["reset"] command that clears the map function cache and drops all
your configuration: mime types, reduce_limit etc. However, again, it's
system-wide and not available from the outside.
I need some time for experiments to understand all the benefits and flaws of
such caching. If it hasn't been implemented for the JavaScript view server,
there must be some reason, right? The first one I see: if you update a 3rd
party package on the system, your cached byte code wouldn't be updated - the
design document hasn't changed! - and you'll have a lot of fun in that case(:
It could be recompiled after such a failure, but tests are still needed.
I have also found a case that breaks the idea of imports above the function:
    >>> import datetime
    >>> from itertools import groupby
    >>> def test(doc):
    ...     yield doc['_id'], 'passed'
The resulting namespace would always be:
    {'datetime': <module 'datetime' from '/usr/lib/python2.4/lib-dynload/datetime.so'>,
     'groupby': <type 'itertools.groupby'>,
     'test': <function test at 0x7f3395044938>}
So the function that the iterator finds will be groupby. That's the wrong one,
yet it returns two-value tuples, so it will generate a very strange error:
    TypeError: <generator object at 0x7fcebfa58908> is not JSON serializable
This totally crashes the view server. You'll have to spend a lot of time with
the --debug option enabled to understand why, and currently even that would
not help you in such a case without additional logging. And if generators were
JSON serializable you would even get a wrong result without any warning. Still
not very explicit or relaxing behavior ):
Binding by names? Not an option.
The random module certainly shouldn't be used for views, but it could be
useful in lists to randomize output.
The idea of swapping temp/production databases is nice too, if a temporary
couch instance can serve as the production one for a while... but I suppose
this interesting discussion is not for this issue(;
On the next points I agree with you - we have to find a near-ideal point of
balance. Hard optimizations and tricks are part of a high-load environment.
One could also use pypy instead, another faster json module etc. Our task is to
create tools that work, and work well, but that also leave some room for heavy
optimization with some trade-offs.
Original comment by kxepal
on 26 Mar 2011 at 8:09
Yeah, I'll have to look into that 'require' thing. On first glance, it looks
like couchjs is doing a request to the design doc for the dependencies.
Right, as said in a previous post, in order for the above-the-function option
to work, without the use of a closure, you (the view server function compiler)
would have to check every key in the locals dictionary that exec generates to
see if it has a __module__ attribute, and if that attribute has the value of
None. The only backwards compatible requirement we need is that there is only
one callable object (usually a function) that has __module__ set to None (since
non-imported local functions/classes will have __module__ of None).
    >>> code = """
    ... from datetime import datetime
    ...
    ... class OldStyleClass:
    ...     pass
    ...
    ... class NewStyleClass(object):
    ...     pass
    ...
    ... y = 17
    ...
    ... def test():
    ...     return 5
    ... """
    >>> locals = {}
    >>> exec code in {}, locals
    >>> locals
    {'y': 17, 'test': <function test at 0x7f7e25f10320>, 'NewStyleClass': <class 'NewStyleClass'>, 'OldStyleClass': <class __builtin__.OldStyleClass at 0x7f7e25f196b0>, 'datetime': <type 'datetime.datetime'>}
    >>> for key in locals:
    ...     if callable(locals[key]):
    ...         if locals[key].__module__:
    ...             print key, "is *not* a candidate, since it's imported from", locals[key].__module__
    ...         else:
    ...             print key, "is a candidate (hopefully the only one, or we'll have to error out)"
    ...     else:
    ...         print key, "isn't even callable, so we don't care about it"
    ...
    y isn't even callable, so we don't care about it
    test is a candidate (hopefully the only one, or we'll have to error out)
    NewStyleClass is *not* a candidate, since it's imported from __builtin__
    OldStyleClass is *not* a candidate, since it's imported from __builtin__
    datetime is *not* a candidate, since it's imported from datetime
Huh, so apparently class definitions inside of an exec will be associated with
the __builtin__ module, so we'd have to check for that, as well. But in
general, it's easy to do a backwards-compatible check for non-imported
callables.
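For reference, the same check can be written as a small function. The sketch
below uses Python 3 syntax rather than the Python 2 session above, the name
find_entry_point is hypothetical, and it only accepts callables whose
__module__ is None, so classes defined in the source are not considered.

```python
def find_entry_point(source):
    """Execute a design function source and return its single non-imported
    callable, raising ValueError if there is none or more than one."""
    namespace = {}
    exec(source, namespace)
    candidates = [
        obj
        for name, obj in namespace.items()
        if not name.startswith("__")          # skip __builtins__ and friends
        and callable(obj)
        and getattr(obj, "__module__", None) is None  # not imported
    ]
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one function, found %d" % len(candidates)
        )
    return candidates[0]


SOURCE = (
    "import datetime\n"
    "from itertools import groupby\n"
    "\n"
    "def test(doc):\n"
    "    yield doc['_id'], 'passed'\n"
)
entry = find_entry_point(SOURCE)
```

Here groupby is skipped because its __module__ is 'itertools', and the
datetime module is skipped because it is not callable.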
Oh, perhaps the answer to the module distribution problem is to put a custom
import mechanism that checks for those modules as attachments to the
view/list/show functions design doc before checking the normal on-disk module
path. Couch's _changes API would need to be monitored for design doc changes by
couchpy too, so that couchpy can know when it needs to reload modules. If this
were achievable, you could bundle your modules in the design doc itself
(regular zip files and eggs could be supported).
The best way to get a good system in place for this is not to work around
Couch's API, but instead to work directly with the Apache Couch community to
support everything we're talking about, since none of it violates the
side-effect-free requirements of couch if dependency checking can be moved into
couch. This wouldn't mean that couch would have to understand any programming
language, but would be able to handle changes to certain special design doc
keys. For example, couch could *hypothetically* do:
    {
        "_id": "_design/app",
        "lib": {
            "calc": "def something_statistical(a,b,c,d): return (a,b,c,d)",
            "image": "#some-package v1.3.2",
            "chart": "#other-package v4.1.7"
        },
        "depends": {
            "views.test": ["lib/calc"],
            "shows.graph": ["lib/chart"],
            "lib.chart": ["lib/image", "_attachments/something_local.egg"]
        },
        "views": {
            "test": {
                "map": "def fun(doc): yield doc['_id'], calc.something_statistical(*[doc.get(k) for k in 'abcd'])"
            }
        }
    }
Once again, this doesn't exist in couch, but if it were implemented, couch
would only need to know how to interpret the "depends" key. If a string in
"lib" changes (couch doesn't need to know or care what the contents of that
string mean), then everything that depends on it needs to get updated, just
like it reindexes views when the view function strings are changed. In the case
of list or shows, this would mean setting a new Etag that invalidates
client-cached versions of the previous show/list results. Couch would also need
to send the dependency to the view server when it's needed, in the form of some
kind of addlib command. couchpy itself could ignore the # version stubs, since
those would just be there to provide an easy upgrade path for libraries. Or it
could compare the version shown there to the version of the module it imports,
and update the design doc if a new version is found on the module path.
Dependencies starting with "_attachments" could be handled specially by couch.
Original comment by extempor...@gmail.com
on 26 Mar 2011 at 11:26
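To illustrate how the hypothetical "depends" key from the comment above could
be interpreted, here is a sketch that computes everything that needs updating
when one entry changes. Nothing like this exists in CouchDB; the function name
and the normalisation of dotted keys to slash-separated paths are assumptions
made for the example.

```python
def affected_by(changed, depends):
    """Return every entry that depends, directly or transitively, on
    `changed`, given a mapping shaped like the hypothetical "depends" key."""
    # Keys use dots ("lib.chart") while dependency values use slashes
    # ("lib/chart"); normalise the keys so the two can be compared.
    graph = {key.replace(".", "/"): deps for key, deps in depends.items()}
    affected = set()
    pending = [changed]
    while pending:
        current = pending.pop()
        for node, deps in graph.items():
            if current in deps and node not in affected:
                affected.add(node)
                pending.append(node)  # follow dependents of dependents
    return affected


depends = {
    "views.test": ["lib/calc"],
    "shows.graph": ["lib/chart"],
    "lib.chart": ["lib/image", "_attachments/something_local.egg"],
}
```

With this mapping, a change to lib/image reaches shows/graph through lib/chart,
which is the recursive case the comment is concerned about.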
By the way, you're right that use of random is side-effect-free. Just keep in
mind that couch's Etag/caching semantics will make it so that, if an HTTP
client does proper caching, it'll do a conditional GET request for the
show/list the next time you ask for it, and unless one of the documents the
list/show depends on has changed, couch will tell that client that the resource
has not been updated.
Therefore, your random lists will only look random once per depended-upon
document update. This is on a client-by-client basis, though. If you have your
own caching proxy in the middle, or something like couchbase starts having its
own response cache, then everybody will see the same random results on each
request, until the next time one of the pertinent documents is updated. This is
another "good thing" that couch provides, because even though it might hurt you
5% of the time, it really helps with scalability and responsiveness 95% of the
time.
Original comment by extempor...@gmail.com
on 26 Mar 2011 at 11:33
> Yeah, I'll have to look into that 'require' thing. On first glance, it
> looks like couchjs is doing a request to the design doc for the dependencies.
It doesn't, but it has access to it via a closure. The design doc is just
passed to the compile function as the second argument.
Your example could pass and work as "expected", but that is just one case.
There are others that wouldn't work as "expected". What is needed is simply a
stable entry point.
Maybe some kind of decorator would be a solution, such as:
    import datetime
    def helper(item):
        ...
    @main
    def mapfun(doc):
        ...
But would it be good, explicit and clean? It looks the same as a predefined
function with a special name. I suppose there is not much need for a complex
code block. One node - one function. Libs with exported statements will take
care of the rest, as they have been designed to do, + eggs as libs to store
more complex packages.
> Couch's _changes API would need to be monitored for design doc changes by
> couchpy too, so that couchpy can know when it needs to reload modules.
> If this were achievable, you could bundle your modules in the design doc
> itself (regular zip files and eggs could be supported).
That isn't needed: if the design document has changed, a command is passed to
refresh it in the local cache.
Also note that attachments are a separate entity that is just bound to the
document but is not passed along with it. So to reach an attachment from a
show/list you would have to make a plain HTTP request - madness!(:
> For example, couch could *hypothetically* do: ...
Too complex a solution: instead of just creating a function you have to create
it + set up all the required dependencies to make it work correctly. The
require function currently does the same thing - just invoke it and extract
the exported statement you need.
> By the way, you're right that use of random is side-effect-free. Just keep
> in mind that couch's Etag/caching semantics will make it so that, if an HTTP
> client does proper caching, it'll do a conditional GET request for the
> show/list the next time you ask for it, and unless one of the documents the
> list/show depends on has changed, couch will tell that client that the
> resource has not been updated.
In a show/list function you can set your own headers and disable caching via
the Expires header. It has higher priority than the Etag one. Actually, Etag
only __may__ be used for caching purposes.
Original comment by kxepal
on 26 Mar 2011 at 1:08
Good idea! As you indicated, if you have a single callable, then it'll work as
expected. If you have more than one callable, you must designate it with @main.
Well the main point is, that your idea provides a mechanism for the programmer
to be as expressive and succinct as they need.
    ...
    def helper(item):
        ...
    @main
    def mapfun(doc):
        ...
is the equivalent of:
    def mapfun():
        ...
        def helper(item):
            ...
        def mapfun(doc):
            ...
        return mapfun
    mapfun = mapfun()
Only difference is that the decorator is a *lot* easier.
The point about my dependency solution is that it lets couch handle recursive
dependencies with respect to index rebuilding and Etag handling, so that couch
can make sure that all data is consistent. I agree, it's complex, and there's
bound to be a better way (I don't care for my solution either -- it's an
initial suggestion). I just know that if we have to manage recursive (or even
non-recursive) dependencies ourselves, then it won't work. Sooner or later,
we'd end up with a badly inconsistent database, with bugs that are really hard
to notice.
Dependencies are necessary because in the typical couch application (at least
all of the ones I've done), there is a lot of duplicate code, and that
duplicate code makes the application much much harder to maintain.
Do you really want to override Etag handling as done by couch? Put it this way,
Etag is the absolute best caching mechanism available to you, but it's also
*very* complex to get it right. Enterprise-grade server software often fails to
handle it usefully (Apache uses inode numbers, which does not allow you to
cluster while still keeping out-of-the-box caching), and many high-end websites
with big budgets never manage to implement it, instead using expires headers,
or spending money on a secondary server to deal with application inefficiencies.
Couch manages Etags perfectly, so even though it's easy to add nodes to scale
couch, Etags work in the opposite direction, making it so you have much less of
a need to scale. If you have something that's completely dynamic, and there's
nothing in couch's architecture that tells you that you must use idempotent
show/list functions, then by all means, send a no-cache header. But if you have
something that is barely dynamic (like you include a random hash with the
output, just for the heck of it, or you want to add a string saying 'response
generated in 0.0013 seconds'), then you probably want to rethink what you're
trying to do, since you're sacrificing a lot to gain so little.
Original comment by extempor...@gmail.com
on 26 Mar 2011 at 2:08
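The @main idea discussed in the last few comments could be sketched like this.
It is an illustration under assumptions, not the view server's behavior: main,
the _is_main marker and compile_design_function are hypothetical names, and
non-imported functions are recognised by their __module__ being None.

```python
import inspect


def main(func):
    """Mark a function as the entry point of a design function source."""
    func._is_main = True
    return func


def compile_design_function(source):
    """Return the entry point: the @main function if one is marked,
    otherwise the only function defined in the source."""
    namespace = {"main": main}  # make the decorator available to the source
    exec(source, namespace)
    functions = [
        obj
        for obj in namespace.values()
        if inspect.isfunction(obj) and obj.__module__ is None
    ]
    marked = [f for f in functions if getattr(f, "_is_main", False)]
    if len(marked) == 1:
        return marked[0]
    if not marked and len(functions) == 1:
        return functions[0]
    raise ValueError("cannot determine the entry point")


SOURCE = (
    "def helper(item):\n"
    "    return item * 2\n"
    "\n"
    "@main\n"
    "def mapfun(doc):\n"
    "    yield doc['_id'], helper(doc['n'])\n"
)
mapfun = compile_design_function(SOURCE)
```

A source with a single undecorated function still compiles, so existing design
functions would keep working under this scheme.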
> Only difference is that the decorator is a *lot* easier.
Easier? Maybe. Implicit? For sure. You always have to keep this @main
decorator in mind. However, I see we've come back to the current, original
state - a single function which creates an inner context (;
> Do you really want to override Etag handling as done by couch.
I don't mean to override it; I was just answering how to work around the
Etag-based caching case.
About Etag:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19
A little below it you will find the Expires header.
Original comment by kxepal
on 26 Mar 2011 at 2:27
First, sorry for the "wall of text" that subscribers have received from us -
we probably should have created a separate topic on the groups, but now it's
too late. And a special sorry to the one who unstarred this issue - he won't
receive the notification about the new version of the view server that I would
like to attach for testing.
Short changelog:
! remove dependency on the versioning decorator
! fix Mime class and show functions with provides/response_with methods - they
were just totally broken.
! fix Python exception encoding for CouchDB versions < 0.11.0
! correct filters for versions >= 0.11.1. There is no more userctx argument,
beware!
+ add missing ddoc cache (thanks to this discussion)
+ add support for add_lib command. CouchDB version >= 1.1.x required
+ add support for views command. Currently available for trunk version
+ add support for secobj for validate_doc_update commands. Requires CouchDB >=
0.11.1. However, this argument is left optional because it isn't mentioned in
most examples.
+ add require function with the same behavior as the JavaScript one
+ add docstrings for the most important methods, with descriptions and examples
~ add support for version 0.8.0 - that was too easy (:
~ allow imports in design functions (see notes below)
~ _log function has been replaced by a logging handler
~ correct the error message for a wrongly defined design function
~ code cleanup, reorganisation, formatting fixes
~ more tests added and passing (47 total, 5 failing for 0.9.0 because I
couldn't reproduce valid behavior - does anyone have Windows binaries of
0.9.x?)
A note about imports:
http://mail.python.org/pipermail/python-list/2007-September/507450.html
I really didn't know about this behavior(: So any imports at the top level are
useless unless they are explicitly passed to the target function as
arguments or through a decorator. However, I've allowed their use for
performance reasons.
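A minimal sketch of the import behavior mentioned above, assuming the function source is executed with separate globals and locals dictionaries (the names `SOURCE`, `broken` and `fixed` are made up for illustration and are not taken from the view server code). The top-level import lands in the locals dictionary, while the function body looks names up in globals, so only the variant that receives the module as a default argument can see it:

```python
# Source of two hypothetical design functions, as a view server might receive it.
SOURCE = """
import math

def broken():
    # `math` is looked up in the function's globals, where it was never stored
    return math.pi

def fixed(math=math):
    # `math` was captured as a default argument when the function was defined
    return math.pi
"""

globals_ns = {}
locals_ns = {}
# With two separate dictionaries, top-level names (including `math`)
# are stored in locals_ns, not in globals_ns.
exec(SOURCE, globals_ns, locals_ns)

try:
    locals_ns['broken']()
    broken_result = 'ok'
except NameError:
    broken_result = 'NameError'

fixed_result = locals_ns['fixed']()
```

Executing the same source with a single namespace dictionary avoids the problem, because top-level imports and function globals then share one dictionary.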
A more detailed history of changes is available in the viewserver branch:
https://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver
Questions I still have:
1. Should I split view.py into a view package (probably better to name it a
viewserver package), since the code is growing and Sphinx autodocumentation
support is missing?
2. Should I add preimported modules? I've settled on these: base64,
calendar, datetime, math, random, re, time - they are quite common, useful and
available in all supported versions.
3. Should I add egg support via an --egg-cache parameter that specifies the
storage folder? Eggs could be stored as base64-encoded strings rather than as
attachments, because attachments are not available from the view server.
Original comment by kxepal
on 2 Apr 2011 at 6:00
OK, I'll answer those questions myself(:
> Should I split view.py into a view package (probably better to name it a
viewserver package), since the code is growing and Sphinx autodocumentation
support is missing?
Yes, I should. Working with 2K of deeply nested code with massive
cross-function dependencies is not easy, and the missing Sphinx autodoc feature
makes me sad.
> Should I add preimported modules? I've settled on these: base64, calendar,
datetime, math, random, re, time - they are quite common, useful and available
in all supported versions.
No, I shouldn't. I can't decide what a developer needs for their particular
project, even if those modules fit most tasks. Instead, I've created
something like a QueryServer constructor, which can be used to create your own
QueryServer with your own behavior without patching couchdb-python code.
Pretty nice solution, right?(; See the `construct_server` function in
`couchdb.server.__init__.py` for how the default query server is defined.
> Should I add egg support via an --egg-cache parameter that specifies the
storage folder? Eggs could be stored as base64-encoded strings rather than as
attachments, because attachments are not available from the view server.
Yes, I should. This feature provides too much to be ignored.
However, it's optional and must be enabled explicitly for security and
compatibility reasons. To store eggs within design documents you should encode
the egg as a base64 string. See documentation for examples.
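As a rough illustration of the base64 step only, here is a sketch that encodes an archive and places it in a design document. The design document layout is an assumption made up for this example (the `lib` and `hello_egg` keys are hypothetical); the layout the query server actually expects is described in its documentation, not here. A tiny in-memory zip archive stands in for a real egg file:

```python
import base64
import io
import json
import zipfile

# Build a tiny zip archive in memory as a stand-in for a real .egg file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('hello.py', 'def greet():\n    return "hi"\n')
egg_bytes = buffer.getvalue()

# Hypothetical design document: the key names below are illustrative only.
ddoc = {
    '_id': '_design/example',
    'language': 'python',
    'lib': {'hello_egg': base64.b64encode(egg_bytes).decode('ascii')},
}

# The encoded egg is plain ASCII text, so it survives a JSON round trip.
payload = json.dumps(ddoc)
restored = base64.b64decode(json.loads(payload)['lib']['hello_egg'])
```

Base64 is what makes this possible at all: JSON strings cannot carry arbitrary binary data, so the archive has to be turned into text before it can live inside a design document.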
So, the query server was completely refactored from a single module into a full
package, and here are the changes in the new version:
+ add support for eggs as modules.
+ add option to control GET requests to update functions.
+ add query server constructor: define your own context, error handlers,
commands (if you have your own CouchDB fork or live with very nightly builds)
and more.
+ add query server documentation article.
+ add a separate logging channel for each part of the query server.
~ update "Writing views in Python" documentation article.
~ fix docstrings to make them more Sphinx-friendly.
~ fix for require circular references (COUCHDB-1075)
- remove debug decorator, because now you can implement it on your own if you'd
like.
Tested on Python 2.4-2.7 and PyPy 1.5. All changes are still available at
http://code.google.com/r/kxepal-couchdb-python-featured/source/list?r=viewserver
And that's all, I think(: Could someone review the documentation articles (my
English is poor) and the code to decide whether anything needs to change?
Any ideas? Criticism? Thanks(:
Original comment by kxepal
on 9 May 2011 at 4:25
Tested on Android 2.3.4 (Google Nexus One) using the Py4A application. To share
my happiness, do the following:
1. Copy the couchdb package folder to
/sdcard/com.googlecode.pythonforandroid/extras/python (query server imports are
absolute and use the couchdb package as root).
2. Create a file on the sdcard, for example /sdcard/couchpy, and put the
following code into it:
PYTHONPATH=/data/data/com.googlecode.pythonforandroid/files/python/lib/python2.6/lib-dynload
PYTHONPATH=${PYTHONPATH}:/mnt/sdcard/com.googlecode.pythonforandroid/extras/python
export PYTHONPATH
export PYTHONHOME=/data/data/com.googlecode.pythonforandroid/files/python
export LD_LIBRARY_PATH=/data/data/com.googlecode.pythonforandroid/files/python/lib
/data/data/com.googlecode.pythonforandroid/files/python/bin/python /mnt/sdcard/com.googlecode.pythonforandroid/extras/python/couchdb/view.py --couchdb-version=1.0.0
3. Add the following line to the query_servers section of the CouchDB
configuration:
python = sh -e /sdcard/couchpy
4. ...
5. Now you can use Pythonic design documents on Android(:
Original comment by kxepal
on 22 May 2011 at 4:38
It's a good thing to review your own code after some time has passed. This
update includes a lot of fixes and even some new features:
global:
- removed global state and cross-module references (WOO-HOO!)
- rewritten QueryServer API
- added SimpleQueryServer as a high-level abstraction on top of QS internals
- added MockQueryServer to help write unit tests
- fixed docstrings and typos
- query server logs are more useful now in debug mode
- updated documentation with an Android paragraph and notes on customizing the
query server
- placed TODO references to actual CouchDB issues: COUCHDB-729, COUCHDB-282,
COUCHDB-1261, COUCHDB-898. I could fix them locally, but that would create more
differences between the original JS server and the Python one.
- added more than 170 test cases
compiler:
- fix crash when compiling source code with Windows newlines
- fix double crash if function compilation failed
- fix crash on malformed base64-encoded egg
- fix crash on egg cache usage
- code refactoring
stream:
- abstraction from JSON module exception type on decode/encode operations
render:
- fix COUCHDB-1272
- code refactoring
validate:
- prevent query server crash in validate_doc_update on Python exceptions
views:
- reduce_output_overflow error is now raised properly
- small refactoring
- document seal now works better with copy.deepcopy()
design functions:
- send(), start(), provides(), register_type() are available only for show and
list functions
- get_row() is available only for list functions
- log() function is no longer a proxy for logging.info
All tests passed for:
- Python 2.4 to 2.7
- PyPy 1.5 and 1.6
- Android 2.3.4 with Python for Android version 5 against CouchDB-1.0,
android-0.1 and MobileFuton 1.7
Please, could someone review the docstrings and Sphinx docs? I'm sure the
documentation text is far from being in a good state /:
Original comment by kxepal
on 13 Sep 2011 at 10:44
I took a look at this, but I'm having some trouble getting the tests running.
In particular, this bit doesn't seem to work, independent of the view server
used:
djc@enrai couchdb-python $ python
Python 2.7.2 (default, Oct 24 2011, 10:16:20)
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> pipe = subprocess.Popen(['/usr/bin/python2.7', 'couchdb/view.py'],
... shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
... stderr=subprocess.STDOUT)
>>> pipe.stdin.write('["reset"]\n')
>>> pipe.stdout.readline()
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>>
Meanwhile, this works just fine from the command-line:
djc@enrai couchdb-python $ python couchdb/view.py
["reset"]
true
I looked at the queryserver.zip from comment 25 and the files from comment 6.
It seems to me that the former is too large and complex to take into
couchdb-python at this time. The stuff from comment 6 is much simpler, but
I couldn't run the test suite due to the above issue.
Finally, the code in comment 6 does a bunch of stuff to stay compatible with
all of 0.9, 0.10 and 0.11+. I would propose that any new view server code we
take for our next release be limited to supporting 0.11+; that's
already quite old at this point.
Original comment by djc.ochtman
on 12 Feb 2012 at 11:49
Hi, Dirkjan!
Thanks for the first review(: Actually, I have never run it as a subprocess
myself, but if you take a look at couchdb/tests/testutil.py::QueryServer, you'll
see it can be run as a subprocess: just remove shell=True from Popen.
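A sketch of why dropping shell=True matters, under the assumption that the hang comes from how Popen treats a list on POSIX: with shell=True, only the first list item is used as the shell command and the remaining items become arguments to the shell itself, so the script path never reaches the interpreter. The child process below is a hypothetical stand-in for the view server (it answers "true" to every line), not the real couchdb/view.py:

```python
import subprocess
import sys

# Hypothetical stand-in for the view server: replies "true" to each input line.
CHILD = """
import sys
for line in iter(sys.stdin.readline, ''):
    print('true')
    sys.stdout.flush()
"""

def ask(command_line):
    """Send one line to the child and return its one-line reply."""
    # No shell=True: the whole list is passed to the child as its argv.
    pipe = subprocess.Popen(
        [sys.executable, '-c', CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    try:
        pipe.stdin.write(command_line.encode('utf-8'))
        pipe.stdin.flush()  # without a flush the child may never see the line
        return pipe.stdout.readline().decode('utf-8').strip()
    finally:
        pipe.stdin.close()  # EOF lets the child exit its read loop
        pipe.stdout.close()
        pipe.wait()

reply = ask('["reset"]\n')
```

The explicit flush matters as much as the argument list: the pipe is buffered on the parent side, so a missing flush produces the same symptom of readline() blocking forever.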
There is a huge difference between comment 6 and comment 25. It's not in code
size; it's in bugs, code complexity, documentation, tests, features, logging
and how easily you can extend it without breaking things. Support for
old CouchDB releases is not a large part of it, just a few functions that
could easily be removed. For example, I easily added multiprocessing support
for map/reduce functions to qs#25 just by decorating the server/views.py
functions, without touching their source code.
Also, you should shift your attention from comment 6 to comment 22. That was
the last version of the all-in-one-file query server, but it is still buggy by
design.
The main goal of qs#25 and all the code splitting was to simplify future
support, make it easy to extend, and help with unit testing couchapps, because
now you can run it without a subprocess. And that goal has been reached.
You may read the changelogs in this thread and in my clone's viewserver branch;
they are quite complete.
I admit that it's a rather big patch at about 250KB of code (removing
docstrings could cut it in half, I'm sure), but I'd like to take on supporting
it, because I use it for everyday tasks, I know each line of it, I'd like to
help the couchdb-python project and I don't want to create
yet-another-python-queryserver-project. People know about couchdb-python,
know about its view server and expect it to work fine. Why not satisfy
them?
Original comment by kxepal
on 12 Feb 2012 at 12:40
An updated Python query server is attached. After almost a year of usage in
production, some small problems have been fixed:
- Occasional crash on chunk encoding for _list functions.
- View lib cleanup on reset command.
- Handle single named MIME type params, e.g. application/pdf;base64
- Reduced useless logging output to improve its readability.
- Fix COUCHDB-1330.
- Fix crash on malformed MIME type.
- Couchapp modules no longer need to be wrapped into some scope: just write
regular Python code for them. For example, previously you had to write:
{{{
import datetime
def foo(datetime=datetime):
    return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')
exports['foo'] = foo
}}}
Now you may remove any proxy hacks and get simple, expected code behavior:
{{{
import datetime
def foo():
    return datetime.datetime.utcnow().replace(microsecond=0).isoformat('T')
exports['foo'] = foo
}}}
This change doesn't affect other ddoc functions: show/lists/views etc.
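One way such module loading could work is sketched below; this is an assumption for illustration, not the query server's actual implementation, and `load_module` is a hypothetical helper. The module source runs in a single namespace that already contains an `exports` dictionary, so top-level imports are visible inside the functions without any default-argument proxy. A fixed timestamp replaces utcnow() so that the result is reproducible:

```python
# Module source as it might be stored in a design document.
MODULE_SOURCE = """
import datetime

def foo():
    # Fixed moment instead of utcnow() so the result is reproducible
    moment = datetime.datetime(2012, 8, 3, 4, 54, 0, 123456)
    return moment.replace(microsecond=0).isoformat('T')

exports['foo'] = foo
"""

def load_module(source):
    """Hypothetical loader: run module source in one shared namespace."""
    exports = {}
    namespace = {'exports': exports}
    # One dictionary serves as both globals and locals, so `datetime`
    # imported at the top level is visible inside foo().
    exec(source, namespace)
    return exports

module = load_module(MODULE_SOURCE)
stamp = module['foo']()
```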
Original comment by kxepal
on 3 Aug 2012 at 4:54
Original comment by djc.ochtman
on 21 Sep 2012 at 8:33
This issue has been migrated to GitHub. Please continue discussion here:
https://github.com/djc/couchdb-python/issues/146
Original comment by djc.ochtman
on 15 Jul 2014 at 7:19
Original issue reported on code.google.com by
kxepal
on 16 Aug 2010 at 2:19