djc / couchdb-python

Python library for working with CouchDB

Reduced documents lose their id with schema views #109

Open djc opened 10 years ago

djc commented 10 years ago

From dbrat...@gmail.com on December 20, 2009 09:41:25

What steps will reproduce the problem?

1. Use a reduce function that reduces to a single document

We have a schema view with the following map/reduce functions (python view)

def map(doc):
    if doc['type'] == "Result":
        yield (doc['user_id'], doc['event']), doc

def reduce(keys, values, rereduce):
    return min(values, key=lambda doc: doc['value'])

Thus we want to find the single document with the minimum value
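To illustrate what this reduce produces, here is a minimal sketch that runs the same reduce body over a few in-memory documents, outside CouchDB. The sample documents, their ids, and their values are made up for illustration; only the reduce body comes from the report.

```python
def reduce(keys, values, rereduce):
    # Same body as the reduce function in the report. The result of a
    # reduce pass is itself a document, so a rereduce pass can apply
    # the same min() over the partial results.
    return min(values, key=lambda doc: doc['value'])


# Hypothetical sample documents (not taken from the report).
docs = [
    {'_id': 'a', 'type': 'Result', 'user_id': 'u1', 'event': 'myevent', 'value': 12},
    {'_id': 'b', 'type': 'Result', 'user_id': 'u1', 'event': 'myevent', 'value': 7},
    {'_id': 'c', 'type': 'Result', 'user_id': 'u1', 'event': 'myevent', 'value': 9},
]

# Simulate two partial reduce passes followed by a rereduce pass.
first = reduce(None, docs[:2], False)
second = reduce(None, docs[2:], False)
best = reduce(None, [first, second], True)

print(best['_id'], best['value'])
```

The reduced value is a whole document, so its `_id` is still available inside the value even though the reduce row itself carries no id.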

2. The id of the reduced document gets lost even if it's included in the result

{
    "key": ["3df2a23a4c0549265b6e808f3b61b30e",  "myevent"],
    "value": {
       ....
        "_id": "dbfdaa255af4050533fbed33ad43bb1f"
    }
}

The row does not have an id because it's a reduce result, but the document value still contains the id as _id

3. Schema View sets id to None

Since we use a Schema View to wrap our resulting document the wrapper function will be used (line 273 in schema.py)

        def wrapper(row):
            if row.doc is not None:
                return cls.wrap(row.doc)
            data = row.value
            data['_id'] = row.id  # overwrites the existing _id with None
            return cls.wrap(data)

The document value contains the _id of the document, but the wrapper function overwrites it with None because the row is a reduce result.

What is the expected output? What do you see instead?

The wrapper function should not overwrite the _id if it exists in the document (row.value). Something like this should do the trick:

            data['_id'] = row.id or data.get('_id')

Thus we trust row.id if it exists, and fall back to the _id in the data when there is no row.id.

What version of the product are you using? On what operating system?

0.6.1

Please provide any additional information below.

With the current implementation the id gets lost. To get around it I have resorted to the very bad hack of modifying the doc within the map function (doc['id'] = doc['_id']), so that I can still find the id in the result. I think it's a mistake to assume that reduce results never have an id. In most cases they don't, but sometimes you want to reduce not to a single value but to a single document, and in that case it's nice to know the id of the document.
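The proposed one-line change can be tried in isolation. The following is a minimal sketch, not the library's code: `SimpleNamespace` rows and the `wrap` helper are hypothetical stand-ins for a view result row and for `cls.wrap`, and the sample values are made up apart from the `_id` shown earlier in this report.

```python
from types import SimpleNamespace


def wrap(data):
    # Hypothetical stand-in for cls.wrap: just returns a copy of the data.
    return dict(data)


def wrapper(row):
    if row.doc is not None:
        return wrap(row.doc)
    data = row.value
    # Proposed fix: prefer row.id, fall back to an _id already in the value.
    data['_id'] = row.id or data.get('_id')
    return wrap(data)


# Reduce row: no row id, but the value is a full document with its _id.
reduced = SimpleNamespace(
    id=None, doc=None,
    value={'_id': 'dbfdaa255af4050533fbed33ad43bb1f', 'value': 3},
)
# Map row: the row id is present and the value is a partial document.
mapped = SimpleNamespace(id='abc', doc=None, value={'value': 5})

reduced_doc = wrapper(reduced)
mapped_doc = wrapper(mapped)
print(reduced_doc['_id'], mapped_doc['_id'])
```

With this change the reduce row keeps the `_id` from its value, while a map row still gets its `_id` from the row.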

Attachment: couchdb-lost-id.patch

Original issue: http://code.google.com/p/couchdb-python/issues/detail?id=109

djc commented 10 years ago

From djc.ochtman on December 24, 2009 01:56:05

Can you also provide a test case?

djc commented 10 years ago

From matt.goo...@gmail.com on December 24, 2009 03:55:43

As the original poster stated, reduce result rows don't have a doc id. So, I don't think wrapper() should ever overwrite anything in the value, i.e. it shouldn't update with _id=None either.

Does this version of the patch have the same effect:

            if row.id:
                data['_id'] = row.id

This might be better as a different ticket, but I'm not sure wrapper() should try to construct a schema document from the value of a reduce or view. Shouldn't a schema view ensure include_docs=True is set and then fail for reduce views (because row.doc is None)?
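For comparison, both ideas in this comment can be sketched side by side. This is an illustration only: the rows are `SimpleNamespace` stand-ins, `wrap` and `strict_wrapper` are hypothetical names, and the choice of `ValueError` is an assumption rather than anything the library defines.

```python
from types import SimpleNamespace


def wrap(data):
    # Hypothetical stand-in for cls.wrap.
    return dict(data)


def wrapper(row):
    # Variant from this comment: never write _id=None into the value.
    if row.doc is not None:
        return wrap(row.doc)
    data = row.value
    if row.id:
        data['_id'] = row.id
    return wrap(data)


def strict_wrapper(row):
    # Stricter alternative: only wrap full documents (include_docs=True)
    # and refuse rows without one, such as reduce rows.
    if row.doc is None:
        raise ValueError('schema views require include_docs=True')
    return wrap(row.doc)


reduced = SimpleNamespace(id=None, doc=None, value={'_id': 'x1', 'value': 3})
kept = wrapper(reduced)

try:
    strict_wrapper(reduced)
    rejected = False
except ValueError:
    rejected = True

print(kept['_id'], rejected)
```

The first variant leaves the value untouched for reduce rows, so an `_id` embedded in the value survives; the second rejects such rows outright.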

djc commented 10 years ago

From dbrat...@gmail.com on December 25, 2009 00:26:33

I find it hard to write a test case since I don't yet understand why the wrapper needs to set/overwrite the row value. I see 3 cases:

1) For include_docs=True, row.doc is used instead of row.value, so there are no problems.
2) A full doc is included as the value (my case), and there is no need to overwrite the id in the value.
3) A partial doc without the id is included as the value.

When is case 3 actually needed?

djc commented 10 years ago

From kxepal on June 27, 2010 06:56:22

I suppose this issue is still relevant?

For example we have this view map function:

function(doc){ emit(doc['_id'],1); }

On execution, there will be an error at mapping.py@414:

    data = row['value']
    data['_id'] = row['id']  # TypeError: 'int' object does not support item assignment

because row['value'] is not a doc object here, and cannot be assumed to be one in all cases.

A reduce view raises a different error at the same line, KeyError: 'id' in mapping.py@414, because there is no id key in a reduced view row.

I suppose reduced view results should not be mapped to the Document schema by default. The only safe case is when include_docs is set to True: that option guarantees that we always get the full document, not just arbitrary parts of it.
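Both failures described in this comment can be reproduced without a database. The sketch below uses plain dicts as rows and a hypothetical `current_wrapper` function that contains only the two statements quoted above; the row contents are made up.

```python
def current_wrapper(row):
    # The two statements quoted from mapping.py in this comment.
    data = row['value']
    data['_id'] = row['id']
    return data


# Map row whose value is a number, as emitted by emit(doc['_id'], 1).
map_row = {'id': 'doc-1', 'key': 'doc-1', 'value': 1}
# Reduce row: no 'id' key at all (hypothetical reduced value).
reduce_row = {'key': None, 'value': 3}

errors = []
for row in (map_row, reduce_row):
    try:
        current_wrapper(row)
    except (TypeError, KeyError) as exc:
        errors.append(type(exc).__name__)

print(errors)
```

The map row fails on the item assignment because its value is an int, and the reduce row fails earlier, when `row['id']` is looked up.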