Tokutek / mongo

TokuMX is a high-performance, concurrent, compressing, drop-in replacement engine for MongoDB | Issue tracker: https://tokutek.atlassian.net/browse/MX/ |
http://www.tokutek.com/products/tokumx-for-mongodb/

Updating an existing document causes size to double #1190

Closed dgelvin closed 10 years ago

dgelvin commented 10 years ago

I am running a cluster of two sharded replica sets with TokuMX 1.5 on Ubuntu 14.04 x86_64. I am sharding on the hashed _id of the collections.

I am using the pymongo 2.7.2 driver to access the mongos router.

When I attempt to overwrite a 9 MB document, TokuMX appears to double the document size, resulting in an 18 MB document which fails to save:

doc = db.collection.find_one({'_id': '53e0...'})
db.collection.save(doc)

pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"

doc = db.collection.find_one({'_id': '53e0...'})
db.collection.update({'_id': _id}, doc)

pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"

However, if I retrieve the same 9 MB document and then remove it, I am able to save it successfully:

doc = db.collection.find_one({'_id': '53e0...'})
db.collection.remove({'_id': doc['_id']})
db.collection.save(doc)

I am not actually modifying the document; I am simply reading it and attempting to replace it without modification. Obviously, in my real use case I would be making a modification, but the issue is reproducible without making any changes to the document.
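For what it's worth, the numbers in the error message line up with a straight doubling of the document. A quick sanity check (my own arithmetic, not from the thread; the limit in the error, 16793600, appears to be 16 MiB plus a small slack allowance):

```python
# Numbers taken from the OperationFailure message above.
REPORTED_SIZE = 18798961                  # size of the rejected BSONObj
BSON_LIMIT = 16793600                     # "Size must be between 0 and 16793600(16MB)"

assert BSON_LIMIT == 16 * 1024 * 1024 + 16 * 1024   # 16 MiB + 16 KiB slack

doc_size = REPORTED_SIZE // 2             # ~9.4 MB, the original document
assert doc_size < BSON_LIMIT              # the document itself fits comfortably
assert 2 * doc_size > BSON_LIMIT          # but twice its size does not
```

So a ~9 MB document fits, while anything that carries two copies of it blows past the cap.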

I created a JIRA issue (https://jira.mongodb.org/browse/PYTHON-745) regarding this problem, but they suggested it may be specific to TokuMX.

zkasheff commented 10 years ago

This is because our oplog entry is twice the size of the document. Right now, our oplog entries for updates store the entire pre-image of the document, which doubles the size of the oplog entry. That is a downside we are working on improving as we speak. In the meantime, do you know how the document is changing, and if so, can you use update modifiers to change it? Regardless, that should improve performance.
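A minimal sketch of what "use modifiers" could look like here. The make_modifier helper below is hypothetical (not a pymongo API), diffs only top-level fields, and uses the pymongo 2.x update() call style from the snippets above:

```python
def make_modifier(old, new):
    """Build a {'$set': ..., '$unset': ...} update document from two
    top-level dicts. Hypothetical helper for illustration only; it does
    not recurse into nested documents."""
    set_fields = {k: v for k, v in new.items()
                  if k != '_id' and old.get(k) != v}
    unset_fields = {k: "" for k in old if k not in new}
    mod = {}
    if set_fields:
        mod['$set'] = set_fields
    if unset_fields:
        mod['$unset'] = unset_fields
    return mod

# With pymongo 2.x this would be applied roughly as:
#   mod = make_modifier(old_doc, new_doc)
#   if mod:
#       db.collection.update({'_id': old_doc['_id']}, mod)
```

The idea is that the oplog entry then only needs to describe the changed fields rather than carrying two full copies of a ~9 MB document.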


dgelvin commented 10 years ago

Thanks for the response; glad to understand what is happening. We will update our code to save only what is necessary.