coresmart / persistencejs

persistence.js is an asynchronous JavaScript database mapper library. You can use it in the browser as well as on the server (and you can share data models between them).
http://persistencejs.org

persistence.flush bottlenecks on session.trackedObjects #102

Closed: pixelcort closed this issue 11 years ago

pixelcort commented 11 years ago

Let's say I have the following hypothetical code:

a = [];
for (var i = 0; i < 8000; i++) { var deck = new App.Deck(); a.push(deck); }
persistence.asyncForEach(a, function (d, n) {
  persistence.add(d);
  persistence.flush(n);
}, function () {
  console.log('done');
});

The problem that I'm facing is that each successive flush takes longer than the one before it, and memory usage grows linearly.

I've tracked the issue down to the use of session.trackedObjects, which never appears to get cleared out. It is then iterated down in persistence.asyncParForEach(persistObjArray, ..., which in turn calls save() on all previously saved entries as well.

It appears flush already clears out session.objectsToRemove; is it safe to clear out session.trackedObjects there as well?

zefhemel commented 11 years ago

This is how persistence.js was designed from day one. Any object in memory that it tracks, it will track for as long as the app is running, unless you clear trackedObjects; I think there's a call for that too (persistence.clear(), I think). It borrows this behavior from Hibernate. If you want to do what you're doing, you have to clear trackedObjects by hand, but be aware that any changes made to those objects afterwards will no longer be persisted by a flush().
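A rough sketch of that manual-clearing approach, purely illustrative: `insertInBatches` is a hypothetical helper, and it assumes `session.trackedObjects` is a plain object keyed by id that can simply be reassigned between flushes.

```javascript
// Illustrative sketch only: insert in batches, clearing tracked state
// between flushes so each flush() only saves the current batch instead
// of every object added so far. Not persistence.js API; `session` here
// is whatever object carries add/flush/trackedObjects.
function insertInBatches(session, items, batchSize, makeEntity, done) {
  var i = 0;
  function nextBatch() {
    if (i >= items.length) { return done(); }
    var end = Math.min(i + batchSize, items.length);
    for (; i < end; i++) {
      session.add(makeEntity(items[i]));
    }
    session.flush(function () {
      // Drop references to already-persisted objects. Caveat from above:
      // later changes to them will no longer be picked up by flush().
      session.trackedObjects = {};
      nextBatch();
    });
  }
  nextBatch();
}
```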

pixelcort commented 11 years ago

Ah, I see now that it's being used to track changed properties.

I wonder if, instead of using trackedObjects, the entity instances could tell persistence if and when they are modified. That way trackedObjects could be cleared out and then repopulated by the entity instances as they change later. @zefhemel does this sound like it would work?
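The scheme being proposed can be pictured with a minimal dirty-set sketch (not persistence.js code; all names here are illustrative):

```javascript
// Sketch of the proposed dirty tracking: entities report themselves when
// modified, and flush saves only that dirty set, then empties it. This
// keeps per-flush cost proportional to what actually changed.
function DirtyTracker() {
  this.dirty = {};
}
DirtyTracker.prototype.markDirty = function (entity) {
  this.dirty[entity.id] = entity;
};
DirtyTracker.prototype.flush = function (saveFn) {
  var ids = Object.keys(this.dirty);
  for (var i = 0; i < ids.length; i++) {
    saveFn(this.dirty[ids[i]]);
  }
  this.dirty = {}; // nothing left to save until something changes again
  return ids.length; // number of saves performed
};
```

With something like this, n inserts each followed by a flush would cost n saves in total rather than re-saving everything already persisted.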

Currently, profiling in Chrome indicates the save method itself is the actual CPU-bound bottleneck. For example, adding 1375 entries and calling flush after each add results in save being called 946,000 times, which alone takes 11 seconds of CPU time; Safari is similar at around 20 seconds.
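That call count matches the quadratic model exactly: if the k-th flush re-saves all k objects tracked so far, n inserts cost 1 + 2 + … + n = n(n + 1)/2 saves, and 1375 · 1376 / 2 = 946,000.

```javascript
// Quick check that the profiled figure fits the quadratic model:
// flushing after each of n adds, where each flush saves every tracked
// object so far, performs n * (n + 1) / 2 save calls in total.
function totalSaves(n) {
  return (n * (n + 1)) / 2;
}
console.log(totalSaves(1375)); // 946000, matching the profiled figure
```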

Memory is another issue; I'm trying to get mass insertion, deletion, and sync working on iPhone, but MobileSafari keeps getting killed due to consuming too much memory. It's possible it's not trackedObjects, but that's what I'm thinking at the moment.

zefhemel commented 11 years ago

That would work, but the trackedObjects hash is also used for ensuring uniqueness in memory, i.e. when I load the object with id "1" twice in different ways (e.g. once using a load and once using a query), there will still be only one instance in memory. This is a useful property to have in your programs; otherwise things can get confusing.
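That uniqueness guarantee is the classic identity-map pattern. A minimal sketch of the idea (illustrative, not the library's internals):

```javascript
// Identity-map sketch: every load path funnels lookups by id through one
// cache, so at most one in-memory instance exists per id. Edits made
// through either reference are visible through both.
function IdentityMap() {
  this.cache = {};
}
IdentityMap.prototype.get = function (id, loadFn) {
  if (!(id in this.cache)) {
    this.cache[id] = loadFn(id); // first load wins; later loads reuse it
  }
  return this.cache[id];
};
```

The trade-off discussed in this issue follows directly: the cache must hold strong references to work, so every instance it has ever handed out stays reachable.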

pixelcort commented 11 years ago

Ah okay that makes sense.

If and when more browsers support WeakMap, it could be used to prevent duplicate instances while allowing the garbage collector to free up unused ones. Until then, I'll avoid dealing with large numbers of instances at the same time.

Thanks for the help! Closing.

pixelcort commented 11 years ago

Actually, it looks like WeakMap wouldn't help in this case. Either way, for large inserts and deletes one can just use raw SQL to manipulate the data.
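For the record, the core reason WeakMap doesn't fit here: an identity map is looked up by string id, but WeakMap keys must be objects, and at lookup time the instance doesn't exist yet to serve as a key. A quick demonstration:

```javascript
// WeakMap keys must be objects, so it cannot serve as an id -> instance
// cache keyed by string ids: setting a primitive key throws a TypeError.
var wm = new WeakMap();
var threw = false;
try {
  wm.set('1', { id: '1' }); // primitive key is rejected
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```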