Full document hydration with recursive reference priming

maschmann commented 9 years ago

I have a scenario where I get a document with many referenced and embedded documents from the db. When using a repo and the querybuilder there's the option to hydrate(false) to get the full dataset. If I want it hydrated to an object, I'll receive a lot of proxies that are resolved "on-the-fly" when calling the according methods/ iterating the ArrayCollections. How about doing a full hydration of all objects on hydrate('full') or something like that?

What I want is to be able to cache the resulting object after hydration to buffer some of the load for my application. And still be able to use objects to work with, instead of arrays if I skip hydration in favor of cacheable data. Or is there any other way to get a fully hydrated result?

malarzm commented 9 years ago

@maschmann I think that you are looking for priming?

maschmann commented 9 years ago

@malarzm not really, I fear.

When I do

return $this->createQueryBuilder()
            ->field('id')->equals($id)
            ->field('articles')->prime(true) // ReferenceMany
            ->field('mediaCollection')->prime(true) // ReferenceMany
            ->field('media')->prime(true) // ReferenceMany
            ->field('bannerCollection')->prime(true) //ReferenceOne
            ->field('sidebar')->prime(true) //ReferenceOne
            ->getQuery()
            ->getSingleResult();

I won't get a cacheable object altogether - also priming only works on non-embedded docs.

Maybe it's clearer when I try to explain that way: If I have the resulting document from the above query and serialize it to json using JMSSerializer and then deserialize again, I have all data referenced/embedded in one object which I could then add to my (result) cache.

malarzm commented 9 years ago

I won't get a cacheable object altogether

Why is that? I haven't used JMSSerializer myself but I suppose it can serialize any object?

priming only works on non-embedded docs.

This is true, however there's PR #970 aiming to fix that

maschmann commented 9 years ago

I'm still not sure if we're on the same page - or I don't understand priming.

My understanding of priming is: It's a kind of "pre-fetch" for referenced data, recursively. It is not hydrating the data to objects but holding it in PersistentCollections.

What I want to do is: If I get a document and it has either embedded or referenced documents, they don't need to be collected one call. But they need to be recursively iterated and hydrated to objects. A bit like what the ODM does when I set hydrate(false) and get a deeply nested array with all data from all embeds/references. I'd like that - but just in object-form, so it's more easy to cache.

I can achieve exactly that by serializing the result to a json (recursive iteration etc. etc.) and then deserializing again. But that's awfully slow.

malarzm commented 9 years ago

Ok, we're on same page now, sorry for that :) I'll discuss this with @jmikola later today

maschmann commented 9 years ago

@malarzm nevermind :-)

Maybe something like a "hydration level" could be introduced to tell the hydrator to either not hydrate (false), hydrate the base document (true) or the complete tree (full).

jmikola commented 9 years ago

@Ocramius: Is there a comparable feature in ORM? This seems like exhaustingly priming references through any and all relationships originating in the top-level document.

@maschmann: I read your conversation with @malarzm above, but I do believe this is still priming. #970 has to do with being unable to prime references contained within embedded documents, but priming itself is what handles the recursive hydration you're requesting. This sounds more like a prime-everything strategy to save the trouble of requesting priming in each and every mapped reference in a document hierarchy.

maschmann commented 9 years ago

@jmikola in that context you're right. I'd be happy to have a completely (recursively) hydrated object, since the hydration overhead could then be buffered via caching. I'm always a bit worried if I can't cache a "database result", even though it's still fast enough to use it uncached, the load might force it. And it would be more manageable than priming every reference/embed, right. Also if hydration(false), the resulting array contains all (ebedded/refrenced) data. But re-hydrating it into objects after pulling that from cache renders the benefit of caching kinda useless.

smolinari commented 9 years ago

I am uncertain. Why have two systems (cache/ MongoDB), when you don't really need them? If you aren't getting the performance from MongoDB, then shard out MongoDB to more machines. It would make your code and your systems simpler.

Scott

maschmann commented 9 years ago

If you have a larger tree of objects reference -> embed -> embed -> reference e.g. (not the best design, I know), the hydration cost is rather high. That's why the lazy loading is generally a good idea. The performance impact though, is not due to mongodb being slow and therefore can not be compensated by hardware upgrade/sharding. It's on the php side where the retrieved data is mapped to objects. Generally the possibility of caching should be considered when using a data storage and not needing absolutely fresh data.

smolinari commented 9 years ago

I would say the level idea is the best then and only go to a certain depth like 3 levels. Any more than that would be quite rare, if at all needed, especially with Mongo. If it is needed, then there is something wrong with the model in general. Possibly some data needs to be denormalized and "copied" into the mongo collection, where it is needed and isn't updated too much.

Scott

jmikola commented 9 years ago

@maschmann:

Also if hydration(false), the resulting array contains all (embedded/referenced) data.

Is that true? My understanding is that disabling hydration will leave references as-is, be it _id values or DBRef objects.

I'm going to leave this issue as a feature request for the time being. If you need the functionality immediately I think you can whip up a service class that uses ReferencePrimer to prime one or more documents at once and operates recursively. That should have comparable performance to whatever ODM would do internally.

maschmann commented 9 years ago

:+1: Will give it a try tomorrow: Thanks for all the good input and discussion!

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

doctrine / mongodb-odm

Full document hydration with recursive reference priming #1040