Using custom hydrator

NikitaKharkov commented 7 years ago

How can I create a custom hydrator like the ones Doctrine ORM has? The ORM has a nice method, getResult($hydratorName), which takes a hydrator registered in config.yml. Can I register one the same way in the ODM? I saw the config options auto_generate_hydrator_classes and hydrator_dir. Yes, I can set them up for my own needs, but I could not use different hydrators because of the naming strategy for the generated files. So what do you advise? My question on Stack Overflow: link

malarzm commented 7 years ago

I'm afraid it's not possible right now, as HydratorFactory is not injected. With some reflection magic you should be able to replace the factory with your own implementation, but that voids the warranty :)

EDIT: this obviously doesn't give you any way to use a custom hydrator on demand; it would merely allow you to use your own hydrators.

NikitaKharkov commented 7 years ago

Are there any plans to add this feature in an upcoming release? I think it's very important. For example, I have a document that contains 5 ArrayCollections. Right now my test fixtures hold only 35 documents with at most 5 relations each, so I don't worry about it. But my customer tells me that in production there will be about 50, or maybe more. And the dependencies cascade, e.g.

mainQuota
   partialQuota -> mainQuota
     bookingQuota -> parentQuota (main or partial)
        bookedQuota -> parentQuota (main, partial or booking)

All of them contain dependencies: e.g. a bookingQuota has a partial, which has a main, itself, and all booking and booked quotas; a main has all partial, booking, and booked quotas; and so on.

For now I mitigate this with a JMS accessor that returns only their ids, but once I have many more dependencies it will put a heavy load on the system.

So what do you advise? :)

malarzm commented 7 years ago

So what do you advise? :)

Do not solve performance problems before they happen ;) When they do happen, profile, see where the bottleneck is, and fix it. Repeat until the performance problem is gone or a complete overhaul is needed.

Now for a bit longer reply. I obviously don't know your domain, but employing custom hydrators can be a pain in the neck (I was lately pondering this idea with my team), as you then have yet another thing to maintain and update whenever your structure changes. From your post it seems you don't care about the references contained in the document, so do not fetch them, or fetch only what you need (perhaps just the ids, since you mentioned serializing them). Not only will the ODM have less to do (as each default hydrator checks whether a value for a field actually came from the db), but less data will also be transferred over the network. Next on collections: both ReferenceMany and EmbedMany are loaded/hydrated lazily: that means no hydration is happening until you do something with that very collection, ODM merely holds data from db and waits for the right moment.

Next, as you've mentioned JMS and accessors: do not go down that path. Coming from a relatively complex domain with multiple views (i.e. serializer groups) and (too) many dependencies that we lost control over, it's the shortest path to performance hell. The best thing we came up with was Value Objects for the endpoints: they took a domain entity and transformed it into simple objects that held only the data that needed to be serialized, and only as deep as the view really needed. Even more, those view objects stored all data in public fields, and if not for HATEOAS we could have ditched JMSSerializer altogether in favour of plain old json_encode. This change alone gave us a really big performance boost.

I think that's all the advice I have for you. Sorry for the long post, hope it helps ;)
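To make the view-object idea concrete, here is a minimal, language-agnostic sketch in Python rather than PHP. Everything in it (Contact, Item, ItemView, from_item) is a hypothetical name chosen for illustration and is not taken from the project described above; json.dumps stands in for json_encode. It only shows the shape of the approach: a flat object that copies what one endpoint needs and turns a reference into an id, so the serializer can never walk the object graph.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical domain entities with a circular reference:
# an item points at its contact, and the contact points back at its items.
@dataclass
class Contact:
    id: str
    name: str
    items: list = field(default_factory=list)


@dataclass
class Item:
    id: str
    title: str
    contact: Contact


# View object: flat, plain fields, exactly one level of data.
@dataclass
class ItemView:
    id: str
    title: str
    contact_id: str

    @classmethod
    def from_item(cls, item: Item) -> "ItemView":
        # Copy only what the endpoint needs; the reference becomes an id.
        return cls(id=item.id, title=item.title, contact_id=item.contact.id)


contact = Contact(id="c1", name="Alice")
item = Item(id="i1", title="Quota report", contact=contact)
contact.items.append(item)  # the inverse reference closes the cycle

# Serializing the view cannot recurse into contact.items, because the view
# holds no object references at all.
payload = json.dumps(asdict(ItemView.from_item(item)), sort_keys=True)
print(payload)  # {"contact_id": "c1", "id": "i1", "title": "Quota report"}
```

Each endpoint would get its own small view class, so the depth of the output is decided in one visible place instead of being spread across serializer group annotations.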

NikitaKharkov commented 7 years ago

Thanks for the longer reply! :) But I still don't understand how I can turn off lazy loading. And you wrote that "we lost control over, it's the shortest path to performance hell". What do you mean? I should add that the information I want to reduce is only for reading.

Next on collections: both ReferenceMany and EmbedMany are loaded/hydrated lazily: that means no hydration is happening until you do something with that very collection, ODM merely holds data from db and waits for the right moment.

But do the queries still go to the DB? If you have many dependencies, loading all of them every time is a big performance problem.

But in general I agree with you: we should solve problems as they come up :)

malarzm commented 7 years ago

But I still don't understand how I can turn off lazy loading.

If we're talking about references, there's a process called priming; it aims to avoid the n+1 problem.
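As a rough illustration of what priming buys you, here is a small Python sketch that uses an in-memory stand-in for a collection and counts round trips. It is not Doctrine's implementation or API: FakeCollection, find_one, find_in and the sample events/documents data are all hypothetical names made up for this example. The sketch only counts the queries for the referenced documents (the "n" in n+1); the one query that fetched the parent documents is assumed to have happened already.

```python
# Tiny in-memory stand-in for a database collection; it counts queries.
class FakeCollection:
    def __init__(self, docs):
        self._docs = {d["_id"]: d for d in docs}
        self.query_count = 0

    def find_one(self, doc_id):
        self.query_count += 1
        return self._docs[doc_id]

    def find_in(self, doc_ids):
        # One round trip for many ids, like an {"_id": {"$in": [...]}} filter.
        self.query_count += 1
        return [self._docs[i] for i in doc_ids if i in self._docs]


documents = FakeCollection([{"_id": i, "name": f"doc-{i}"} for i in range(6)])
events = [
    {"_id": "e1", "documents": [0, 1, 2]},
    {"_id": "e2", "documents": [3, 4, 5]},
]

# Without priming: every reference triggers its own query when it is touched.
for event in events:
    for doc_id in event["documents"]:
        documents.find_one(doc_id)
unprimed_queries = documents.query_count

# With priming: collect every referenced id first, then fetch them in one query.
documents.query_count = 0
all_ids = sorted({doc_id for event in events for doc_id in event["documents"]})
by_id = {d["_id"]: d for d in documents.find_in(all_ids)}
primed_queries = documents.query_count

print(unprimed_queries, primed_queries)  # 6 1
```

The number of unprimed queries grows with the number of references, while the primed variant stays at one extra query per referenced collection.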

But do the queries still go to the DB? If you have many dependencies, loading all of them every time is a big performance problem.

No, they don't; that's why we call references lazy. When a document is hydrated and the hydrator finds a ReferenceMany field, an uninitialized PersistentCollection is created, but no query is made. At this point the collection holds only mapping information, and only when you interact with the collection (e.g. iterate over it) is a query fired to fetch the data. A similar thing happens for ReferenceOne fields, but there a Proxy object is created, which also defers the DB query until the data is needed.

Also, if you know upfront that you won't need certain fields from the document, you can simply not fetch them from the database by using the query builder's select.
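The deferred-query behaviour described above can be pictured with a short Python sketch. This is a simplified model and not the ODM's actual PersistentCollection: LazyCollection, load_documents and the queries list are hypothetical names used only to show that creating the collection costs nothing and that the first real interaction triggers exactly one load.

```python
class LazyCollection:
    """Holds reference ids and a loader; loads only on first real use."""

    def __init__(self, ids, loader):
        self._ids = list(ids)
        self._loader = loader
        self._items = None  # None means "not initialized yet"

    @property
    def initialized(self):
        return self._items is not None

    def _initialize(self):
        # Run the loader at most once, no matter how often we are used.
        if self._items is None:
            self._items = self._loader(self._ids)

    def __iter__(self):
        self._initialize()
        return iter(self._items)

    def __len__(self):
        self._initialize()
        return len(self._items)


queries = []  # records every simulated database query


def load_documents(ids):
    queries.append(list(ids))
    return [{"_id": i} for i in ids]


# "Hydration": the collection is created, but nothing is fetched.
docs = LazyCollection([1, 2, 3], load_documents)
queries_after_hydration = len(queries)

# First interaction: iterating fires the single deferred query.
loaded_ids = [d["_id"] for d in docs]
queries_after_iteration = len(queries)

# Later interactions reuse the already loaded items.
count = len(docs)
queries_after_len = len(queries)

print(queries_after_hydration, queries_after_iteration, queries_after_len)  # 0 1 1
```

If the collection is never touched, the loader never runs, which is why an unused reference field costs no query at all.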

And you wrote that "we lost control over, it's the shortest path to performance hell". What do you mean?

Basically we were adding more and more serializer groups to entities, and it quickly turned out that using "particular-group" loaded references not just 1 level deep as we wanted, but much deeper (Item had a reference to Contact, but Contact also held inverse references to its owned Items, so they accidentally started being serialized deeper and deeper; I remember some of the responses were over 10MB because of this). Thanks to introducing the view objects I mentioned earlier, we regained absolute control over what is serialized, when and how, and quickly cut both response size and time.

NikitaKharkov commented 7 years ago

Many thanks! Now I think I'm beginning to understand the ODM in general... :) I have two more questions. Can I select fields of associated objects with the query builder? This could be any type: reference one/many, embedded one/many. I didn't find any information about it...

And the second one: in my opinion, it's not good to depend on the master version of any bundle. But prime is only on master; there is no release with this addition. When will you release it?

malarzm commented 7 years ago

Can I select fields of associated objects with the query builder? This could be any type: reference one/many, embedded one/many. I didn't find any information about it...

Now this depends on how the association is mapped. For embedded documents it should be possible using field.subfield notation (but I won't bet my head on this) while for references you'll need to use repositoryMethod and create the query yourself.

In my opinion, it's not good to depend on the master version of any bundle. But prime is only on master; there is no release with this addition. When will you release it?

I'm not sure what the problem here is, can't tell exactly when priming was implemented but it was way before 1.0 :)

NikitaKharkov commented 7 years ago

I'm not lying :) link to 1.1.5. I would be so happy to use prime. I have been working closely with the ODM since March and I haven't seen prime in the official docs... Anyway, please tell me what I should update to use the prime function? :)

About repositoryMethod... I can't use it in many cases, right? E.g. in one endpoint I need to reduce the documents to one list of fields, and in another endpoint to a different list.

You were right when you told me not to solve problems that don't exist yet. But I really want to know how I can do it in the future. Big data is a trend today, and I'm sure the next project will force me to find a solution.

Many thanks anyway!

alcaeus commented 7 years ago

@NikitaKharkov reference priming without using repositoryMethod will be added in 1.2. Until then, you have to use repositoryMethod in combination with prime

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

alcaeus commented 5 years ago

Closing here as per previous replies.

hjardines commented 11 months ago

I know I'm re-opening a 4-year-old post, but it makes sense.

Not to bore you to death, I have these 2 collections (simplified):

Documents

ID, NAME (string), CONTENT (string)

Events

ID, START_DATE (date), END_DATE (date), DOCUMENTS (ReferenceMany)

Getting 100 events also gets all the associated documents, and that is OK.

However, I would like for it to ignore the "CONTENT" field in the query request while hydrating.

I can't change the collection structure at this time.

I believe a custom hydrator could do the trick.

Any ideas?

malarzm commented 11 months ago

DOCUMENTS (ReferenceMany) is an inverse side, I guess, and therefore it's lazy loaded (i.e. no query is made until you try to use the collection or have priming in place). What's wrong with having it?

hjardines commented 11 months ago

That instead of making one request to download all Documents at once, it is doing one query for each document.

I know Prime might be the solution, but haven't been able to figure it out.

doctrine / mongodb-odm

Using custom hydrator #1617