BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.05k stars, 217 forks

[BUG] Memory consumption during iterative migration is very high #1055

Open pschoen-itsc opened 2 weeks ago

pschoen-itsc commented 2 weeks ago

Describe the bug We recently tried to run an iterative migration, but the OS always killed it because it kept using too much memory. The collection held hundreds of thousands of rather large documents, and after a few minutes the migration script had used over 10 GiB of memory.

To Reproduce Create a big collection (ideally multiple GiB on disk) and run an iterative migration on it.

Expected behavior Memory consumption does not grow with the number of documents processed during the migration.

Additional context Using a free fall migration works fine. From the implementation of the iterative migration it is clear where the "problem" is: all operations are collected and only executed at the end, so every document has to be held in memory. I'm not sure about the batching logic implemented there, but wouldn't a solution be to execute each batch directly instead of collecting them all?
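
To illustrate the proposal, here is a minimal, self-contained sketch of the "execute each batch directly" idea. It is not Beanie's actual implementation: `migrate_in_batches`, `fake_cursor`, `rename_field`, `record_batch`, and `run_demo` are hypothetical names, and an in-memory async generator stands in for the MongoDB cursor so the example runs without a database.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List

Operation = Dict[str, Any]


async def migrate_in_batches(
    documents: AsyncIterator[Dict[str, Any]],
    transform: Callable[[Dict[str, Any]], Operation],
    execute_batch: Callable[[List[Operation]], Awaitable[None]],
    batch_size: int = 1000,
) -> int:
    """Build one operation per document and flush each full batch at once.

    At most `batch_size` operations are held in memory at any time.
    Returns the number of documents processed.
    """
    pending: List[Operation] = []
    processed = 0
    async for doc in documents:
        pending.append(transform(doc))
        processed += 1
        if len(pending) >= batch_size:
            await execute_batch(pending)
            pending = []  # drop references so the flushed batch can be freed
    if pending:  # flush the final, possibly partial batch
        await execute_batch(pending)
    return processed


async def fake_cursor(n: int) -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for a database cursor (assumption, not a real API)."""
    for i in range(n):
        yield {"_id": i, "name": f"doc-{i}"}


def rename_field(doc: Dict[str, Any]) -> Operation:
    """Example transform: describe a replacement that renames `name` to `title`."""
    return {
        "filter": {"_id": doc["_id"]},
        "replacement": {"_id": doc["_id"], "title": doc["name"]},
    }


def run_demo(total: int = 10, batch_size: int = 4) -> List[int]:
    """Run the sketch and return the size of every batch that was executed."""
    batch_sizes: List[int] = []

    async def record_batch(ops: List[Operation]) -> None:
        # A real migration would send `ops` to the database here.
        batch_sizes.append(len(ops))

    asyncio.run(
        migrate_in_batches(fake_cursor(total), rename_field, record_batch, batch_size)
    )
    return batch_sizes
```

In this sketch, peak memory is bounded by `batch_size` operations rather than by the size of the collection. The trade-off is that batches are written as the migration proceeds, so a failure partway through leaves earlier batches applied unless the whole run is wrapped in a transaction.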

pschoen-itsc commented 2 weeks ago

I'm happy to create a PR with my proposed solution, but first wanted to know the reasons for the current implementation.