OpenPhilology / Iris

The OCR pipeline to succeed Rigaudon

Tons of memory used #7

Open sonofmun opened 9 years ago

sonofmun commented 9 years ago

I am running Iris on 373 .png files simultaneously and it is using ~64 GB of memory. It would be nice if this memory usage could be reduced. This is with CELERYD_CONCURRENCY set to 16.

sonofmun commented 9 years ago

This has become a critical problem. It doesn't look like the workers ever release the memory they are using, so as they take on more pages, the memory usage per worker keeps growing. With a 370-page batch this was manageable, but it appears to choke on a 600+ page batch simply because, before the batch finishes, the workers have soaked up all available physical and swap memory and Celery starts giving "cannot allocate memory" errors. This, I think, has become the most important problem to solve.
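One Celery option that targets exactly this symptom (not yet tried here) is to recycle each worker process after a fixed number of tasks, so that memory accumulated across pages is returned to the OS. A minimal sketch, using the Celery 3.x setting name and assuming it goes in the same configuration module as CELERYD_CONCURRENCY:

# Restart each worker process after 10 tasks; the value is a guess
# and would need tuning against actual page sizes.
CELERYD_MAX_TASKS_PER_CHILD = 10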

fbaumgardt commented 9 years ago

Can we use the quick fix of partitioning the jobs into 100-page units? I think we probably need to move quickly right now and then come back to fix this properly.


sonofmun commented 9 years ago

Yes, that would be a good fix. I even think we could do 300-page units without a problem, but 100 pages would be safer, I guess.
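A minimal sketch of how that partitioning could look on the submission side; chunk_pages and the 600-page example are illustrative, not part of the Iris API:

def chunk_pages(page_files, batch_size=100):
    """Split a list of page image paths into fixed-size batches."""
    for start in range(0, len(page_files), batch_size):
        yield page_files[start:start + batch_size]

# Illustrative only: 'pages' would come from the directory of .png files
# for the book; each batch would then be submitted as its own job
# instead of one 600+ page job.
pages = ['page_%04d.png' % i for i in range(1, 601)]
for batch in chunk_pages(pages, batch_size=100):
    print(len(batch))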


fbaumgardt commented 9 years ago

Yes, it’d be nice if we didn’t suck up all the resources with this one task. :)


mittagessen commented 9 years ago

Try adjusting the prefetch limit in the Celery configuration. If I understand it correctly, each worker prefetches (reserves) several tasks at once, which is probably not what we want as we have long-running tasks. Add these two lines and see if it fixes things:

CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
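For context, the settings discussed in this thread would sit together in the Celery configuration module roughly like this (module name assumed; Iris may load its settings from elsewhere):

# celeryconfig.py

# Number of worker processes, as used in the runs reported above.
CELERYD_CONCURRENCY = 16

# Acknowledge a task only after it finishes, so a killed worker does not
# silently drop the page it was processing.
CELERY_ACKS_LATE = True

# Reserve only one task per worker at a time instead of prefetching
# several; prefetching is counterproductive for long-running OCR tasks.
CELERYD_PREFETCH_MULTIPLIER = 1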