OpenConceptLab / ocl_issues

Issues for all OCL repos. NOTE: Install ZenHub Browser Extension and request access to the OCL Roadmap board to view all issues and to contribute

Improve memory management for import tasks #957

Open rkorytkowski opened 2 years ago

rkorytkowski commented 2 years ago

Currently the whole import file is stored in memory and passed around to workers. This consumes memory in Redis and also in every worker that processes the file, i.e. the main worker and the concurrent workers.

Things to consider:

  1. Store the file in S3 and pass its URI to the Redis queue and workers.
  2. Make sure workers do not hold the whole file in memory, but rather read it through a stream.
  3. Workers should clean up properly once they are done processing or delegating the processing.
  4. It seems we would also benefit from writing the final results to a file instead of keeping them in memory. (Keeping partial results in memory in concurrent workers and returning them in memory is fine as long as batches stay small, around 1-10k.)
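
Points 2 and 4 could be sketched roughly as follows. This is a minimal illustration only, not the oclapi2 implementation: it assumes the import file is newline-delimited JSON, and `iter_batches` and `batch_size` are hypothetical names. An in-memory `io.StringIO` stands in for a file or S3 stream opened by URI.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield lists of parsed records, holding at most batch_size of them at a time."""
    # Generator expression: lines are parsed lazily, one at a time, as they are consumed.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a streamed file; blank lines are skipped.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
sizes = [len(batch) for batch in iter_batches(stream, batch_size=2)]
print(sizes)  # [2, 1]
```

Because only one batch is materialized at a time, peak memory in a worker would depend on the batch size rather than on the size of the import file.
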
rkorytkowski commented 2 years ago

@snyaggarwal a few low-hanging fruits:

Can we discard a part_list from parts once it has been sent to a queue? https://github.com/OpenConceptLab/oclapi2/blob/48eafddeb7b4118757ee631cc1a22545e1746ab1/core/importers/models.py#L821

Can we clear content and input_list once makeParts is done?
https://github.com/OpenConceptLab/oclapi2/blob/48eafddeb7b4118757ee631cc1a22545e1746ab1/core/importers/models.py#L722
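
The first suggestion might look something like the sketch below. It is not the code at the linked lines: `dispatch_parts` and `enqueue` are hypothetical names, and `parts` is assumed here to be a `deque` of record lists so that each `part_list` can be dropped as soon as it has been handed to the queue.

```python
from collections import deque
from typing import Callable, Deque, List


def dispatch_parts(parts: Deque[List[dict]], enqueue: Callable[[List[dict]], None]) -> int:
    """Send each part to the queue and drop the local references right away."""
    sent = 0
    while parts:
        part_list = parts.popleft()  # removes the part from `parts` itself
        enqueue(part_list)
        sent += 1
        del part_list  # no lingering reference in this frame
    return sent


# `enqueue` here only records sizes; a real one would submit a task to a worker queue.
queued_sizes = []
parts = deque([[{"id": 1}, {"id": 2}], [{"id": 3}]])
count = dispatch_parts(parts, lambda p: queued_sizes.append(len(p)))
print(count, len(parts), queued_sizes)  # 2 0 [2, 1]
```

The same idea would apply to content and input_list: once the parts have been built from them, releasing those references lets the memory be reclaimed while the workers run.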

snyaggarwal commented 2 years ago

@rkorytkowski Thanks for the suggestions. Added both.

rkorytkowski commented 2 years ago

Thanks @snyaggarwal, please see my comment on the first commit.

snyaggarwal commented 2 years ago

@rkorytkowski Fixed

paynejd commented 2 years ago

@snyaggarwal is this work complete? can we close this?

snyaggarwal commented 2 years ago

Some parts of this were completed a long time ago, but overall some big chunks of work are still pending.

snyaggarwal commented 6 months ago

I have added a few fixes here: