KnottLab / codex

Pipeline processing CODEX output
MIT License

Ray-related memory leak #18

Closed nathanin closed 3 years ago

nathanin commented 3 years ago

During EDOF, shared objects get pinned in memory, presumably because of some in-scope reference that I have been unable to track down. They are therefore never cleared from the object store, which effectively multiplies the space it takes to run EDOF and causes overflows.

Very oddly, in testing for this issue, the increased object store usage is only observed for cycle > 0, channel > 0. It's still unclear whether it's always and only cycle > 0, channel > 0.

Anecdotally, I've seen large sections successfully complete the DAPI channels from all cycles, then begin to have problems later in processing.

Attempting to refactor using ray.put inside of edof_loop in order to eliminate the issue.

nathanin commented 3 years ago

Seems like the main problem came up after storing background_1 in codex_object. The call to edof_loop was passing the whole codex_object into it unnecessarily. Since the background image was not there in the first runs, this wasn't a problem at first. Once that image is populated, it gets passed in and referenced by each one of the running edof_loop processes. Because what gets passed is the actual array and not an object store reference, this was a problem.

Checking to see if passing only the required metadata fields solves it.

Another solution might be to store those objects in Ray's shared memory and pull them with object IDs.
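
A rough sketch of the metadata-only idea, not the actual change in the repo. Everything here is a stand-in: the `SimpleNamespace` plays the role of codex_object, the metadata keys and the `edof_args` helper are hypothetical, and a zero-filled `bytes` buffer substitutes for the background image. The pickled size is only a proxy for what would be shipped to each worker.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for codex_object: small metadata plus one large image.
codex_object = SimpleNamespace(
    metadata={"tile_width": 2048, "tile_height": 2048, "num_z": 17},
    background_1=None,
)

# Assumed list of fields the worker needs; the real list may differ.
REQUIRED_KEYS = ("tile_width", "tile_height", "num_z")


def edof_args(obj):
    """Copy out only the small metadata fields a worker needs."""
    return {key: obj.metadata[key] for key in REQUIRED_KEYS}


# Once the background is populated, the whole object becomes expensive to pass.
codex_object.background_1 = bytes(8 * 1024 * 1024)  # 8 MiB placeholder image

whole_size = len(pickle.dumps(codex_object))
slim_size = len(pickle.dumps(edof_args(codex_object)))

print(f"whole object: {whole_size} bytes, metadata only: {slim_size} bytes")
```

The worker call would then take `edof_args(codex_object)` instead of `codex_object`, so the payload stays small no matter what gets attached to the object later.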

nathanin commented 3 years ago

0deea03

fixed ?