OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/
11 stars 9 forks source link

arrow version: on Windows, delete temp files after closing Andromeda #46

Open schuemie opened 1 year ago

schuemie commented 1 year ago

Currently, on Windows, after having used batchApply() or groupApply(), the Andromeda temp files are not deleted when closing the Andromeda object because some lock is maintained. This could cause serious problems when we iterate over many Andromeda objects (like we do in several HADES packages), and we could run out of disc space.

At some later stage the files may be deleted (after a garbage collect). We could keep track of undeleted files, and attempt to delete them at various point in time.

ablack3 commented 1 year ago

I implemented a strategy of trying to delete all unclosed andromeda every time close is called. I don't have a good way to test it. I think @egillax had a loop he was using to cause the issue. Maybe that could be used as a test on windows.

Here is the solution I came up with. https://github.com/OHDSI/Andromeda/blob/09d73a016aa50cd873483110c7319dfcc552228e/R/Object.R#L333

msuchard commented 1 year ago

@ablack3 -- I am little concerned about this approach. What happens if multiple R sessions are executing simultaneously (CohortMethod often opens multiple sessions)? Will this inadvertently delete Andromeda temp files for concurrent executions?

ablack3 commented 1 year ago

Thanks for checking my work @msuchard! I definitely need (at least) one more pair of eyes on my code. I don't think multiple sessions will cause an issue because each andromeda object has a unique file path to the folder where it store's its files. Andromeda objects should not share these paths. When a user closes an Andromeda object either the folder is deleted or it is marked for deletion. I think the only danger would be that a new andromeda uses the same path as the old one.

This is how I'm creating the andromeda path for a new andromeda object https://github.com/OHDSI/Andromeda/blob/09d73a016aa50cd873483110c7319dfcc552228e/R/Object.R#L130

But I have not tested this and am not 100% sure there will be no issues.

If there is a test you can think of that I can add to address your concern please describe it and I'll try to implement it.

egillax commented 1 year ago

The loop I had does not seem to cause the issue anymore. But I'm encountering this issue in the unit tests for the arrow PLP branch in windows. I can try to create a simple test out of that.

ablack3 commented 1 year ago

@egillax Do you think your loop test is something that could be added to Andromeda unit tests? If so would you be willing to create a PR with that test?

ablack3 commented 1 year ago

Thanks for helping with this release @msuchard, @schuemie, @egillax!