Mouse-Imaging-Centre / pydpiper

Python code for flexible pipeline control

Possibly make better use of (local) temporary directories for cluster submission IO? #384

Open gdevenyi opened 5 years ago

gdevenyi commented 5 years ago

Running into some issues here on our cluster where the many simultaneous reads/writes of a MAGeT.py mincresample stage are completely saturating the disk IO of our storage system (which also serves the IO for all the workstations...).

I was thinking there might be some places where IO could be better served by local temporary storage (preferably configurable, defaulting to /tmp).
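A minimal sketch of what "configurable, defaulting to /tmp" could look like, assuming a hypothetical option (here the `configured` argument; this is not an existing pydpiper flag) that falls back to `$TMPDIR` and then `/tmp`:

```python
import os
import tempfile


def local_scratch_dir(configured=None):
    """Create a private, node-local scratch directory for one job.

    Hypothetical helper: `configured` stands in for a pipeline option
    that does not exist in pydpiper today. Falls back to $TMPDIR, then /tmp.
    """
    base = configured or os.environ.get("TMPDIR") or "/tmp"
    # A unique per-job subdirectory keeps concurrent jobs on the same
    # node from clobbering each other's files.
    return tempfile.mkdtemp(prefix="pydpiper-", dir=base)
```

Honouring `$TMPDIR` first matters on clusters where the scheduler points it at a per-job local disk rather than a small shared `/tmp`.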

gdevenyi commented 5 years ago

I realize this may be a big engineering ask: intercepting the IO steps, doing them on a local filesystem, and then copying the results back to the originally intended location.

In the meantime, we've lowered the cluster submission defaults so that things don't get too hairy.

bcdarwin commented 5 years ago

What are you looking for? Copying files that will be read multiple times by the same executor to that machine's local storage?
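One reading of this suggestion is a per-executor cache: files reused by many jobs on the same node (e.g. atlases or templates in MAGeT) are pulled from shared storage once and then read locally. The sketch below is a hypothetical helper, not part of pydpiper, and it assumes sources don't change during a run (no staleness check):

```python
import hashlib
import os
import shutil
import tempfile


def cached_local_copy(src, cache_dir):
    """Return a node-local copy of `src`, copying it from shared storage at most once."""
    os.makedirs(cache_dir, exist_ok=True)
    # Key on the absolute path so same-named files from different
    # directories don't collide in the cache.
    key = hashlib.sha1(os.path.abspath(src).encode()).hexdigest()[:12]
    local = os.path.join(cache_dir, f"{key}_{os.path.basename(src)}")
    if not os.path.exists(local):
        # Copy under a unique temporary name, then rename atomically so a
        # concurrent job never sees a half-written file.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        shutil.copy2(src, tmp)
        os.replace(tmp, local)
    return local
```

Two jobs racing on the same file may both copy it, but the atomic rename means either result is complete; the cost is one redundant read, not a corrupt input.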

gdevenyi commented 5 years ago

So, the main problem is the way that libminc does the "simultaneous" read/write of voxels when it processes files.

This stream-like processing is particularly hard on IO because it isn't running at full speed (it's CPU-bound on the program side), so we have both a read stream and a write stream, neither at full speed, tying up the IO.
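As a rough analogy only (plain Python, not libminc's actual I/O code), the access pattern described above looks like this: both file handles stay open for the entire run, and each chunk waits on a slow `transform`, so shared storage sees two long-lived, partially utilised streams instead of one short burst each:

```python
def stream_process(src_path, dst_path, transform, chunk_size=1 << 20):
    """Read, transform, and write a file one chunk at a time.

    Illustrative only: `transform` stands in for CPU-bound per-voxel work.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # While this runs, both the read and write streams sit idle
            # but remain open against the shared filesystem.
            dst.write(transform(chunk))
```

Multiply this by many concurrent cluster jobs and the storage is serving many slow interleaved streams at once.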

So, a first simple improvement would be to write outputs to ${tmpdir} and copy them afterwards.
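A minimal sketch of the "write to ${tmpdir}, copy afterwards" step, assuming a hypothetical wrapper (not part of pydpiper) where the command's output argument is the placeholder `"{out}"`:

```python
import os
import shutil
import subprocess
import tempfile


def run_with_local_output(cmd, final_output, scratch_base=None):
    """Run `cmd` writing to node-local scratch, then copy the result back.

    Every element of `cmd` equal to "{out}" is replaced by the scratch path.
    """
    scratch = tempfile.mkdtemp(dir=scratch_base)
    try:
        local_out = os.path.join(scratch, os.path.basename(final_output))
        subprocess.run([local_out if a == "{out}" else a for a in cmd], check=True)
        # One sequential copy to shared storage replaces a slow write stream
        # held open for the whole job; the rename makes the final file appear
        # atomically (same filesystem assumed).
        partial = final_output + ".partial"
        shutil.copy2(local_out, partial)
        os.replace(partial, final_output)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

For example, `run_with_local_output(["mincresample", "-like", "target.mnc", "-transform", "x.xfm", "in.mnc", "{out}"], "/shared/out.mnc")` would still read `in.mnc` from shared storage but write locally.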

Bigger gains would come from copying the inputs to the local temporary directory, processing there, and copying the results back.
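The fuller version (stage in, process locally, stage out) might be sketched as below. It is a hypothetical helper, not pydpiper's API; placeholders `"{in0}"`, `"{in1}"`, ... and `"{out0}"`, ... in `cmd` are swapped for scratch paths:

```python
import os
import shutil
import subprocess
import tempfile


def run_staged(cmd, inputs, outputs, scratch_base=None):
    """Copy inputs to local scratch, run there, and copy outputs back."""
    scratch = tempfile.mkdtemp(dir=scratch_base)
    try:
        mapping = {}
        for i, src in enumerate(inputs):
            local = os.path.join(scratch, f"in{i}_" + os.path.basename(src))
            shutil.copy2(src, local)  # one fast sequential read from shared storage
            mapping[f"{{in{i}}}"] = local
        for i, dst in enumerate(outputs):
            mapping[f"{{out{i}}}"] = os.path.join(scratch, f"out{i}_" + os.path.basename(dst))
        # The CPU-bound, stream-style IO now hits only the local disk.
        subprocess.run([mapping.get(a, a) for a in cmd], check=True)
        for i, dst in enumerate(outputs):
            partial = dst + ".partial"
            shutil.copy2(mapping[f"{{out{i}}}"], partial)  # one sequential write back
            os.replace(partial, dst)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

The trade-off is local disk space per job and an extra full copy of each input, in exchange for the shared filesystem seeing only short sequential transfers.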

Again, I realize this is a big engineering ask. These streaming features are generally a good thing (tm) in minc-land compared to NIfTI's load-the-whole-file-into-RAM style of processing.

bcdarwin commented 5 years ago

Wouldn't this be solved more easily by increasing buffer size?


From: Gabriel A. Devenyi notifications@github.com Sent: October 26, 2018 11:28:38 AM To: Mouse-Imaging-Centre/pydpiper Cc: Ben Darwin; Comment Subject: Re: [Mouse-Imaging-Centre/pydpiper] Possibly make better use of (local) temporary directories for cluster submission IO? (#384)
