Question about speed - Githubissues

dickinson-lab commented 4 years ago

Greetings Dylan,

I'm noticing something about performance when loading regions of stacks that puzzles me, and I'd appreciate your feedback when your time allows.

I've got a large (800 x 1200 x 500) TIFFStack object and want to load n small regions of interest into memory - these ROIs encompass just a few pixels in XY but the entire stack in Z. For purposes of demonstration I used n=3 but in reality it could be anywhere from 10 to a few hundred. I ran the following simple timing tests:

% Load just the desired regions
tic
spotStack1 = stackObj(1:10,1:10,:);
spotStack2 = stackObj(21:30,21:30,:);
spotStack3 = stackObj(31:40,31:40,:);
toc
Elapsed time is 23.025320 seconds.

tic
% Load the whole stack first
memStack = stackObj(:,:,:);
spotStack1 = memStack(1:10,1:10,:);
spotStack2 = memStack(21:30,21:30,:);
spotStack3 = memStack(31:40,31:40,:);
toc
Elapsed time is 14.418438 seconds.

Loading just the desired regions obviously uses (much) less memory, but as you can see, loading the whole stack (and then slicing it) is actually faster. The difference is not huge when n=3 but gets really significant when n is larger.

In further tests I observed that memStack = stackObj(:,:,:); takes exactly the same amount of time as spotStack = stackObj(1:10,1:10,:);

This last observation really puzzles me. spotStack is 10 kb in size; memStack is 960 MB. Naively I'd have thought that reading a much smaller amount of data from disk would be faster, but it's not. Unless TIFFStack is actually reading the whole TIFF file from disk regardless, but just not copying the whole thing into memory?

Do you have any ideas about why this is happening and whether performance when loading small regions could be improved? Thanks for any input.

DylanMuir commented 4 years ago

Hi, thanks for your message. The differences in timing come from the way data is stored in a TIFF file. The file is a series of frames chained together as a linked list. To read an ROI you have to read at least the header of every frame, skip to the few pieces of interest, then skip through the file to the next frame. Each seek through the file has a roughly fixed cost, whereas reading contiguous bytes is very quick. So reading a small portion of every frame takes about the same amount of time as reading the entirety of every frame.
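
As a rough illustration of the seek-versus-sequential-read point above, here is a minimal sketch using a raw binary file as a stand-in for a multi-frame TIFF. It has no TIFF headers and its pixel layout is an assumption chosen for simplicity (column-major, frame after frame), so it only shows how many seeks an ROI read needs compared with one sequential read; the sizes and variable names are hypothetical.

```matlab
% Synthetic raw file standing in for a multi-frame stack (NOT a real TIFF).
nRows = 64; nCols = 48; nFrames = 20;
data = uint16(reshape(0:(nRows*nCols*nFrames - 1), nRows, nCols, nFrames));
rawFile = [tempname '.bin'];
fid = fopen(rawFile, 'w');
fwrite(fid, data, 'uint16');          % written in column-major order
fclose(fid);

rowRange = 1:10; colRange = 1:10;
bytesPerPixel = 2;

% Strategy 1: seek to each contiguous column segment of the ROI in every frame.
fid = fopen(rawFile, 'r');
roi = zeros(numel(rowRange), numel(colRange), nFrames, 'uint16');
nSeeks = 0;
for f = 1:nFrames
    for c = 1:numel(colRange)
        offset = (((f-1)*nCols + (colRange(c)-1))*nRows + (rowRange(1)-1)) * bytesPerPixel;
        fseek(fid, offset, 'bof');
        roi(:, c, f) = fread(fid, numel(rowRange), '*uint16');
        nSeeks = nSeeks + 1;
    end
end
fclose(fid);

% Strategy 2: a single sequential read of everything, then slice in memory.
fid = fopen(rawFile, 'r');
whole = reshape(fread(fid, Inf, '*uint16'), nRows, nCols, nFrames);
fclose(fid);
delete(rawFile);
roiFromWhole = whole(rowRange, colRange, :);

fprintf('ROI read: %d seeks for %d bytes; full read: 1 sequential pass for %d bytes\n', ...
    nSeeks, numel(roi)*bytesPerPixel, numel(whole)*bytesPerPixel);
```

Both strategies return the same pixels; the number of seeks in the first one grows with the number of frames, which is the cost described above.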

In the past, when I've dealt with large, deep stacks of data (thousands of frames) that need to be analysed in small pixel regions of interest, I've suggested to people that they transpose the TIFF stack so that each "frame" is X – T, rather than X – Y.

This means that when you need a small ROI across every time point, you only have to access a small portion of the stack data.

This only helps if you need to run your analysis several times, since transposing the stack itself takes time. But generally I've found that analyses need to be run multiple times!
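
The effect of the transposition can be sketched in memory with plain arrays. This is only an illustration with a small random array standing in for the stack (the sizes and variable names are hypothetical); it shows that the same ROI touches far fewer "frames" once the time axis is moved into the second dimension.

```matlab
% Small stand-in for the original stack, laid out as [dim1, dim2, T].
stack = rand(80, 120, 50);

% Transposed layout: each "frame" is now dim1 x T, with one frame per dim2 index.
stackT = permute(stack, [1 3 2]);      % [dim1, T, dim2]

% The same ROI in both layouts.
roi  = stack(1:10, 1:10, :);           % needs data from all 50 frames
roiT = stackT(1:10, :, 1:10);          % needs data from only 10 frames

framesTouchedOriginal   = size(stack, 3);
framesTouchedTransposed = numel(1:10);

% Swapping dims 2 and 3 again recovers the original ROI layout.
roiBack = permute(roiT, [1 3 2]);
```

For a file on disk, the number of frames touched is what drives the number of seeks, so the transposed layout should need far fewer of them for this kind of ROI.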

I hope that helps. All the best, Dylan.

dickinson-lab commented 4 years ago

Thank you Dylan, this is an excellent suggestion. To clarify, it wouldn't work if I just call transpose() on the TIFFStack object, would it? That seems too easy. Instead I would need to write the TIFF file to disk with the dimensions transposed, correct? Do you know of any tools that can efficiently transpose a TIFF file on disk, or would I need to write my own?

DylanMuir commented 4 years ago

You're correct, a simple transpose() won't work. You can use TIFFStack for this; just create a new stack with the required dimensions, then read in the old stack and write it back out!
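
One possible way to do the write-out step is sketched below. As an assumption, it uses MATLAB's built-in Tiff class for writing rather than TIFFStack itself, and a small synthetic uint16 array stands in for the data you would get from something like memStack = stackObj(:,:,:). The output path and sizes are hypothetical, and the sketch assumes the whole stack fits in memory.

```matlab
% Synthetic stand-in for the original stack: [rows, cols, frames].
memStack = uint16(randi(65535, 40, 60, 25));

% Reorder so that each output frame is rows x frames, one frame per column.
transposed = permute(memStack, [1 3 2]);   % [rows, frames, cols]

outFile = [tempname '.tif'];               % hypothetical output path
t = Tiff(outFile, 'w');                    % use 'w8' (BigTIFF) for files over 4 GB
tags.ImageLength = size(transposed, 1);
tags.ImageWidth = size(transposed, 2);
tags.Photometric = Tiff.Photometric.MinIsBlack;
tags.BitsPerSample = 16;
tags.SamplesPerPixel = 1;
tags.SampleFormat = Tiff.SampleFormat.UInt;
tags.PlanarConfiguration = Tiff.PlanarConfiguration.Chunky;
tags.Compression = Tiff.Compression.None;
for k = 1:size(transposed, 3)
    if k > 1
        t.writeDirectory();                % start a new frame in the file
    end
    t.setTag(tags);
    t.write(transposed(:, :, k));
end
t.close();

% Read a 10 x 10 x T ROI back: only 10 frames of the transposed file are needed.
roiT = zeros(10, size(transposed, 2), 10, 'uint16');
for k = 1:10
    frame = imread(outFile, k);
    roiT(:, :, k) = frame(1:10, :);
end
roi = permute(roiT, [1 3 2]);              % back to [rows, cols, frames]
delete(outFile);
```

In real use the transposed file would be kept and opened again for each analysis run, since writing it is the expensive one-off step.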

dickinson-lab commented 4 years ago

Got it. Many thanks for your help! Dan

DylanMuir / TIFFStack

Question about speed #35