DeepTag / cardiac_tagging_motion_estimation

A deep learning-based fully unsupervised method for cardiac tagging MRI motion tracking.

Accelerating the process of loading data to `npz` #6

Closed · lin-tianyu closed this issue 4 months ago

lin-tianyu commented 6 months ago

Problem Description

Test condition: my dataset contains 565 2D slices to be processed for training.

In the add_np_data function of load_data_for_cine_ME.py, the data is loaded as follows (a simplified sketch follows the list):

  1. read the npz file of a sample;
  2. create the group array if it does not exist yet;
  3. concatenate the sample onto the group array;
  4. repeat steps 1-3 for every sample.
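
The sketch below shows the pattern I mean; it is simplified, and the function name `add_np_data_concat`, the file list, and the `"image"` key are placeholders rather than the exact code in load_data_for_cine_ME.py:

```python
import numpy as np

def add_np_data_concat(npz_files, key="image"):
    """Load samples and grow the group array by repeated concatenation."""
    group = None
    for path in npz_files:
        sample = np.load(path)[key]      # step 1: read one sample from its npz file
        if group is None:
            group = sample               # step 2: create the group array on the first sample
        else:
            # step 3: concatenate; this copies the whole group array every iteration
            group = np.concatenate((group, sample), axis=0)
    return group
```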

When I used the above code to pack a group of data for training, I found it time-expensive: concatenating 500 samples into one group array (before storing it in *.npz) took around 58 minutes.

Analysis

The issue above is caused by calling concatenate inside the for loop. As the group array grows, each concatenation has to copy the whole array again, so every new sample takes longer than the last; overall this behaves like an $O(n^2)$ algorithm.

Whenever this kind of "concatenate step by step" situation comes up, a better strategy is to collect all the samples first and concatenate them once after the loop. That brings the cost back down to roughly $O(n)$.

Modifications

After my update, loading a group of data works as follows (see the sketch after this list):

  1. read the npz file of a sample;
  2. append the sample to a list;
  3. repeat steps 1-2 until all samples are read;
  4. concatenate the list of samples in a single call.
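
A minimal sketch of the updated loading routine, again with placeholder names rather than the exact code I committed:

```python
import numpy as np

def add_np_data_fast(npz_files, key="image"):
    """Load samples into a list and concatenate them once at the end."""
    samples = []
    for path in npz_files:
        samples.append(np.load(path)[key])  # steps 1-2: read one sample and append it to the list
    return np.concatenate(samples, axis=0)  # step 4: a single concatenation after the loop
```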

Experimental Results

On my server, the time for loading a group of 500 2D slices (before storing them in *.npz) decreases from about 58 minutes to less than 2 minutes.
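
The exact numbers depend on the hardware and the slice sizes, but a small synthetic benchmark like the one below (array shapes and counts are placeholders, not the real dataset) reproduces the trend:

```python
import time
import numpy as np

def concat_in_loop(samples):
    group = samples[0]
    for s in samples[1:]:
        group = np.concatenate((group, s), axis=0)  # copies the growing array each time
    return group

def concat_once(samples):
    return np.concatenate(samples, axis=0)          # one copy at the end

# 300 synthetic "slices"; the shape is a placeholder, not the real data size
samples = [np.zeros((1, 25, 64, 64), dtype=np.float32) for _ in range(300)]

t0 = time.time()
concat_in_loop(samples)
t1 = time.time()
concat_once(samples)
t2 = time.time()
print(f"concatenate inside the loop: {t1 - t0:.2f} s")
print(f"concatenate once at the end: {t2 - t1:.2f} s")
```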

Others

Hope it helps.