DeepTag / cardiac_tagging_motion_estimation

A deep learning-based fully unsupervised method for cardiac tagging MRI motion tracking.

Accelerating the process of loading data to `npz` #6

Closed · lin-tianyu closed this issue 4 months ago

lin-tianyu commented 6 months ago

Problem Description

Test condition: my dataset contains 565 2D slices to be processed for training.

In the add_np_data function of load_data_for_cine_ME.py, the data is loaded as follows (a simplified sketch follows the list):

  1. read the npz file of a sample;
  2. create the group array if it does not exist yet;
  3. concatenate the sample onto the group array;
  4. repeat steps 1-3 for every sample.
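
The sketch below shows the pattern I mean; it is simplified, and the function name `add_np_data_concat`, the file list, and the `"image"` key are placeholders rather than the exact code in load_data_for_cine_ME.py:

```python
import numpy as np

def add_np_data_concat(npz_files, key="image"):
    """Load samples and grow the group array by repeated concatenation."""
    group = None
    for path in npz_files:
        sample = np.load(path)[key]      # step 1: read one sample from its npz file
        if group is None:
            group = sample               # step 2: create the group array on the first sample
        else:
            # step 3: concatenate; this copies the whole group array every iteration
            group = np.concatenate((group, sample), axis=0)
    return group
```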

When I used the above code to pack a group of data for training, I found it time-expensive: concatenating 500 samples into one group array (before storing it in *.npz) took around 58 minutes.

Analysis

The issue above is caused by calling concatenate inside the for loop. As the group array grows, each concatenation has to copy the whole array again, so every new sample takes longer than the last; overall this behaves like an $O(n^2)$ algorithm.

Whenever this kind of "concatenate step by step" situation comes up, a better strategy is to collect all the samples first and concatenate them once after the loop. That brings the cost back down to roughly $O(n)$.

Modifications

After my update, loading a group of data works as follows (see the sketch after this list):

  1. read the npz file of a sample;
  2. append the sample to a list;
  3. repeat steps 1-2 until all samples are read;
  4. concatenate the list of samples in a single call.
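
A minimal sketch of the updated loading routine, again with placeholder names rather than the exact code I committed:

```python
import numpy as np

def add_np_data_fast(npz_files, key="image"):
    """Load samples into a list and concatenate them once at the end."""
    samples = []
    for path in npz_files:
        samples.append(np.load(path)[key])  # steps 1-2: read one sample and append it to the list
    return np.concatenate(samples, axis=0)  # step 4: a single concatenation after the loop
```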

Experimental Results

On my server, the time for loading a group of 500 2D slices (before storing them in *.npz) decreases from about 58 minutes to less than 2 minutes.
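
The exact numbers depend on the hardware and the slice sizes, but a small synthetic benchmark like the one below (array shapes and counts are placeholders, not the real dataset) reproduces the trend:

```python
import time
import numpy as np

def concat_in_loop(samples):
    group = samples[0]
    for s in samples[1:]:
        group = np.concatenate((group, s), axis=0)  # copies the growing array each time
    return group

def concat_once(samples):
    return np.concatenate(samples, axis=0)          # one copy at the end

# 300 synthetic "slices"; the shape is a placeholder, not the real data size
samples = [np.zeros((1, 25, 64, 64), dtype=np.float32) for _ in range(300)]

t0 = time.time()
concat_in_loop(samples)
t1 = time.time()
concat_once(samples)
t2 = time.time()
print(f"concatenate inside the loop: {t1 - t0:.2f} s")
print(f"concatenate once at the end: {t2 - t1:.2f} s")
```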

Others

Hope it helps.