Open jcf2 opened 1 month ago
Note: the original approach was apparently to use the host system's gunzip
% [status, result] = system(['gunzip ' my_image]);
and there deletion of the original .gz would be the default (overridable by --keep
). So that explains where the "worst case assumption" above comes from...
(oops, didn't mean to close this prematurely above... doh)
The above PR is designed to address efficiency related to zip with a minimal change -- just avoid an expensive and unnecessary call to gzip
in fmri_data
if the target already exists.
A separate issue that could be fixed by a deeper change is that fmri_data
will fail on a readable .nii.gz if the directory is not writable (with a an obscure "internal error has occurred" originating from MATLAB gunzip). A fix that covers that as well could be to change the unzip structure:
gunzip_temp = tempname;
gunzip(<target>, gunzip_temp);
Though in a way conceptually simpler, this approach would affect gunzip_image_names_if_gz
as well as fmri_data
(thus possibly affecting code beyond fmri_data
), and would require passing the gunzip_temp location. Presumably in fmri_data
it would also call for some renaming/restructuring, as re_zip_images
would no longer be an accurate name.
Currently fmri_data will uncompress .nii.gz inputs automatically using gunzip_image_names_if_gz:
[image_names, was_gzipped] = gunzip_image_names_if_gz(image_names);
This allows it to read .nii files. Depending on MATLAB
gunzip
and whether it deletes the original, this would result in either two files (.nii and .nii.gz) or just the .nii version. As housekeeping,fmri_data
apparently assumes the latter as the worst case and rezips:where
re_zip_images
is a subfunction offmri_data
that calls MATLAB'sgzip
. Thisgzip
call overwrites a .nii.gz if it exists, andre_zip_images
then removes the .nii:This approach is unfortunately extremely costly, as
gzip
may take around 10x as much time asgunzip
, and would seem to be totally unnecessary if the .nii.gz was in fact not deleted bygunzip
. Also, it would seem that a replacement of .nii.gz files may invalidate cryptographic hashes calculated on the original.So, one simple improvement would be to make the "housekeeping" step conditional, and if a .nii.gz exists remove the .nii instead.
In fact, the documentation for MATLAB's
gzip
(R2024a) saysand this seems to be true as well for past versions, so while OK to have that conditional "just in case" it seems rezip should be 100% avoidable here?