EranOfek / AstroPack

Astronomy & Astrophysics Software Package

Suggested improvements to MatchSources search and read functions #498

Closed param-rekhi closed 1 month ago

param-rekhi commented 1 month ago

rdirMatchedSourcesSearch

  1. Use the native MATLAB dir function in place of io.files.rdir. dir can now search recursively using the "**" wildcard. This is a massive speed-up (at least 10x).
  2. Sort the returned files by date & time; this can be done by sorting by filename. Example:
    List = struct2table(dir(fullfile("**",Args.FileTemplate)));
    List = sortrows(List, 'name');
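
As a rough illustration of the two points above, here is a minimal self-contained sketch that builds a throwaway folder tree, searches it recursively with dir, and sorts the hits by name. The folder and file names are hypothetical, and the sketch assumes the timestamp embedded in each file name sorts lexicographically in chronological order.

```matlab
% Build a temporary tree; every name below is a made-up placeholder.
Root = tempname;
mkdir(fullfile(Root, 'a', 'proc'));
mkdir(fullfile(Root, 'b', 'proc'));
Files = {fullfile(Root, 'a', 'proc', 'obs_20230102.010000_MergedMat_1.hdf5'), ...
         fullfile(Root, 'b', 'proc', 'obs_20230101.230000_MergedMat_1.hdf5'), ...
         fullfile(Root, 'b', 'proc', 'obs_20230101.230000_notes.txt')};
for I = 1:numel(Files)
    Fid = fopen(Files{I}, 'w');   % create an empty file
    fclose(Fid);
end

% "**" tells dir to descend into every subfolder below Root.
List = struct2table(dir(fullfile(Root, '**', '*_MergedMat_1.hdf5')));

% Sorting by file name orders the files by their embedded timestamp,
% regardless of which folder each one was found in.
List = sortrows(List, 'name');
disp(List.name);

% Clean up the temporary tree.
rmdir(Root, 's');
```

Only the two .hdf5 files match the template, and after sorting the 20230101 file comes before the 20230102 file even though it sits in the later folder.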

readList

Allow the function to read only specific fields. This helps with both speed and memory. This functionality is already present in read and hence is a simple fix:

Args.Fields         = [];  % read all fields
Result(Ifile)       = MatchedSources.read(File{Ifile},'Fields',Args.Fields);

PS: All the parameters in rdirMatchedSourcesSearch can be folded into the path and file template, and hence I have created a bare-bones version of the function:

path = compose("/marvin/LAST.01.%02i.%02i/%i/%02i/%02i/proc/", U.Mount,U.Camera,U.Year,U.Month,U.Day);
template = compose("*%s_000_001_%03i_sci_merged_MergedMat_1.hdf5", U.FieldID,U.CropID);
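
To make the folding concrete, here is a small worked example of the two compose calls with made-up values. The struct U and all of its field values are assumptions for illustration only, and the variables are named SearchPath and Template here to avoid shadowing MATLAB's built-in path function.

```matlab
% Hypothetical user parameters (placeholders, not real observation values).
U.Mount   = 2;
U.Camera  = 1;
U.Year    = 2023;
U.Month   = 6;
U.Day     = 15;
U.FieldID = "fieldA";
U.CropID  = 10;

% %02i zero-pads to two digits, %03i to three.
SearchPath = compose("/marvin/LAST.01.%02i.%02i/%i/%02i/%02i/proc/", ...
                     U.Mount, U.Camera, U.Year, U.Month, U.Day);
Template   = compose("*%s_000_001_%03i_sci_merged_MergedMat_1.hdf5", ...
                     U.FieldID, U.CropID);

disp(SearchPath);   % /marvin/LAST.01.02.01/2023/06/15/proc/
disp(Template);     % *fieldA_000_001_010_sci_merged_MergedMat_1.hdf5
```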

function Result = rdirMSfast(Args)
    % Recursive search for MergedMat files and return file names and paths.
    %   The output is a structure whose FileName and Folder fields
    %   list the matching files, sorted by date & time,
    %   which can be fed to MatchedSources.readList
    % Input  : * ...,key,val,...
    %            'FileTemplate' - File name template to search
    %                   Should have CropID in the template to avoid
    %                   getting all of them.
    %            'Path' - Path in which to start the recursive
    %                   search. Default is pwd.

    arguments
        Args.FileTemplate       
        Args.Path               = pwd;
    end

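    % Search below Args.Path directly instead of cd-ing into it, so the
    % working directory is left untouched even if dir throws an error.
    List = struct2table(dir(fullfile(Args.Path,"**",Args.FileTemplate)));
    List = sortrows(List, 'name');

    Result.FileName = List.name;
    Result.Folder   = List.folder;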
end
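
One possible way to turn the returned structure into full file names is sketched below. The field names FileName and Folder come from the function above, the sample values are hypothetical, and whether MatchedSources.readList expects full paths or the structure itself is not stated here, so treat this only as an illustration of combining the two fields.

```matlab
% Hypothetical result, shaped like the output of the function above
% when more than one file matches (cell arrays of character vectors).
Result.FileName = {'obs_20230101.230000_MergedMat_1.hdf5'; ...
                   'obs_20230102.010000_MergedMat_1.hdf5'};
Result.Folder   = {'/data/night1/proc'; '/data/night2/proc'};

% fullfile joins the two cell arrays element by element.
FullNames = fullfile(Result.Folder, Result.FileName);
disp(FullNames);
```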

EranOfek commented 1 month ago

The MatchedSources/read function already enables you to read only specific datasets via the 'Fields' argument. E.g., MatchedSources.read(FileName,'Fields',{'PSF_MAG'});

Do not use read_rdir

I added to MatchedSources/readList an optional argument named 'Fields' that can be used to read only specific datasets.

[dev1 51312d6d]