desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

healpix group spectra memory and I/O optimizations #2290

Closed sbailey closed 1 month ago

sbailey commented 1 month ago

This PR is for a branch originally started by @akremin. It updates desi_group_spectra to filter by healpix immediately after reading each cframe, instead of collecting all cframes and filtering at the end. This reduces peak memory use, which was a problem during the Jura run. I also updated this branch to use findfile('cframe', ..., readonly=True) for faster reads.

For context, the original implementation purposefully did not filter while reading because it maintained a cache of complete cframes that could be reused for neighboring healpix without re-reading the same cframes multiple times. However, that cache feature was dropped in a previous refactor, and keeping all the cframes in memory simultaneously was exhausting memory during Jura.
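To illustrate the difference between the two strategies, here is a minimal sketch using NumPy only. It is not the desispec code: `fake_read_cframe`, `group_filter_at_end`, and `group_filter_while_reading` are hypothetical names, and the "cframe" is just a per-spectrum healpix array plus a flux array. The point is that filtering inside the read loop keeps only the selected rows alive, while both approaches return the same spectra.

```python
import numpy as np


def fake_read_cframe(seed, nspec=500, nwave=100):
    """Hypothetical stand-in for reading one cframe (not a desispec function).

    Returns a per-spectrum healpix array and a (nspec, nwave) flux array.
    """
    rng = np.random.default_rng(seed)
    healpix = rng.integers(0, 10, size=nspec)
    flux = rng.normal(size=(nspec, nwave))
    return healpix, flux


def group_filter_at_end(seeds, target, nwave=100):
    """Old pattern: hold every full frame in memory, then filter once."""
    frames = [fake_read_cframe(s, nwave=nwave) for s in seeds]
    if not frames:
        return np.empty((0, nwave))
    healpix = np.concatenate([h for h, _ in frames])
    flux = np.vstack([f for _, f in frames])
    return flux[healpix == target]


def group_filter_while_reading(seeds, target, nwave=100):
    """New pattern: filter each frame right after reading it."""
    kept = []
    for s in seeds:
        healpix, flux = fake_read_cframe(s, nwave=nwave)
        keep = healpix == target
        if np.any(keep):
            # Boolean-mask indexing returns a copy, so the full flux array
            # can be garbage collected once this iteration ends.
            kept.append(flux[keep])
    if not kept:
        return np.empty((0, nwave))
    return np.vstack(kept)


old = group_filter_at_end(range(5), target=3)
new = group_filter_while_reading(range(5), target=3)
```

In this sketch the peak footprint of the second function is roughly one full frame plus the selected rows, rather than all frames at once. The trade-off noted above still applies: without a cache, a cframe that overlaps several healpix is re-read for each of them.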

I have verified that the outputs are the same as main, except for header keywords (timestamps, dependency versions).

We will need further optimizations for the densest healpix in future runs, but this branch is a step in the right direction, so I'd like to merge it before proceeding with other, more invasive memory management updates.