NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Pythonic FITS reader #1273


mrocklin commented 4 years ago

I'm copying over an issue from https://github.com/rapidsai/cudf/issues/2821 by @profjsb . Hopefully this is in scope for DALI.

This is a request for a GPU FITS reader. Such a reader would be a welcome and critical component as the community starts to transition data pipelines from CPU- to GPU-centric workflows.

The common image exchange format in astronomy is FITS (Flexible Image Transport System), and there are well-supported CPU-centric packages for reading (and writing) FITS, such as PyFITS (https://pythonhosted.org/pyfits/) and its successor astropy.io.fits (https://docs.astropy.org/en/stable/io/fits/). In many data pipelines, it is common to read FITS files from disk, combine and manipulate the images/spectra (e.g., as operations on NumPy arrays), and then write the results back to disk. The reduction pipeline pypeit (https://github.com/pypeit/PypeIt/tree/master/pypeit) is a good example package to see the end-to-end manipulation of FITS files for science.
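For context, the on-disk FITS layout those libraries parse is simple enough to sketch. The toy reader/writer below is purely illustrative (it is not how astropy or PyFITS are implemented): it handles a single uncompressed 2-D float32 image HDU, whereas real FITS files carry many more header keywords, multiple HDUs, scaling, and compression. It shows the essential structure: 80-character ASCII header cards padded to 2880-byte blocks, followed by big-endian binary data.

```python
import numpy as np

def _card(key, value):
    # One fixed-format 80-character header card: keyword, "= ", right-justified value.
    return f"{key:<8}= {value:>20}".ljust(80).encode("ascii")

def write_fits(data: np.ndarray) -> bytes:
    """Serialize a 2-D float32 array as a minimal single-HDU FITS file."""
    cards = [
        _card("SIMPLE", "T"),
        _card("BITPIX", "-32"),               # IEEE 32-bit float
        _card("NAXIS", "2"),
        _card("NAXIS1", str(data.shape[1])),  # fastest-varying axis
        _card("NAXIS2", str(data.shape[0])),
        "END".ljust(80).encode("ascii"),
    ]
    header = b"".join(cards)
    header += b" " * (-len(header) % 2880)     # pad header to a 2880-byte block
    payload = data.astype(">f4").tobytes()     # FITS data is big-endian
    payload += b"\0" * (-len(payload) % 2880)  # pad data to a 2880-byte block
    return header + payload

def read_fits(buf: bytes) -> np.ndarray:
    """Parse header cards until END, then view the data section as an array."""
    cards = {}
    for i in range(0, len(buf), 80):
        card = buf[i:i + 80].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            data_start = (i // 2880 + 1) * 2880  # data begins at the next block
            break
        if "=" in card:
            cards[key] = card.split("=", 1)[1].split("/")[0].strip()
    shape = (int(cards["NAXIS2"]), int(cards["NAXIS1"]))
    count = shape[0] * shape[1]
    return np.frombuffer(buf, dtype=">f4", count=count,
                         offset=data_start).reshape(shape)
```

The block-aligned, fixed-width layout is part of what makes FITS attractive for a GPU reader: once the header is parsed on the CPU, the data section is a contiguous typed buffer that can be copied straight to device memory.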

With the relatively recent introduction of neural-network-based steps for astronomical image processing (e.g., our package deepCR, https://github.com/profjsb/deepCR, https://arxiv.org/abs/1907.09500), the current best practice when using GPUs is to read FITS data from disk, push the data to a GPU tensor in PyTorch, apply machine learning models, and then convert the tensor back to a CPU-based NumPy array. This round trip adds overhead. We'd like to be able to read FITS files directly into a GPU tensor in PyTorch (and the like). Writing FITS files directly from GPU tensors would, of course, be a natural next step.

A FITS reader that loads data directly into a GPU tensor would open up our community to building entirely GPU-based image-processing pipelines. Many of our image manipulations are highly amenable to the massive parallelism GPUs afford. As someone leading an astronomy-meets-machine-learning group at UC Berkeley, I'm personally excited about this as we start to make use of GPU-based clusters such as the new "Perlmutter" system at NERSC (https://www.nersc.gov/systems/perlmutter/).

cc @profjsb @datametrician @jakirkham

awolant commented 4 years ago

Hi, thanks for the question.

We will look into this.

rcthomas commented 2 years ago

Hi there, I was wondering if there might be a status update on this issue?

JanuszL commented 2 years ago

Hi @rcthomas,

I'm sorry to say that this is not on our mid-term roadmap. However, if anyone in the community is willing to contribute such functionality, we would be more than happy to support that effort.

JanuszL commented 5 months ago

Hi @mrocklin,

Experimental support has been added; please check the fits reader that is now available.
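A minimal pipeline using the experimental reader might look like the sketch below. The operator path (`fn.experimental.readers.fits`) and its arguments (`file_root`, `hdu_indices`) reflect the experimental API at the time of writing and may change, and the directory path is hypothetical; please check the current DALI documentation before relying on this.

```python
# Sketch only: assumes a DALI build that includes the experimental FITS reader
# and a directory of uncompressed FITS image files.
from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def fits_pipeline(fits_dir):
    # Load the selected HDU of each file as a tensor.
    data = fn.experimental.readers.fits(
        file_root=fits_dir,   # directory to scan for FITS files
        hdu_indices=[1],      # which HDUs to read from each file
    )
    return data

pipe = fits_pipeline("/path/to/fits")  # hypothetical data directory
pipe.build()
(batch,) = pipe.run()
```

From there, the usual DALI framework plugins (e.g., the PyTorch iterator in `nvidia.dali.plugin.pytorch`) can feed the batches into a training or inference loop without the CPU round trip described above.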