NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Pythonic FITS reader #1273

Open mrocklin opened 5 years ago

mrocklin commented 5 years ago

I'm copying over an issue from https://github.com/rapidsai/cudf/issues/2821 by @profjsb. Hopefully this is in scope for DALI.

This is a request for a GPU FITS reader. Such a reader would be a welcome and critical component as the community transitions data pipelines from CPU- to GPU-centric workflows.

The common image exchange format in astronomy is FITS (Flexible Image Transport System), and there are well-supported CPU-centric packages for reading (and writing) FITS, such as PyFITS (https://pythonhosted.org/pyfits/) and astropy.io.fits (https://docs.astropy.org/en/stable/io/fits/). In many data pipelines, it is common to read FITS files from disk, combine and manipulate the images/spectra (e.g., as operations on NumPy arrays), and then write the results back to disk. The reduction pipeline PypeIt (https://github.com/pypeit/PypeIt/tree/master/pypeit) is a good example of end-to-end manipulation of FITS files for science.
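To make the format concrete, here is a minimal pure-NumPy sketch of the FITS on-disk layout: 2880-byte logical records, 80-character ASCII header cards, and big-endian binary data. It handles only a single 2-D float32 image HDU; this is an illustration of why a reader is straightforward to implement, not a replacement for astropy.io.fits (no extensions, strings, scaling, or compression):

```python
import io
import numpy as np

BLOCK = 2880  # FITS logical record size, in bytes
CARD = 80     # length of one ASCII header card

def _card(keyword, value=None):
    """Format one 80-character fixed-format FITS header card."""
    if value is None:
        return keyword.ljust(CARD)
    return f"{keyword:<8}= {value:>20}".ljust(CARD)

def write_simple_fits(buf, image):
    """Write a 2-D float32 array as a minimal single-HDU FITS file."""
    assert image.ndim == 2 and image.dtype == np.float32
    cards = [
        _card("SIMPLE", "T"),
        _card("BITPIX", "-32"),                # IEEE 32-bit float data
        _card("NAXIS", "2"),
        _card("NAXIS1", str(image.shape[1])),  # fastest-varying axis
        _card("NAXIS2", str(image.shape[0])),
        _card("END"),
    ]
    header = "".join(cards).encode("ascii")
    header += b" " * (-len(header) % BLOCK)    # pad header to a 2880-byte boundary
    data = image.astype(">f4").tobytes()       # FITS data is big-endian
    data += b"\x00" * (-len(data) % BLOCK)     # pad data to a 2880-byte boundary
    buf.write(header + data)

def read_simple_fits(buf):
    """Parse the header cards, then view the data block as a NumPy array."""
    raw = buf.read()
    keys = {}
    for i in range(0, len(raw), CARD):
        card = raw[i:i + CARD].decode("ascii")
        if card.startswith("END"):
            # Data begins at the next 2880-byte boundary after the header.
            data_start = (i // BLOCK + 1) * BLOCK
            break
        if "=" in card:
            k, v = card.split("=", 1)
            keys[k.strip()] = v.strip()
    shape = (int(keys["NAXIS2"]), int(keys["NAXIS1"]))
    count = shape[0] * shape[1]
    return np.frombuffer(raw, dtype=">f4", count=count,
                         offset=data_start).reshape(shape)
```

Because the layout is this simple, the header can be parsed on the CPU while the (potentially large) data block is copied or decoded straight to device memory.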

With the relatively recent introduction of neural-network-based steps for astronomical image processing (e.g., our package deepCR, https://github.com/profjsb/deepCR, https://arxiv.org/abs/1907.09500), the current practice when using GPUs is to read FITS data from disk, push the data to a GPU tensor in PyTorch, apply machine-learning models, and then convert the tensor back to a CPU-based NumPy array. This round trip adds overhead. We'd like to be able to read FITS files directly into a GPU tensor in PyTorch (and the like). Writing FITS files directly from GPU tensors would, of course, be a next step.
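The round trip described above can be sketched as follows, using astropy for the CPU-side read/write and PyTorch for the device copies. The function name and the `model` argument are placeholders, not part of any library: with `device="cuda"`, the `.to(device)` call is the host-to-device copy that a GPU-direct FITS reader would eliminate, and `.cpu()` is the copy back:

```python
import numpy as np
import torch
from astropy.io import fits

def apply_model_to_fits(in_path, out_path, model, device="cpu"):
    """Read a FITS image, run a model on `device`, write the result back.

    With device="cuda" this is the round trip described in the text:
    decode on the CPU, copy to the GPU, process, copy back to the host.
    """
    with fits.open(in_path) as hdul:
        image = hdul[0].data.astype(np.float32)   # CPU-side decode
    tensor = torch.from_numpy(image).to(device)   # host -> device copy
    with torch.no_grad():
        result = model(tensor)                    # e.g. a deepCR-style model
    # device -> host copy before astropy can write the result to disk
    fits.writeto(out_path, result.cpu().numpy(), overwrite=True)
```

A reader that decodes directly into device memory would remove the first copy entirely and keep the whole pipeline on the GPU until the final write.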

A FITS reader that can construct a tensor directly on the GPU would open our community up to entirely GPU-based image processing pipelines. Many of our image manipulations are highly amenable to the massive parallelism GPUs afford. As someone leading an astronomy-meets-machine-learning group at UC Berkeley, I'm personally excited about this as we start to make use of GPU-based clusters, such as the new "Perlmutter" system at NERSC (https://www.nersc.gov/systems/perlmutter/).

cc @profjsb @datametrician @jakirkham

awolant commented 5 years ago

Hi, thanks for the question.

We will look into this.

rcthomas commented 3 years ago

Hi there, I was wondering if there might be a status update on this issue?

JanuszL commented 3 years ago

Hi @rcthomas,

I'm sorry to say, but this is not on our mid-term roadmap. However, if anyone in the community is willing to contribute such functionality, we would be more than happy to support the effort.

JanuszL commented 10 months ago

Hi @mrocklin,

Experimental support has been added; please check the FITS reader that is now available.