Open airnandez opened 6 years ago
right now it doesn't seem to be advisable with the current decoder (ie: fitsio.streamDecoder
).
to implement this, we'd need to implement the fitsio.seekDecoder
that can go back to where the data (associated with the header) was when somebody actually requests the data.
one also needs to slightly modify the imageHDU
and other HDUs to handle the case where data is "lazily" loaded (or loaded just-in-time.)
I'll try to implement this on my way back to Clermont-Fd (tuesday night). unless you beat me to it :)
I have just pushed a branch lazy-image-loader
that tries to address this, at least for the Image HDU.
could you try it out and report back?
I think I must try harder to provide a better API/interface for this though...
I tested branch lazy-image-loader
and the result for me is the same: DecodeHDU()
still requires the underlying io.Reader
to read the full file contents.
I tested that by using the Reader below:
type UnbufferedReader struct {
f *os.File
count int64
}
// NewUnbufferedReader returns a new UnbufferedReader object
func NewUnbufferedReader(f *os.File) *UnbufferedReader {
return &UnbufferedReader{
f: f,
}
}
// Read implements io.Reader
func (ur *UnbufferedReader) Read(p []byte) (int, error) {
count, err := ur.f.Read(p)
if err == nil {
ur.count += int64(count)
}
return count, err
}
// Close implements io.Closer
func (ur *UnbufferedReader) Close() error {
fmt.Printf("Close: [%s] total bytes read: %d\n", ur.f.Name(), ur.count)
return ur.f.Close()
}
I call fitsio.Open()
as follows:
...
file, err := os.Open(path)
if err != nil {
return err
}
f, err := fitsio.Open(NewUnbufferedReader(file))
Passing to fitsio.Open
an analogous io.ReadSeeker
also implies reading the underlying file entirely.
Ok... It occurred to me I only implemented it for image hdu. Do you have other hdu in there?
(In any case, I'll try with your reader impls. On Monday. I am on holidays, starting now.)
The file I'm testing with is actually an image and fitsio
correctly detects that, which I know by printing the type of the header. There is a single HDU in that file.
Below is an excerpt of the FITS cards of my test file:
SIMPLE = true
BITPIX = 16
NAXIS = 2
NAXIS1 = 2144
NAXIS2 = 4241
EXTEND = false
BUNIT = ADU
BSCALE = 1
BZERO = 32768
BLANK = -32768
TIMESYS = UTC
...
ok. I've found the underlying issue.
the DecodeHDU method does correctly detect the io.Seeker
and does correctly,lazily decode the Data unit.
and after the first HDU, the number of bytes read does correspond to the number of bytes that makes up a header (14400).
it's when the decoder tries to go to the next HDU: the attempt at parsing the header iterates over all the data...
let me fix that.
ok. done.
I am still not very happy with the way I have implemented the lazy loading.
I am pondering on whether I shouldn't split the implementation of the various HDU
into a HU
and a DU
part (HU
is already factored out as Header
. we could do the same for the DU
part)
so the actual loading of the DU
part could happen when doing du := hdu.Data().(*fitsio.Table)
I just tested with branch lazy-image-loader
. When passing an io.Seeker
as the argument to fitsio.Open()
, DecodeHDU()
correctly consumes the part of the FITS file which contains the header unit and does not consume the data unit.
However, when using an io.Reader
the whole HDU is consumed even if I'm only interested in retrieving the header. I suspect this is intentional.
In both cases, this different behavior deserves to be documented, so that as a user I understand the tradeoffs of using io.Seeker
vs. io.Reader
for consuming a FITS file.
Regarding the API, it would be clearer indeed to separate the Header part from the Data part so that one can retrieve them separately. Something like:
f := fitsio.Open(...)
hdr := f.HDU(0).Header()
// actually load the data unit of this HDU
data := f.HDU(0).Data()
Wrt the API: yes, that's what I was thinking as well.
Wrt the doc: yep, that also needs to be properly documented.
Wrt the io.Reader it's indeed the intended behavior: an io.Reader can only consume blocks of data, no seeking can be achieved.
I'm evaluating
astrogo/fitsio
for scanning the header units of hundreds of thousands of FITS files. I'm not interested in the contents of the data units.I noticed that the method DecodeHDU reads not only the Header Unit but also its associated Data Unit.
This behavior is inconvenient for my use case. The structure of each one of the 800+K FITS files I need to scan is basically:
Is there any way to tell
astrogo/fitsio
not to consume (i.e. read from disk) the data unit (i.e. 18MB in this particular case) but only the header unit (i.e. 14 KB in this particular case)? Given the number and the structure of the files I need to scan, that would greatly help.