JuliaAstro / FITSIO.jl

Flexible Image Transport System (FITS) file support for Julia
http://juliaastro.org/FITSIO.jl/
MIT License
55 stars 29 forks source link

Key not found when reading from binary table #33

Closed rgiordan closed 9 years ago

rgiordan commented 9 years ago

I'm having trouble reading a column from a binary table. The file is from this URL. I'm new to FITS, so apologies in advance if I'm missing something basic.

I would like to read the RROWS column. It is present:

psf_filename = "psField-003900-6-0269.fit"
psf_fits = FITS(psf_filename);
read_header(psf_fits[b + 1])
# ...other fields...
# TFORM7  =                 '1J' / PIXDATATYPE
# TTYPE7  =              'RTYPE' / type
# TFORM8  =             '1PE(0)' / FLOAT
# TTYPE8  =              'RROWS' / rows_fl32
# TFORM9  =                 '1J' / INT
# TTYPE9  =              'RROW0' / row0
# TFORM10 =                 '1J' / INT
# TTYPE10 =              'RCOL0' / col0
# ....

And I can read other fields:

read(psf_fits[b + 1], "RNROW")
#4-element Array{Int32,1}:
#  51
#  51
#  51
#  51

But I cannot read RROWS:

read(psf_fits[b + 1], "RROWS")
# ERROR: key not found: -42
#  in read at /home/rgiordan/.julia/v0.3/FITSIO/src/hdutypes.jl:72

Any idea what might be going wrong?

kbarbary commented 9 years ago

Thanks for reporting.

It looks like this column of the table holds a "variable length array". Unfortunately, support for variable length arrays has not yet been implemented in FITSIO.jl. But this is good motivation to do so! I'll try to look into it soon.

rgiordan commented 9 years ago

Thanks for the quick response. Is there a low-level workaround in the meantime?

kbarbary commented 9 years ago

I pushed a very small change to the var-arrays branch on my fork that allows reading variable length arrays in the high-level interface. The commit is here: https://github.com/kbarbary/FITSIO.jl/commit/4275faef4e5757ca7589ace9860e9e16342c62ce

However, I'd like to test this before merging into master to make sure it's correct. If you could test it out, that would also be helpful.

Edit: I don't actually think that fix will allow one to properly read the column.

rgiordan commented 9 years ago

Yeah, it no longer gives a missing column error, but I get four zeros:

read(psf_fits[b + 1], "RROWS")
# 4-element Array{Float32,1}:
#  0.0
#  0.0
#  0.0
#  0.0

The PCOUNT variable is 42824 and the type is E, so if I understand the variable length array formatting correctly I'd expect 42824 / 4 = 10706 floats.

kbarbary commented 9 years ago

Yeah I see the same thing. Turns out you can only read one row at a time with fits_read_col for a binary table variable length array column. We have to use fits_read_descriptll to get the length of the array in each row, and it doesn't have a wrapper in the low-level interface yet.

If you're really desperate to hack something together immediately, you can use the ccall and the raw handle to the shared library like so:

using FITSIO
import FITSIO.Libcfitsio: libcfitsio

ccall(("fits_read_descriptll", libcfitsio), ...)

PS: you and @jeff-regier are in Berkeley right? We should meet up some time to discuss what you're working on with Julia. I'd be interested to hear about it.

rgiordan commented 9 years ago

Sorry to be a pest, but what's the low-level function to read a certain amount of data off the heap? If I am reading the docs right, fits_read_descriptll just gives the size of the field, which I think we know from PCOUNT. (Though it's very likely I've just misunderstood.)

Yes, we're at Berkeley, and we should definitely meet up. (Now there's an email thread about this.)

kbarbary commented 9 years ago

I think PCOUNT gives the entire size of the heap, which would include all the rows from multiple columns. fits_read_descriptll gives the size of the array for a given column and row (since every row can have a different-length column). For a given column and row, once you know the size of the array, then you can read it with fits_read_col, specifying the number of elements to read.

So basically my plan for implementing reading a variable length column is to

rgiordan commented 9 years ago

I sent a pull request with a wrapper fits_read_descriptll, and its output makes sense. I'm not sure I see how to use fits_read_col, though -- I'm not sure what the right values of the "firstrow" and "firstelem" arguments should be, and the sensible-seeming things I try give errors. Can you give some pointers?

psf_filename = "psField-003900-6-0269.fit"
psf_fits = FITS(psf_filename);
hdu = psf_fits[2]

colname = "RROWS"

# Move to the second header.
FITSIO.Libcfitsio.fits_movabs_hdu(hdu.fitsfile, hdu.ext)

nrows = FITSIO.Libcfitsio.fits_get_num_rowsll(hdu.fitsfile)
colnum = FITSIO.Libcfitsio.fits_get_colnum(hdu.fitsfile, colname)

rrows_dict = Dict()
for rownum in 1:nrows
    println(rownum)
    repeat, offset = FITSIO.Libcfitsio.fits_read_descriptll(hdu.fitsfile, colnum, rownum)
    println((repeat, offset))
    result = zeros(repeat[1])

    # This next line doesn't work, and I'm not sure why.  The acceptable
    # arguments for "rownum" appear to be 1:4.  Putting offset in elemnum
    # then gives the following error:
    # ERROR: bad first element number
    #  in error at error.jl:21
    #  in fits_assert_ok at /home/rgiordan/.julia/v0.3/FITSIO/src/cfitsio.jl:197
    #  in fits_read_col at /home/rgiordan/.julia/v0.3/FITSIO/src/cfitsio.jl:775
    #  in anonymous at no file:6
    FITSIO.Libcfitsio.fits_read_col(hdu.fitsfile, colnum, rownum, 1 + offset[1], result)
    rrows_dict[rownum] = result
end
kbarbary commented 9 years ago

I think firstelem should be 1, not 1 + offset[1]. (It's the index of the first element to read within the row.) Edit: within the row and within the column.

rgiordan commented 9 years ago

Ok, just using read_col(hdu.fitsfile, column, rownum, 1, result) gives me something that looks like it could be correct. So I guess offset is just ignored?

kbarbary commented 9 years ago

Closed by #36