emmt / EasyFITS.jl

Using FITS files made easier for Julia
Other
9 stars 2 forks source link

Cannot write Table column with cell type String and dimension (1,) #8

Closed buthanoid closed 11 months ago

buthanoid commented 11 months ago

What is wrong

I create a FitsTableHDU and I want a column of type String with maximum string length 1. I cannot write to such a column, it fails.

What I expected

No failure

How to reproduce

julia> using EasyFITS

julia> f = openfits(mktemp()[1], "w!")
0-element FitsFile

julia> write(f, FitsHeader(), [1;;])
1-element FitsFile:
 FitsImageHDU{Int64,2}: num = 1, size = 1×1

julia> hdu = write(f, FitsTableHDU, [ :col1 => (String, (1,)) ])
FitsTableHDU (Binary): num = 2

julia> write(hdu, :col1 => ["a"])
ERROR: DimensionMismatch: bad number of dimensions
Stacktrace:
 [1] write(hdu::FitsTableHDU, pair::Pair{Symbol, Vector{String}}; case::Bool, null::Nothing, first::Int64)
   @ EasyFITS ~/Documents/EasyFITS.jl-antoine/src/tables.jl:1048
 [2] write(hdu::FitsTableHDU, pair::Pair{Symbol, Vector{String}})
   @ EasyFITS ~/Documents/EasyFITS.jl-antoine/src/tables.jl:999
 [3] top-level scope
   @ REPL[32]:1

Investigation

It works for columns of type Int:

julia> hdu = write(f, FitsTableHDU, [ :col1 => (Int, (1,)) ])
FitsTableHDU (Binary): num = 3

julia> write(hdu, :col1 => [1])
FitsTableHDU (Binary): num = 3

It works for dimensions (2,):

julia> hdu = write(f, FitsTableHDU, [ :col1 => (String, (2,)) ])
FitsTableHDU (Binary): num = 4

julia> write(hdu, :col1 => ["ab"])
FitsTableHDU (Binary): num = 4

It works for dimensions (1,1) (yes):

julia> hdu = write(f, FitsTableHDU, [ :col1 => (String, (1,1)) ])
FitsTableHDU (Binary): num = 5

julia> write(hdu, :col1 => fill("a", (1, 1)) )
FitsTableHDU (Binary): num = 5

So it is clearly a problem for the cell type String and the dimension (1,).

Looking at the function given by the error ERROR: DimensionMismatch: bad number of dimensions, we see that code (I removed some useless parts for clarity):

# Check dimensions. The number of rows is adjusted automatically.
cell_dims = read_tdim(hdu, num)
if (cell_ndims = length(cell_dims)) == 1 && Base.first(cell_dims) == 1
    cell_ndims = 0
end
vals_ndims = ndims(vals)
if eltype(vals) <: AbstractString
    off = 1 # the first dimension is to index characters and does not count here
else
    off = 0 # all leading dimensions must be identical
end
n = cell_ndims - off # number of dimension to compare
((vals_ndims == n)|(vals_ndims == n+1)) || throw(DimensionMismatch("bad number of dimensions"))
 for i in 1:n
    vals_dims[i] == cell_dims[i+off] || throw(DimensionMismatch("incompatible dimension"))
end

Now imagine we run that code, I add the values of the variables after each instruction:

julia> cell_dims = read_tdim(hdu, num)
(1,)   # we set dimension (1,) for our column, which means "scalar", CFITSIO will give us (1,)

julia> if (cell_ndims = length(cell_dims)) == 1 && Base.first(cell_dims) == 1
           cell_ndims = 0
       end
0

julia> vals_ndims = ndims(vals)
1 # indeed our input values `["a"]` has one dimension

julia> if eltype(vals) <: AbstractString
           off = 1 # the first dimension is to index characters and does not count here
       else
           off = 0 # all leading dimensions must be identical
       end
1

julia> n = cell_ndims - off # number of dimension to compare
-1

julia> ((vals_ndims == n)|(vals_ndims == n+1)) || throw(DimensionMismatch("bad number of dimensions"))
ERROR: DimensionMismatch: bad number of dimensions
# indeed  `vals_ndims` is `1` whereas `n` is `-1`

We can see that the two concepts, "not comparing the first dimension for String vals", and "(1,) means a scalar" are wrong when used together.

Here is a list of cases when cell type is Int:

Dim   vals        cell_ndims  off  n  vals_ndims  pass?
-------------------------------------------------------
(1,)  [1, 2]          0        0   0      1       yes
-------------------------------------------------------
(1,)  [1 3; 2 4]      0        0   0      2       no
-------------------------------------------------------
(2,)  [1, 2]          1        0   1      1       yes
-------------------------------------------------------
(2,)  [1 3; 2 4]      1        0   1      2       yes
-------------------------------------------------------

Now with cell type String:

Dim   vals        cell_ndims  off  n  vals_ndims  pass?
-------------------------------------------------------
(1,)  [a, b]          0        1  -1      1       no    # !!!
------------------------------------------------------
(1,)  [a c; b d]      0        1  -1      2       no
-------------------------------------------------------
(2,)  [ab, cd]        1        1   0      1       yes
-------------------------------------------------------
(2,)  [ab ef; cd gh]  1        1   0      2       no
-------------------------------------------------------
(3,2) [abc, def]      2        1   1      1       yes
-------------------------------------------------------
(3,2) m               2        1   1      2       yes
-------------------------------------------------------

m = [ abc ghi ;
      def jkl ]

How to resolve

I suggest that when cell type is String and dimensions is (1,) we put n to 0.

You can check on the previous two tables that it is correct

emmt commented 11 months ago

This should be fixed by 7f1032dab28a73998a3550b5be2993a8250273c4