Closed ajwheeler closed 3 years ago
I think this feature could be very nice, however I'm not sure it should stay in this package, as this is mostly an interface to CFITSIO library.
Do you have thoughts about where it would be appropriate? I can imagine a dedicated package, or perhaps a larger library (FITS.jl
?) that depends on FITSIO.jl
and exports its functions but provides more functionality on top of that.
Maybe another package? I'd avoid FITS.jl
mostly because its name would clash with the FITS
type here. Maybe @kbarbary could have more ideas about this issue in general :slightly_smiling_face:
Perhaps a good way to start would be to create dedicated package FITSTables.jl
(or similar) that has just the Tables.jl interface functionality. At a later point, if it seems clear that this interface is the One True Table Interface and that FITS tables play well with it, then it could be included in FITSIO directly.
I think it makes sense to just treat TableHDU
as a tables interface. Just because this is a wrapper for cfitsio
doesn't mean we can't treat our inputs as proper julian types.
Any updated thoughts on this? As far as I'm aware, Tables.jl
is full steam ahead, but I'm not super up to date on these things.
I'm not super familiar with the Tables.jl
interface, but I'm curious if it can support all the weirdness of the FITS tables standard: They're row-major. They can contain vector (and N-dimensional!) elements in each row. I believe there are compression formats that are part of the standard. There's a weird "variable length array" thing in which there's a heap area at the end of the table. Despite being stored row-major, cfitsio doesn't really provide an interface for reading by row.
These weirdnesses (mainly of the FITS standard itself, not cfitsio necessarily) are why I originally suggested a proof-of-concept. But if someone has thought about these issues and thinks it can work, then I'd say go for it!
I'm neither an expert in Tables.jl
nor the FITS standard, but I can confirm that on some level they are incompatible. Variable length arrays, specifically, aren't supported by Tables.jl
. Multi-dimensional elements may be. That might be a good case for not doing this.
On the other hand, I think it may be reasonable for an implementation to support most instances of TableHDU
, and throw an error for the ones that use the weird FITS features, regardless of what package this functionality lives in. I have super-dumb code that does this and works for most of my use cases.
My main uncertainty is whether Tables.jl
is the right way to solve this problem given the trajectory of the Julia ecosystem, which I'm not really current on. (Not strictly relevant to this package, I realize, but definitely relevant to working with FITS data in Julia.) I'll look into it.
Tables.jl
is a well-established package and there's no reason not to use it for creating a table interface. Some of the really nice things are that even we can specify our table as row major and it will handle it all for us.
By giving our tables the interface, we can keep how we currently display it/etc but a user can come along and simply DataFrame(fits_table)
and it will Just Work^TM.
Theoretically, this implementation would break when a non-standard table gets passed, so I wouldn't even necessarily put the burden of checking on us. At most, perhaps throw a warning that explicitly says "This table is non-standard and can't be converted to a $(DataFrame, for example)"
I'm surprised you wouldn't try to preempt calling Tables
functions on "weird" TableHDU
s. There are probably cases where a warning (and e.g. dropping a column) is better than throwing an error, but won't it mean that people who want to call DataFrame(fits_table)
would have to dig into the Tables
code to figure out why it doesn't work?
It’s totally fair to say my idea is naive. You’re right it’s probably better to fail with grace than die in the middle of Tables.jl
I've finally done this: github.com/ajwheeler/FITSTables.jl.
It's very much a work in progress. I want to make sure it more gracefully handles the corner cases before registering it.
Is there interest in this now that CFITSIO.jl
exists? I've been happily using FITSTables.jl
for months, although it still needs checks for edge cases.
Yes, I think that would be great!
implemented in #151
Personally, when working with a FITS table, I generally immediately put the relevant cols into a
DataFrame
. This would make that easier, make it possible to simply save tables as CSV, etc.If you think this is a good idea, I can look into doing it myself.