JuliaAstro / FITSIO.jl

Flexible Image Transport System (FITS) file support for Julia
http://juliaastro.org/FITSIO.jl/
MIT License
55 stars 29 forks source link

Should table hdus implement the Tables.jl "interface"? #112

Closed ajwheeler closed 3 years ago

ajwheeler commented 5 years ago

Personally, when working with a FITS table, I generally immediately put the relevant cols into a DataFrame. This would make that easier, make it possible to simply save tables as CSV, etc.

If you think this is a good idea, I can look into doing it myself.

giordano commented 5 years ago

I think this feature could be very nice, however I'm not sure it should stay in this package, as this is mostly an interface to CFITSIO library.

ajwheeler commented 5 years ago

Do you have thoughts about where it would be appropriate? I can imagine a dedicated package, or perhaps a larger library (FITS.jl?) that depends on FITSIO.jl and exports its functions but provides more functionality on top of that.

giordano commented 5 years ago

Maybe another package? I'd avoid FITS.jl mostly because its name would clash with the FITS type here. Maybe @kbarbary could have more ideas about this issue in general :slightly_smiling_face:

kbarbary commented 5 years ago

Perhaps a good way to start would be to create dedicated package FITSTables.jl (or similar) that has just the Tables.jl interface functionality. At a later point, if it seems clear that this interface is the One True Table Interface and that FITS tables play well with it, then it could be included in FITSIO directly.

mileslucas commented 4 years ago

I think it makes sense to just treat TableHDU as a tables interface. Just because this is a wrapper for cfitsio doesn't mean we can't treat our inputs as proper julian types.

ajwheeler commented 4 years ago

Any updated thoughts on this? As far as I'm aware, Tables.jl is full steam ahead, but I'm not super up to date on these things.

kbarbary commented 4 years ago

I'm not super familiar with the Tables.jl interface, but I'm curious if it can support all the weirdness of the FITS tables standard: They're row-major. They can contain vector (and N-dimensional!) elements in each row. I believe there are compression formats that are part of the standard. There's a weird "variable length array" thing in which there's a heap area at the end of the table. Despite being stored row-major, cfitsio doesn't really provide an interface for reading by row.

These weirdnesses (mainly of the FITS standard itself, not cfitsio necessarily) are why I originally suggested a proof-of-concept. But if someone has thought about these issues and thinks it can work, then I'd say go for it!

ajwheeler commented 4 years ago

I'm neither an expert in Tables.jl nor the FITS standard, but I can confirm that on some level they are incompatible. Variable length arrays, specifically, aren't supported by Tables.jl. Multi-dimensional elements may be. That might be a good case for not doing this.

On the other hand, I think it may be reasonable for an implementation to support most instances of TableHDU, and throw an error for the ones that use the weird FITS features, regardless of what package this functionality lives in. I have super-dumb code that does this and works for most of my use cases.

My main uncertainty is whether Tables.jl is the right way to solve this problem given the trajectory of the Julia ecosystem, which I'm not really current on. (Not strictly relevant to this package, I realize, but definitely relevant to working with FITS data in Julia.) I'll look into it.

mileslucas commented 4 years ago

Tables.jl is a well-established package and there's no reason not to use it for creating a table interface. Some of the really nice things are that even we can specify our table as row major and it will handle it all for us.

By giving our tables the interface, we can keep how we currently display it/etc but a user can come along and simply DataFrame(fits_table) and it will Just Work^TM.

Theoretically, this implementation would break when a non-standard table gets passed, so I wouldn't even necessarily put the burden of checking on us. At most, perhaps throw a warning that explicitly says "This table is non-standard and can't be converted to a $(DataFrame, for example)"

ajwheeler commented 4 years ago

I'm surprised you wouldn't try to preempt calling Tables functions on "weird" TableHDUs. There are probably cases where a warning (and e.g. dropping a column) is better than throwing an error, but won't it mean that people who want to call DataFrame(fits_table) would have to dig into the Tables code to figure out why it doesn't work?

mileslucas commented 4 years ago

It’s totally fair to say my idea is naive. You’re right it’s probably better to fail with grace than die in the middle of Tables.jl

ajwheeler commented 4 years ago

I've finally done this: github.com/ajwheeler/FITSTables.jl.

It's very much a work in progress. I want to make sure it more gracefully handles the corner cases before registering it.

ajwheeler commented 3 years ago

Is there interest in this now that CFITSIO.jl exists? I've been happily using FITSTables.jl for months, although it still needs checks for edge cases.

mileslucas commented 3 years ago

Yes, I think that would be great!

mileslucas commented 3 years ago

implemented in #151