esheldon / fitsio

A python package for FITS input/output wrapping cfitsio
GNU General Public License v2.0
134 stars 58 forks source link

Add support for random groups #48

Closed bgarwood closed 8 years ago

bgarwood commented 9 years ago

Random groups is still how some radio astronomy data gets in to classic AIPS. This includes single dish data. We have some legacy c code using the cfitsio library that we'd like to migrate to python. pyfits handling of random groups appears to be similar to pyfits tables handling in that additional data can't be added to the HDU after creation.

esheldon commented 9 years ago

I looked up random groups and found this page

https://archive.stsci.edu/fits/fits_standard/node50.html

This implies reading should work as is. Can you read the files? If so, then is the issue how to write the data? Any more info would be helpful. thanks.

bgarwood commented 9 years ago

I'll try reading. I do also need to write random groups. I saw the item about groups under TODO in the readme and assumed it was completely unsupported and that that referred to random groups. I'll get back to you in a few days when I know more clearly what I can't do, with examples.

esheldon commented 9 years ago

OK, thanks. I have not explored groups at all and could use some help and advice implementing them.

bgarwood commented 9 years ago

Reading doesn't work. Presumably that's because one of the signatures of random groups is that NAXIS1=0. That NAXIS1=0 value signals that there is no primary data array. The rest of the NAXISn values describe the data array that appears in each group. A random group HDU is basically a data array with some associated parameters - i.e. it's table of scalars (the parameters) and one data array in each row (group). That's how I think of it, at least. The data array has NAXIS-1 dimensions: NAXIS2 x NAXIS3 x NAXIS4 .... NAXISn. The parameters are described by PTYPEn, PZEROn, and PSCALEn. Where PTYPEn gives the name of the nth parameter. All of these values (parameter values and data array) have the type given by the BITPIX value. PCOUNT is the number of parameters and GCOUNT the number of groups (equivalent to number of rows in a table).

esheldon commented 9 years ago

Do you think you could add some logic to fitsio that could detect this situation?

bgarwood commented 9 years ago

Probably. I haven't looked at the fitsio internals to see how much work it would take. It will be awhile before I can get to it. I have a work-around for pyfits that seems acceptable, but it's not pretty and probably much slower than if fitsio could do it. pyfits can't extend random group HDU, but I don't want to create the entire array in memory before creating the FITS file. So I create a reasonable chunk, write that to a file, close the file, open the file in fitsio (but don't read it), modify the GCOUNT header word, and close the fitsio object. Under the hood I assume that cfitsio sees the new GCOUNT value and adds zero-filled blocks to accommodate that new size. The file can then be reopened in pyfits with memmap=True and the new rows can be written to. It seems to work and be reasonably fast for modest numbers of GCOUNT value increments each time it's appended.

esheldon commented 9 years ago

You might be able to add some logic to the _get_hdu_info method

https://github.com/esheldon/fitsio/blob/master/fitsio/fitsio_pywrap.c#L527

bgarwood commented 9 years ago

fitsio is seeing a groups file currently as an empty (NAXIS1=0) Image HDU and the returned info reflects that. It at least doesn't crash and you can get at any extensions just fine. You simply can't read and write the data. I suspect it really needs a new type because I'm not sure what the read() method should do if it thinks it's an image HDU even if you set dims[0] = GCOUNT at some point. Because that only describes the DATA array on each GCOUNT row and PCOUNT is needed to get the other values from each row. So, I doubt it's a quick fix to just get read access to the values. I suspect this new type would look like the binary tables type for both reading and writing, replacing the cfitsio column i/o stuff with equivalent calls to the cfitsio groups i/o routines.

Bob

On 10/23/2015 11:30 AM, Erin Sheldon wrote:

You might be able to add some logic to the _get_hdu_info method

https://github.com/esheldon/fitsio/blob/master/fitsio/fitsio_pywrap.c#L527

— Reply to this email directly or view it on GitHub https://github.com/esheldon/fitsio/issues/48#issuecomment-150608503.Web Bug from https://github.com/notifications/beacon/AAK_sksJdjg0bpVF4lXeq60xF4baWgTxks5o-koagaJpZM4E7zHu.gif

esheldon commented 9 years ago

At least detection that this is a group could happen in get_hdu_info. It is a start

bgarwood commented 9 years ago

It's been a long time since I looked, but cfitsio support for groups is almost nil. There appear to be routines mentioned in chapter 9 of the cfitsio reference manual. But there's basically no documentation for them, so I'm not even sure what order the read/write grppar routines return the data in (parameters then data array would make the most sense, given that's how they're stored on disk).

So, I suppose minimally in get_hdu_info you'd want to go back to the commented out use of fits_read_imghdrll so that the pcount and gcount values can be retrieved and added to the info dictionary.

The more I think about this today, the more I think this is such a specialized, deprecated use (which I'm only using so we can get GBT data into AIPS, and the legacy c code we have been using has too many size limits and it's too over-reused over the years [lots of probably dead end code fragments left in the code] so that maintenance is more work than just rewriting it) that it's not worth any significant effort on your part to add this to fitsio, although I suppose adding that ability to the info might be useful to someone. That info IS available via the header so it's not like a fitsio user can't figure it out.

Bob

On 10/23/2015 12:08 PM, Erin Sheldon wrote:

OK, I'm not sure where to start. But at least /detection/ that this is a group could happen in get_hdu_info. It is a start

— Reply to this email directly or view it on GitHub https://github.com/esheldon/fitsio/issues/48#issuecomment-150619970.Web Bug from https://github.com/notifications/beacon/AAK_sozC6rtq4RI9PxZINyGW-3tsRYyiks5o-lL2gaJpZM4E7zHu.gif

esheldon commented 9 years ago

That sounds like a practical start for this. If you can generate a pull request with the fix I'll look it over.

I made a new branch "groupstart" which you can make the pull request against.

bgarwood commented 9 years ago

Okay. I'll try and get to it early next week. I have a couple of deadlines before I go away for a short trip at the end of next week so I need to work on those first, although this change should be really straightforward (and you've now gotten me hooked in deep enough that if I'm looking for a distraction I might play with how to add access to the read/write group cfitsio functions at some point).

Bob

On 10/23/2015 12:57 PM, Erin Sheldon wrote:

That sounds like a practical start for this. If you can generate a pull request with the fix I'll look it over.

I made a new branch "groupstart" which you can make the pull request against.

— Reply to this email directly or view it on GitHub https://github.com/esheldon/fitsio/issues/48#issuecomment-150632461.Web Bug from https://github.com/notifications/beacon/AAK_sqNUio1suQ28OnpP_JxLNJq-mJtaks5o-l5vgaJpZM4E7zHu.gif

esheldon commented 8 years ago

any progress on this?

bgarwood commented 8 years ago

On 07/14/2016 11:20 AM, Erin Sheldon wrote:

any progress on this?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/esheldon/fitsio/issues/48#issuecomment-232697318, or mute the thread https://github.com/notifications/unsubscribe/AAK_skznMinhQHYokYqfo-0e1i95N0Zqks5qVlPJgaJpZM4E7zHu.Web Bug from https://github.com/notifications/beacon/AAK_shlOjW73o2Tx5egc4DRrD7Dj6mhcks5qVlPJgaJpZM4E7zHu.gif

No. And it's very unlikely to happen because of the separation of the Green Bank Observatory from NRAO at end of this fiscal year. Our focus is just to patch up the holes we know about and document what's there and then wave bye. I'm moving to CASA after that.

Bob

esheldon commented 8 years ago

OK, thanks for the discussion. let's close this then