astropy / ccdproc

Astropy affiliated package for reducing optical/IR CCD data
https://ccdproc.readthedocs.io
BSD 3-Clause "New" or "Revised" License

ImageFileCollection seems to fail on multi extension files #423

Closed boada closed 7 years ago

boada commented 7 years ago
In [1]: ccdproc.__version__
Out[1]: '1.1.0'

I have a bunch of imaging data where each of the instrument's CCDs has been written to a different extension in the same file:

Filename: image.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU     179   ()              
1    im1         ImageHDU       179   (2112, 2048)   int32   
2    im2         ImageHDU       179   (2112, 2048)   int32   
3    im3         ImageHDU       179   (2112, 2048)   int32   
4    im4         ImageHDU       179   (2112, 2048)   int32  

Running ImageFileCollection on a directory which contains a single image gives:

In [17]: ic1 = ImageFileCollection('.', keywords=keys)
    ...: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-85b0c52a8b28> in <module>()
----> 1 ic1 = ImageFileCollection('.', keywords=keys)

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in __init__(self, location, keywords, info_file)
    101 
    102         if keywords:
--> 103             self.keywords = keywords
    104 
    105     @property

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in keywords(self, keywords)
    198             # Reorder the keywords to match the initial ordering.
    199             new_keys.sort(key=keywords.index)
--> 200             self._summary_info = self._fits_summary(header_keywords=new_keys)
    201 
    202     @property

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in _fits_summary(self, header_keywords)
    431                 continue
    432 
--> 433         summary_table = Table(summary_dict, masked=True)
    434 
    435         for column in summary_table.colnames:

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in __init__(self, data, masked, names, dtype, meta, copy, rows, copy_indices, **kwargs)
    369 
    370         # Finally do the real initialization
--> 371         init_func(data, names, dtype, n_cols, copy)
    372 
    373         # Whatever happens above, the masked property should be set to a boolean

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_dict(self, data, names, dtype, n_cols, copy)
    668 
    669         data_list = [data[name] for name in names]
--> 670         self._init_from_list(data_list, names, dtype, n_cols, copy)
    671 
    672     def _init_from_table(self, data, names, dtype, n_cols, copy):

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_list(self, data, names, dtype, n_cols, copy)
    633             cols.append(col)
    634 
--> 635         self._init_from_cols(cols)
    636 
    637     def _init_from_ndarray(self, data, names, dtype, n_cols, copy):

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_cols(self, cols)
    698         if len(lengths) != 1:
    699             raise ValueError('Inconsistent data column lengths: {0}'
--> 700                              .format(lengths))
    701 
    702         # Set the table masking

ValueError: Inconsistent data column lengths: {1, 2}

However, if I break the fits files into files which only contain a single CCD's data, then everything seems to work as it should, but I seem to lose any info about what is contained in the files.

In [18]: ic1 = ImageFileCollection('.', keywords=keys) 
    ...: 

In [19]: ic1.summary
Out[19]: 
<Table masked=True length=4>
         file         obstype  object  filter extname
        str21         float64 float64 float64 float64
--------------------- ------- ------- ------- -------
image_ccd1.fits      --      --      --      --
image_ccd2.fits      --      --      --      --
image_ccd3.fits      --      --      --      --
image_ccd4.fits      --      --      --      --

boada commented 7 years ago

After a little more investigating, it seems to have something to do with the image headers. When I break the 4-CCD file into 4 single files, but write the original primary HDU to each file, I still get the error.

In [47]: hdulist.info()
Filename: image_ccd1.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU     179   ()              
1    im1         ImageHDU       179   (2112, 2048)   int32   

In [48]: ic1 = ImageFileCollection('.', keywords=keys) # only keep track of keys
    ...: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-48-85b0c52a8b28> in <module>()
----> 1 ic1 = ImageFileCollection('.', keywords=keys) # only keep track of keys

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in __init__(self, location, keywords, info_file)
    101 
    102         if keywords:
--> 103             self.keywords = keywords
    104 
    105     @property

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in keywords(self, keywords)
    198             # Reorder the keywords to match the initial ordering.
    199             new_keys.sort(key=keywords.index)
--> 200             self._summary_info = self._fits_summary(header_keywords=new_keys)
    201 
    202     @property

/home/boada/.local/lib/python3.5/site-packages/ccdproc/image_collection.py in _fits_summary(self, header_keywords)
    431                 continue
    432 
--> 433         summary_table = Table(summary_dict, masked=True)
    434 
    435         for column in summary_table.colnames:

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in __init__(self, data, masked, names, dtype, meta, copy, rows, copy_indices, **kwargs)
    369 
    370         # Finally do the real initialization
--> 371         init_func(data, names, dtype, n_cols, copy)
    372 
    373         # Whatever happens above, the masked property should be set to a boolean

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_dict(self, data, names, dtype, n_cols, copy)
    668 
    669         data_list = [data[name] for name in names]
--> 670         self._init_from_list(data_list, names, dtype, n_cols, copy)
    671 
    672     def _init_from_table(self, data, names, dtype, n_cols, copy):

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_list(self, data, names, dtype, n_cols, copy)
    633             cols.append(col)
    634 
--> 635         self._init_from_cols(cols)
    636 
    637     def _init_from_ndarray(self, data, names, dtype, n_cols, copy):

/home/boada/.local/lib/python3.5/site-packages/astropy/table/table.py in _init_from_cols(self, cols)
    698         if len(lengths) != 1:
    699             raise ValueError('Inconsistent data column lengths: {0}'
--> 700                              .format(lengths))
    701 
    702         # Set the table masking

ValueError: Inconsistent data column lengths: {8, 4}

boada commented 7 years ago

Here's the primary HDU with personal info removed.

SIMPLE  =                    T / File conforms to FITS standard                 
BITPIX  =                    8 / Bits per pixel (not used)                      
NAXIS   =                    0 / PHU contains no image matrix                   
EXTEND  =                    T / File contains extensions                       
NEXTEND =                    4 / Number of extensions                           
FILENAME= 'image.fits'   / Original host filename                         
OBJECT  = 'xxxxx' / Observation title                              
OBSTYPE = 'object  '           / Observation type                                                         
RADECSYS= 'FK5     '           / Default coordinate system                      
RADECEQ =                2000. / Default equinox                                                     
OBJEPOCH=                 2000 / [yr] Epoch of target coordinates               

TIMESYS = 'UTC     '           / Time system                                                           

OBSERVAT= 'KPNO    '           / Observatory                                    
TELESCOP= 'KPNO 4.0 meter telescope' / Telescope                                
TELRADEC= 'FK5     '           / Telescope coordinate system                    
TELEQUIN=                 2000 / Equinox of tel coords                                               

INSTRUME= 'NEWFIRM '           / Mosaic detector                                
MOSSIZE = '[1:4096,1:4096]'    / Mosaic detector size                           
NDETS   =                    4 / Number of detectors in mosaic

MSeifert04 commented 7 years ago

Thank you for the report, that seems like a severe issue with multi-extension fits files that we should resolve.

But it's really hard to debug this issue without knowing the astropy/numpy version (especially because the error happens in astropy!) and without having these files.

Would it be possible to share some of your files or "similar" files that throw the same error?

MSeifert04 commented 7 years ago

Or, as an alternative, could you please add a debug statement (for example, print(summary_dict) at line 432 of image_collection.py) to show what might be triggering this error?
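A check along these lines could make that debug output easier to read. It is only a minimal sketch: summary_dict is assumed to map each column name to a list of values, as the Table(summary_dict, masked=True) call in the traceback suggests, and column_lengths and odd_columns are hypothetical helper names, not part of ccdproc.

```python
from collections import Counter


def column_lengths(summary_dict):
    """Return {column name: number of entries} for a dict of lists."""
    return {name: len(values) for name, values in summary_dict.items()}


def odd_columns(summary_dict):
    """Return the columns whose length differs from the most common length."""
    lengths = column_lengths(summary_dict)
    if not lengths:
        return {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    return {name: n for name, n in lengths.items() if n != expected}


# Made-up data standing in for the real summary_dict.
example = {
    "file": ["image.fits"],
    "obstype": ["object", 1],
    "filter": ["J"],
}
print(column_lengths(example))
print(odd_columns(example))
```

Printing only the lengths, rather than the whole dictionary, keeps the output short enough to paste into a comment.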

Note that GitHub supports <details> text </details> to hide long tracebacks or code, or you can just put it in a gist.

boada commented 7 years ago
In [4]: astropy.__version__
Out[4]: '1.2.1'

In [5]: numpy.__version__
Out[5]: '1.11.2'

I'm not really sure how to go about sharing some files. Maybe some of the non-science data, like a dark frame. I will see if I can put something together.

mwcraig commented 7 years ago

@boada -- thanks for the additional information. One option (if it isn't too much of a hassle) would be to strip out the pointing information and replace the real data with random numbers. Another would be to email a file directly to me with the understanding that its content would be kept confidential.

I'll see if I can generate an error just by throwing some multi-extension fits files at ImageFileCollection.

Just to be clear about the desired behavior, is the idea that you would like, if there are multiple extensions in each file, for the generator to loop over the extensions and files? Or that you be able to extract a specific extension?
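The two behaviours in question can be sketched as follows. This is a toy model, not the ccdproc API: each file is a plain list of (extension name, header) pairs, the header values are made up, and iter_headers with its extension argument is a hypothetical name chosen for illustration.

```python
def iter_headers(files, extension=None):
    """Yield (filename, extension name, header) tuples.

    extension=None -> loop over every extension of every file
    extension=int  -> only the extension at that position in each file
    extension=str  -> only the extension with that name in each file
    """
    for filename, hdus in files.items():
        for index, (name, header) in enumerate(hdus):
            if extension is None or extension == index or extension == name:
                yield filename, name, header


# Made-up stand-in for a collection of two multi-extension files.
files = {
    "a.fits": [("PRIMARY", {"NEXTEND": 2}), ("im1", {"CCD": 1}), ("im2", {"CCD": 2})],
    "b.fits": [("PRIMARY", {"NEXTEND": 2}), ("im1", {"CCD": 1}), ("im2", {"CCD": 2})],
}

everything = list(iter_headers(files))               # all extensions, all files
only_im1 = list(iter_headers(files, "im1"))          # one named extension per file
only_first = list(iter_headers(files, extension=1))  # same extension, by position
```

A single optional argument covers both cases: leaving it unset loops over everything, while setting it restricts the generator to one extension per file.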

vrooje commented 7 years ago

I'm trying to reduce some spectroscopic data and just hit this issue. To answer the question above, I could see using both an all-extension-loop functionality and a single extension specification, but right now I'd like to specify the extension.

Happy to provide an example FITS file if that's helpful.

mwcraig commented 7 years ago

@vrooje -- would you want the same extension in all of the files, or would it vary from file to file? Implementing either should be fairly straightforward (I think).

vrooje commented 7 years ago

I think I'd always be running this in batches where all the files were from the same instrument, so the same extension should work... the instrument does have separate blue and red channels but a) I've been running them separately and b) I think the file structure is still the same, just different dimensions.

janga1997 commented 7 years ago

I would like to work on this issue. So, just to be clear one last time, @crawfordsm @mwcraig , if the user specifies an extension, you should extract that extension from all files in a given directory?

janga1997 commented 7 years ago

@vrooje I'm trying to work on solving this issue. Could you provide that sample FITS file?

mwcraig commented 7 years ago

@janga1997

So, just to be clear one last time, @crawfordsm @mwcraig , if the user specifies an extension, you should extract that extension from all files in a given directory?

Yes, I would imagine adding a keyword like extension to the list of arguments, and returning the hdu/header/data/etc for that extension. The default right now is to return the first extension, I believe.

vrooje commented 7 years ago

b207_os_bs_ff_cr.fits.zip r207_os_bs_ff_cr.fits.zip

Here's a Kast observation of a standard star, one file for the blue channel and one file for the red channel. I wouldn't necessarily include these in the same collection, just making sure you have plenty of data to play with. Hope these uploaded okay...

mwcraig commented 7 years ago

@boada @vrooje -- I think the underlying issue with these files is not simply that they are multi-extension FITS files (though support for those is still needed).

In the files from @vrooje, the keyword OBSTYPE shows up twice in the primary header, once with value 'OBJECT' and again with value 1. Based on @boada's comment https://github.com/astropy/ccdproc/issues/423#issuecomment-261351015 I'm guessing the same issue occurred there, too.
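One way a duplicated keyword could produce exactly the column lengths reported above is sketched below. This is a simplified model, not ccdproc's actual code: headers are plain lists of (keyword, value) cards, and the hypothetical build_summary appends one value per matching card, which is an assumption about the failure mode rather than a confirmed diagnosis.

```python
def build_summary(headers, keywords):
    """Naive summary builder: one 'file' entry per file, but one value
    per card for each requested keyword, duplicates included."""
    summary = {"file": []}
    for key in keywords:
        summary[key] = []
    for filename, cards in headers.items():
        summary["file"].append(filename)
        for keyword, value in cards:
            key = keyword.lower()
            if key in keywords:
                summary[key].append(value)
    return summary


def length_set(summary):
    """Set of distinct column lengths, as reported in the ValueError."""
    return {len(column) for column in summary.values()}


# A header in which OBSTYPE appears twice, as described for the Kast files.
cards = [("OBSTYPE", "OBJECT"), ("OBSTYPE", 1)]

one_file = {"image.fits": cards}
four_files = {"image_ccd{}.fits".format(i): cards for i in range(1, 5)}

# One file: 1 filename but 2 OBSTYPE values.
print(length_set(build_summary(one_file, ["obstype"])))
# Four files: 4 filenames but 8 OBSTYPE values.
print(length_set(build_summary(four_files, ["obstype"])))
```

Under this assumption, one file gives column lengths 1 and 2, and four files give 4 and 8, the same pairs that appear in the two tracebacks.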

How should cases like this be handled? Rather than continuing the discussion in this issue, I'm opening up a separate one: #464

crawfordsm commented 7 years ago

Closing issue, but if the fixes do not address your needs, please re-open the issue.