LivTel / frodo-l2-pipeline

FRODOSpec L2 pipeline

frodo_red_rebin / frodo_red_reformat: add WAV* headers #18

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
WAV* keyword discussion to follow.

Original issue reported on code.google.com by robbarns...@gmail.com on 1 Nov 2011 at 9:42

GoogleCodeExporter commented 9 years ago
> Hi Rob, Iain
>
> A while ago it was requested that the L2 modify the following WAV* keys. I'm
> a bit at a loss at what to change and what to change it to.
>
> WAVCENT
> WAVDISP
> WAVRESOL
> WAVSHORT
> WAVLONG
> WAVSET
> WAVCAL
> WAVERR
>
> These will all be different on the product extensions (not primary) from your
> L1 settings.
>
> WAVCENT - central wavelength presumably? should this just be set to the newly
> calibrated wavelength at half the distance across the dispersion axis?
> WAVDISP - should I set this to my new rebinned figure?
> WAVSHORT - rebinned lowest wavelength?
> WAVLONG - rebinned highest wavelength?
> WAVSET - = WAVCENT?
> WAVCAL - the frame used?
> WAVERR - I don't know how to reasonably estimate this in an autonomous
> fashion..
>
> Thoughts?

These are our internal wavelength keywords and are only intended as an 
approximate guide. The full WCS which you create is of course the exact 
specification. In a sense they only have any real meaning in the uncalibrated 
data. Strictly speaking we cannot write a correct WCS into the raw/L1 image 
before they have been aligned and scrunched. That's why we have these vaguer 
keywords.

We have two options on how to handle them
i) set them all correctly in each extension, in which case most of your guessed 
definitions are correct
ii) include them in the primary extension only and delete them from all the 
others.

I am currently inclined towards ii. They are only really useful to us since 
third party software will not read them. They are used by the data archive 
which looks only at the primary extension, so I think if you write them in any 
other headers they will never be looked at. In the proper calibrated extensions 
they are superseded by the full WCS and no longer relevant? (That is unless 
Iain has some plan for them that I am unaware of?)

In which case, my suggestion would be to correct the values in the primary 
extension and not include them in the others. The values only need to be 
approximate and I think you can calculate what the values should be from your 
arc fit. I would just take the central fibre and derive the values for that. Of 
course the answer is different for each fibre, but only by a few Angstroms. 

First few all relate to the raw/L1 data and how they were observed:

WAVCENT - Yes. Wavelength for the centre of the CCD. Angstrom. 
WAVDISP - Dispersion. Angstrom/pixel. If we are applying only to the primary 
then it needs to be for the original pixel size in the L1/raw image. You can 
get an approx value from your arc fit.
WAVSHORT,WAVLONG - Again, assuming we only write into primary extension then it 
only applies to the L1 image, so this is approx wavelengths for pixels 1 and 
4096 (assuming an unbinned raw image). 
WAVSET - I think I would leave this alone as it is in the raw/L1 file. It is 
the requested setting so even if the real data are far off, it should stay as 
what was asked for.
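
The primary-header values described above could be derived from the central fibre's arc fit roughly as follows. This is only a sketch: it assumes the fit is a polynomial in 1-based pixel number returning Angstrom, and the function name, coefficient layout and example numbers are made up for illustration rather than taken from the pipeline.

```python
def wav_keywords(coeffs, n_pixels=4096):
    """Approximate WAV* values from one fibre's wavelength polynomial.

    coeffs: hypothetical arc-fit coefficients, lowest order first, giving
    wavelength in Angstrom as a function of 1-based pixel number.
    n_pixels: length of the dispersion axis (4096 assumes an unbinned image).
    """
    def wavelength(pixel):
        return sum(c * pixel ** i for i, c in enumerate(coeffs))

    wavshort = wavelength(1)
    wavlong = wavelength(n_pixels)
    centre = (1 + n_pixels) / 2.0
    return {
        "WAVSHORT": wavshort,
        "WAVLONG": wavlong,
        "WAVCENT": wavelength(centre),
        # Mean dispersion across the chip, Angstrom per original pixel.
        "WAVDISP": (wavlong - wavshort) / (n_pixels - 1),
    }


# Made-up linear fit for the central fibre: 5800 A offset, 0.8 A/pixel.
keys = wav_keywords([5800.0, 0.8])
```

WAVSET is deliberately not computed here, since it records the requested setting and should be left as written in the raw/L1 file.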

These last couple could arguably be written into all extensions? They seem to
have more relevance to calibrated than to L1 data?

WAVCAL - Yes. The name of the arc file I guess. This is a bit vague. I am not 
really sure what it is to be used for, but you may as well stash that 
information. 
WAVERR - How to measure. Aha. Now there is a big question. This is the big 
remaining question we have had for a long time. Several times I have found 
files in which the calibration had clearly gone wrong and we currently have no 
way to detect or flag it automatically. Let me make a cup of tea and I will 
then try to write a bit more on this....

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:42

GoogleCodeExporter commented 9 years ago
> WAVERR - I don't know how to reasonably estimate this in an autonomous
> fashion..
> > 
> > Thoughts?
Right. I've not been drinking tea all that time. Honest!

What we do with this seems to depend on which headers we write it into. 

If we only write in primary:
In this case I think it is fairly easy. We want an estimate of how far off are 
the values you have written. It is possible that you got the fits totally 
wrong, but that is not really relevant. That's a complete failure and the 
"error estimate" is something different. Presumably (?) the biggest deviation 
from the WAV values is not the error in your fits at all, but rather the real 
physical variation in fibres since this is before you align and scrunch fibres. 
How about a value based on the RMS between your 144 fits? That might be the RMS 
value of your 144 "delay" shifts or the RMS in your best estimate of the 
wavelength at pixel 4096 (assuming unbinned) or some combination of them both. 
You would know better than me, but I would guess that the largest deviation is 
the fibre to fibre delay shifts which is a few pixels?
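
One way to read the RMS suggestion above, as a sketch only: take the best-estimate wavelength at the last pixel from each fibre's fit and quote the RMS scatter about the mean. The function names and the three example values are invented; the real input would be the 144 per-fibre estimates.

```python
import math


def rms_scatter(values):
    """RMS deviation of a list of numbers about their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def primary_waverr(edge_wavelengths):
    """Fibre-to-fibre RMS (Angstrom) in the wavelength at the last pixel.

    edge_wavelengths: one estimate per fibre, hypothetical input taken
    from each fibre's arc fit evaluated at pixel 4096 (unbinned).
    """
    return rms_scatter(edge_wavelengths)


# Three made-up fibres standing in for the full set of 144.
waverr = primary_waverr([9075.0, 9077.0, 9079.0])
```

The same `rms_scatter` helper could be applied to the per-fibre "delay" shifts instead; those are in pixels, so they would need multiplying by the dispersion before being compared with or combined with the wavelength scatter.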

If we write this into all extensions:
The primary extension is as above. In the others which are after calibration 
this becomes a proper error estimate. As I said earlier, that's something we 
have been sorely missing anyway. As with all error estimates, it is down to you 
to figure out where you think the significant sources are. All this 
probably needs to be done back at the fitting stage and is not trivial.

What are the likely largest causes of errors?
* Accuracy of centroiding the arc lines themselves. Should be very good.
* Flexibility of the fit between arc lines. Presumably this can be derived 
analytically during the fitting process. The fewer the lines, the larger the 
possible divergence is going to be.
* Since they are all supposed to be aligned in the final data product, you 
could centroid a strong arc line in the linearized arc frame. In theory the RMS 
in the centroids ought to be zero, but we know from past experiments that when 
the fit gets unstable, the arc lines jiggle about. 
* Instead of a per-frame derived value, you could just get an estimate of our 
overall reliability and write it as a constant.
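
The third idea in the list above (centroid one strong arc line in every fibre of the linearized arc frame and look at the scatter) might be sketched like this. It assumes a small, background-subtracted window around the line on a common wavelength grid; the function names and the toy fluxes are illustrative only.

```python
import math


def centroid(wavelengths, fluxes):
    """Flux-weighted mean wavelength over a window around one arc line."""
    total = sum(fluxes)
    return sum(w * f for w, f in zip(wavelengths, fluxes)) / total


def line_centroid_rms(wavelengths, fibre_fluxes):
    """RMS of the per-fibre centroids of the same arc line.

    fibre_fluxes: one flux window per fibre, all sampled on `wavelengths`.
    A perfectly aligned frame would give zero.
    """
    cents = [centroid(wavelengths, f) for f in fibre_fluxes]
    mean = sum(cents) / len(cents)
    return math.sqrt(sum((c - mean) ** 2 for c in cents) / len(cents))


grid = [6562.0, 6563.0, 6564.0]
fibres = [
    [1.0, 4.0, 1.0],  # symmetric about 6563.0
    [1.0, 4.0, 1.0],  # symmetric about 6563.0
    [0.0, 3.0, 3.0],  # line has wandered to 6563.5
]
rms = line_centroid_rms(grid, fibres)
```

A per-frame value like this would also give something to threshold on when flagging frames where the fit has gone unstable, though choosing that threshold is a separate question.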

Anyway, whatever method you use to estimate the errors:
* Estimating errors is something we have not yet done convincingly in the 
pipeline and needs to be better understood irrespective of WAVERR
* I do not mean to belittle its importance and error analysis may be something 
your thesis examiners decide to lay into in great detail (I don't know!) but 
given that it is something we have not addressed yet, it does not seem to me 
that it should hold up software deployment just so you can write something in 
WAVERR. I would be inclined to not write anything until such time as you 
understand it well enough to actually write something robust and useful.
* If written into the WCS calibrated extensions there may be a better 
standardized WCS way of expressing the error than using our made up WAV 
keywords. 

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:43

GoogleCodeExporter commented 9 years ago
On 05/04/2011 10:06 PM, Robert Smith wrote:
>> WAVERR - I don't know how to reasonably estimate this in an autonomous
>> fashion..
>>
>> Thoughts?
> Right. I've not been drinking tea all that time. Honest!
>
> What we do with this seems to depend on which headers we write it into.
>
> If we only write in primary:
> In this case I think it is fairly easy. We want an estimate of how far off
> are the values you have written. It is possible that you got the fits totally
> wrong, but that is not really relevant. That's a complete failure and the
> "error estimate" is something different. Presumably (?) the biggest deviation
> from the WAV values is not the error in your fits at all, but rather the real
> physical variation in fibres since this is before you align and scrunch fibres.
> How about a value based on the RMS between your 144 fits? That might be the RMS
> value of your 144 "delay" shifts or the RMS in your best estimate of the
> wavelength at pixel 4096 (assuming unbinned) or some combination of them both.
> You would know better than me, but I would guess that the largest deviation is
> the fibre to fibre delay shifts which is a few pixels?
>
> If we write this into all extensions:
> The primary extension is as above. In the others which are after calibration
> this becomes a proper error estimate. As I said earlier, that's something we
> have been sorely missing anyway. As with all error estimates, it is down to you
> to figure out where you think the significant sources are. All this
> probably needs to be done back at the fitting stage and is not trivial.
>
> What are the likely largest causes of errors?
> * Accuracy of centroiding the arc lines themselves. Should be very good.
> * Flexibility of the fit between arc lines. Presumably this can be derived
> analytically during the fitting process. The fewer the lines, the larger the
> possible divergence is going to be.
> * Since they are all supposed to be aligned in the final data product, you
> could centroid a strong arc line in the linearized arc frame. In theory the RMS
> in the centroids ought to be zero, but we know from past experiments that when
> the fit gets unstable, the arc lines jiggle about.
> * Instead of a per-frame derived value, you could just get an estimate of our
> overall reliability and write it as a constant.
>
> Anyway, whatever method you use to estimate the errors:
> * Estimating errors is something we have not yet done convincingly in the
> pipeline and needs to be better understood irrespective of WAVERR
> * I do not mean to belittle its importance and error analysis may be
> something your thesis examiners decide to lay into in great detail (I don't
> know!) but given that it is something we have not addressed yet, it does not
> seem to me that it should hold up software deployment just so you can write
> something in WAVERR. I would be inclined to not write anything until such time
> as you understand it well enough to actually write something robust and useful.
> * If written into the WCS calibrated extensions there may be a better
> standardized WCS way of expressing the error than using our made up WAV
> keywords.

Following my logic from the previous email, I'd suggest setting WAVERR in the 
L1 image to a predefined constant, and setting it in the L1 since it only 
really relates to the approximate fitting error. I can create my own "L2" 
prefixed key, e.g. L2WAVERR, and store the culmination of errors from L2 
calibration in that value. As you suggested, I will consider this more after 
the release.

So briefly, do you see any problems with the following?

WAVCENT
WAVDISP
WAVRESOL
WAVSHORT
WAVLONG
WAVSET
WAVERR

All set before L2.

WAVCAL removed.
WAVERR set to some appropriate constant relating to the approximate fits.

L2 doesn't touch these at all in primary, but removes them all in further 
extensions and replaces them with equivalent keys including:

L2WAVERR
L2ARC
L2WAVDIS
L2WAVCEN

pertaining to the L2 calibrations only.
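
As a rough illustration of the proposal above, with each header modelled as a plain dict rather than through whatever FITS library the pipeline actually uses: the primary header is left alone, and each further extension loses its WAV* keys and gains the L2-prefixed ones. The function name and all example values are hypothetical.

```python
# WAV* keys named in this thread. WAVCAL is listed too, so that it is
# dropped from an extension if an older file still carries it.
WAV_KEYS = ("WAVCENT", "WAVDISP", "WAVRESOL", "WAVSHORT",
            "WAVLONG", "WAVSET", "WAVCAL", "WAVERR")


def rewrite_extension_header(header, l2_values):
    """Return a copy of an extension header without WAV* keys,
    with the supplied L2* keys added. The input is not modified."""
    cleaned = {k: v for k, v in header.items() if k not in WAV_KEYS}
    cleaned.update(l2_values)
    return cleaned


# Made-up extension header and made-up L2 values.
ext = {"NAXIS": 2, "WAVCENT": 7400.0, "WAVSET": 7400.0, "CRVAL1": 5800.0}
new_ext = rewrite_extension_header(
    ext, {"L2ARC": "arc_frame.fits", "L2WAVERR": 0.2})
```

Returning a copy keeps the original header available for comparison, which fits the view of the extensions as a sequential series of reductions.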

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:43

GoogleCodeExporter commented 9 years ago
>
> I think I'm more inclined towards ii) also but this leaves me with a few
> questions!
>
> I currently haven't been updating the WCS in the primary HDU with my L2
> calibrated fit. I can very easily do this, but opted not to. I see the L2
> extensions as a sequential series of reductions, and by writing the WCS
> obtained at a later stage into the primary HDU, I thought it was compromising
> the integrity of the data.

I follow your logic but I don't think I would worry about it. We have the 
original files backed up should we want to return to them. Second, this change 
would not modify any actual data. It is only correcting metadata which was 
originally written wrongly (or at least guessed at). 

> I'd argue that WAVCAL really shouldn't be there at all in the L1 image
> headers. If this data is to be kept wouldn't it be better if I wrote an
> additional key for this e.g. L2ARC in only the extensions using a L2
> calculated WCS fit? I can do this for any other data you wanted retaining,
> such as the L2 equivalent of WAVCENT and WAVDISP, but prefixing them with an
> "L2" extension.

I agree with that. I'm not sure what the plans for WAVCAL were. I suspect it 
was (at least in part) just Chris making guesses at what might be worth storing 
and I am sure at the time he did that, we had not decided on this multi 
extension strategy.

RJS

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:44

GoogleCodeExporter commented 9 years ago

I think I prefer the WAV* keywords in the primary header to be correct. They 
are the ones folk are most likely to look at and currently the ones used by the 
data archive, though that could be changed.

I agree with your statement that the earlier extensions should not describe the 
later reduction products, but I don't think that is what would happen if you 
were to update the keywords. You are simply using information derived later in 
the analysis process to go back and correct values which were previously only 
guessed at. By updating the WAV* keywords you are only making them correctly 
describe the data to which they are a header.

I don't think there is any confusion or ambiguity since these will be *_2.fits 
files. They are clearly after processing.

> > Following my logic from the previous email, I'd suggest setting WAVERR in
> > the L1 image to a predefined constant, and setting it in the L1 since it only
> > really relates to the approximate fitting error. I can create my own "L2"
> > prefixed key, e.g. L2WAVERR, and store the culmination of errors from L2
> > calibration in that value. As you suggested, I will consider this more after
> > the release.
> > 
> > So briefly, do you see any problems with the following?
> > 
> > WAVCENT
> > WAVDISP
> > WAVRESOL
> > WAVSHORT
> > WAVLONG
> > WAVSET
> > WAVERR
> > 
> > All set before L2.
> > 
> > WAVCAL removed.
> > WAVERR set to some appropriate constant relating to the approximate fits.
That is all fine except that it would be my vote to update the values to the 
best available information, not leave them as written by the ICS.

> > L2 doesn't touch these at all in primary, but removes them all in further
> > extensions and replaces them with equivalent keys including:
> > 
> > L2WAVERR
> > L2ARC
> > L2WAVDIS
> > L2WAVCEN
> > pertaining to the L2 calibrations only.

L2ARC is a good idea. 

L2WAVERR is a good idea, but we ought to also look into whether there is a 
proper WCS standard format. I'll do that.

The other two are probably of questionable use. They certainly do no harm if you 
want to include them, but don't they just replicate the WCS headers?
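
To make the replication point concrete: if the calibrated extension carries a linear wavelength WCS on the dispersion axis, the dispersion is simply CDELT1 and the central wavelength follows from CRVAL1, CDELT1 and CRPIX1. The sketch below assumes that linear form (a file using CD1_1 instead of CDELT1 would need the obvious substitution); the function names and numbers are illustrative.

```python
def wcs_wavelength(pixel, crval1, cdelt1, crpix1=1.0):
    """Linear FITS WCS: wavelength at a 1-based pixel on the dispersion axis."""
    return crval1 + cdelt1 * (pixel - crpix1)


def l2_equivalents(naxis1, crval1, cdelt1, crpix1=1.0):
    """Values the proposed L2WAVDIS and L2WAVCEN keys would hold,
    derived purely from the WCS keywords."""
    centre = (1 + naxis1) / 2.0
    return {
        "L2WAVDIS": cdelt1,
        "L2WAVCEN": wcs_wavelength(centre, crval1, cdelt1, crpix1),
    }


# Made-up rebinned spectrum: 4001 pixels, 0.8 A/pixel, starting at 5800 A.
vals = l2_equivalents(naxis1=4001, crval1=5800.0, cdelt1=0.8)
```

So the two keys add nothing a WCS-aware reader cannot already work out; their only benefit would be convenience for anyone scanning the header by eye.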

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:44

GoogleCodeExporter commented 9 years ago

On 5 May 2011, at 17:04, Robert Smith wrote:

>>
>> I think I'm more inclined towards ii) also but this leaves me with a few
>> questions!
>>
>> I currently haven't been updating the WCS in the primary HDU with my L2
>> calibrated fit. I can very easily do this, but opted not to. I see the L2
>> extensions as a sequential series of reductions, and by writing the WCS
>> obtained at a later stage into the primary HDU, I thought it was compromising
>> the integrity of the data.
>
> I follow your logic but I don't think I would worry about it. We have the
> original files backed up should we want to return to them. Second, this change
> would not modify any actual data. It is only correcting metadata which was
> originally written wrongly (or at least guessed at).
>
>> I'd argue that WAVCAL really shouldn't be there at all in the L1 image
>> headers. If this data is to be kept wouldn't it be better if I wrote an
>> additional key for this e.g. L2ARC in only the extensions using a L2
>> calculated WCS fit? I can do this for any other data you wanted retaining,
>> such as the L2 equivalent of WAVCENT and WAVDISP, but prefixing them with an
>> "L2" extension.
>
> I agree with that. I'm not sure what the plans for WAVCAL were. I suspect it
> was (at least in part) just Chris making guesses at what might be worth storing
> and I am sure at the time he did that, we had not decided on this multi
> extension strategy.
>
> RJS
>
>

This is outlined in Fault log comment number 21 of bug 1279...

http://telescope.livjm.ac.uk/Fault/Bugzilla/show_bug.cgi?id=1279

Original comment by robbarns...@gmail.com on 1 Nov 2011 at 9:45