JMMC-OpenDev / OI-Imaging-JRA

Design and specification of an interface to image reconstruction and model fitting from optical interferometric data
4 stars 6 forks source link

Software shouldn't change their inputs #12

Open FerreolS opened 2 years ago

FerreolS commented 2 years ago

To prevent any surprise and to simplify the tracking by OImaging, the software should update only the OUTPUT PARAM table. It can add new HDU image if needed but shouldn't change the HDUs presents in the input file.

jsy1001 commented 2 years ago

Could you be more precise as to what is required? BSMEM doesn't intentionally change any of the input tables, but it reads their contents into memory and writes out new tables containing the information it read in. Hence there may be changes such as removal of keywords that BSMEM isn't expecting.

The order of the tables is changed, but I would hope that OImaging is robust to that.

FerreolS commented 2 years ago

Yes, I forgot this case. Changing the order of keywords and table values is ok as it does not change the information content.

FerreolS commented 2 years ago

The software should add an extra column with the modeled values to the OI_VIS2, OI_VIS, OI_T3 (and OI_FLUX if needed) tables. As suggested by @emmt , the software must flag the rows that were not used (with a NaN?) rather than removing these rows.

jsy1001 commented 2 years ago

@FerreolS By "not used" do you mean excluded by USE_VIS etc., flagged (True value in the FLAG column), or something else? Why is it necessary to flag unused data in a new way if the excluded data can be inferred from the input prameters and/or FLAG columns?

FerreolS commented 2 years ago

My concern was about WISARD, that need T3 and VIS2 at the same timestamp. If for some reason there is no corresponding VIS2, the T3 will not be used . It's value should stay in the table but must be flagged somehow. Am I right @GillesDuvert ?

jsy1001 commented 2 years ago

There is a potential edge case that could apply to BSMEM. It concerns the UVMAX parameter, which isn't implemented in OImaging yet. This tells BSMEM to filter the input data at the specified UV radius.

At the moment, BSMEM applies a UV radius filter in two situations: (a) if UVMAX is specified and is smaller than the maximum UV radius in the data; (b) if the initial image pixel size is larger than \lambda_min / (6 B_max). At the moment if case (b) applies the corresponding UVMAX value is written to the input parameters HDU of the output file, potentially overwriting the original UVMAX specified by the user.

I'm not sure of the best solution here - suggestions welcome!

GillesDuvert commented 2 years ago

Hi I have mixed feelings about https://github.com/JMMC-OpenDev/OI-Imaging-JRA/issues/12#issuecomment-1040201939 above. Even if some (u,v) point was not used because V2 is missing, the reconstruction is an image that holds everywhere, so a model V2 and a model T3 is available at each u,v point. This goes also for measurements that are too noisy, WISARD drops them (error on 'myopic' complex visibilities is way too large). If the, say, observed V2, is flagged (or NULL?) it does not prevent to write a model_v2 IMHO.

GillesDuvert commented 2 years ago

But, back to the issue title, the main motive is to have the whole input image + input image header unchanged, yes? (something I would have sworn was the case for Wisard)

GillesDuvert commented 2 years ago

done.

FerreolS commented 2 years ago

At the moment, BSMEM applies a UV radius filter in two situations: (a) if UVMAX is specified and is smaller than the maximum UV radius in the data; (b) if the initial image pixel size is larger than \lambda_min / (6 B_max). At the moment if case (b) applies the corresponding UVMAX value is written to the input parameters HDU of the output file, potentially overwriting the original UVMAX specified by the user.

This case raise two concerns for me:

FerreolS commented 2 years ago

BTW it seems that on this particular UVMAX issue, the message ERROR : ** Message: 15:42:23.762: Pixel size too large for data is in the log even if the resolution is ok :

BSMEM_CI_VERSION: bsmem-v2.2.1-2021-12-16T14:08:21+00:00
cmd: "bsmem-ci /usr/local/tomcat/temp/uws/UPLOAD_1645113576690_inputfile /usr/local/tomcat/temp/uws/UPLOAD_1645113576690_inputfile.out"
ERROR : ** Message: 15:59:36.727: Pixel size too large for data
ERROR : ** Message: 15:59:36.731: Negative/zero values in prior image are not permitted. All have been replaced by 1e-08.
Bispectrum noise:   Classic elliptic approximation 
Maximum uv radius:  4.954e+07 waves
Array resolution:   2.0820 mas
Pixel size:     0.1000 mas
Image dimension:    200 pixels
Image fov:      20.00 mas
Pix/fastest fringe: 41.6396
Entropy functional: Gull-Skilling entropy
Hyperparameter scheme:  Chi2 = N method
Maximum n# iterations:  50
Initial flux:       0.010000
jsy1001 commented 2 years ago

BTW it seems that on this particular UVMAX issue, the message ERROR : ** Message: 15:42:23.762: Pixel size too large for data is in the log even if the resolution is ok :

Indeed that is a bug. I was hoping we could agree on the correct way of handling UVMAX before I fix it.

emmt commented 2 years ago

MiRA does raise an error or at least a warning if the input parameters are wrong (including the choice of the pixel size). The specified input parametes should not be modified by MiRA. Sometimes the error is not straightforward to interpret e.g. a pixel size larger than the field of view yields in an error because MiRA cannot create a 0x0 pixel image (this can be improved but being smart with all possible errors may be endless...). I can replace warnings such as:

WARNING - Current pixelsize (1 mas) is too large which will yield aliasing.  Maximum pixelsize to avoid aliasing is 0.854623 mas.

by an error if this seems preferable in spite that some people like to override given advices ;-)

FerreolS commented 2 years ago

I will add in the document that INPUT PARAM table (and INIT_IMG image) must remained unchanged. If for some reason (like the UVMAX case) the actual keyword used by the software differ, this actual value must be written int the OUTPUT PARAM table.

Is it ok for you?

If needed OImaging can easily track this change.

emmt commented 2 years ago

To avoid too many specific rules, why not just have all input parameters copied into the out parameters with their actual (unchanged or not) values?

FerreolS commented 2 years ago

As you wish, but on my opinion it will make change detection harder for human readers (there is still some people that read keyword). For OImaging it is not an issue I guess

FerreolS commented 2 years ago

adding the point software should not alter or destroy other HDUs, blindly copying them in the output file from https://github.com/JMMC-OpenDev/OI-Imaging-JRA/issues/15#issuecomment-1055547273 that is much more related to this issue

jsy1001 commented 2 years ago

I will add in the document that INPUT PARAM table (and INIT_IMG image) must remained unchanged. If for some reason (like the UVMAX case) the actual keyword used by the software differ, this actual value must be written int the OUTPUT PARAM table.

Is it ok for you?

If needed OImaging can easily track this change.

Yes this is ok. See https://gitlab.com/jsy1001/bsmem/-/issues/10

buthanoid commented 2 years ago

The benefit of not modifying the input parameters is traceability. Using OImaging or not, it allows you to remember what you gave to the software to build the output file you are reading. The drawback is that placing an input parameter in the output table is misleading. Firstly it can make the user think that it was not used as input (whereas it actually was), secondly the parameter may only make sense as input so seeing it as output can be confusing.

Concerning tracing if the software modified the input parameters or not, OImaging can do it as long as the new value of the parameter is somewhere in the output file. Of course, if both old and new values are in the output file, it is easier for OImaging.

The current spec already ask the softwares to copy the input parameters table in the output file. However, it also ask the software to complete it with missing keywords that the software needs.

I reckon that sometimes the input image is modified.

My (current) point of view is that is is more simple if the software modifies the input parameters:

In any case, the user must be correctly informed when a parameter has been changed. He must not believe that the software used exactly what he gave as input.

jsy1001 commented 2 years ago

@buthanoid So are you proposing that when the reconstruction software modifies an explicit input parameter, both the input and used values should go in the input parameters table?

If so, how do we store both values? I was thinking of appending something to the parameter name (e.g. keywords [param] and [param]-OLD), but we will run into the 8 character limit for keywords (unless we use the hierarchical keyword convention).

buthanoid commented 2 years ago

Not exactly, I was proposing that the software should only store the value it actually used. So old values are lost. With a bit of work OImaging could retrieve them. note: I find the others solutions good too.

jsy1001 commented 2 years ago

Hi I have mixed feelings about #12 (comment) above. Even if some (u,v) point was not used because V2 is missing, the reconstruction is an image that holds everywhere, so a model V2 and a model T3 is available at each u,v point. This goes also for measurements that are too noisy, WISARD drops them (error on 'myopic' complex visibilities is way too large). If the, say, observed V2, is flagged (or NULL?) it does not prevent to write a model_v2 IMHO.

At the moment bsmem does not follow this suggestion. The behaviour is documented in docs/common_interface.md, which reads:

A copy of the input data is written to the output file. Only the tables containing selected data are copied. Within the tables, unselected data are replaced by NULL values. Model data, corresponding to the selected input data only, are written to the output file using the additional columns defined in Table 4 of the interface document.

This means that using the output as input to another reconstruction will only work as expected if the data selection parameters (TARGET, USE_*, UV_MAX) are unchanged. Changing bsmem to write all of the input data (whether selected or not) and model values would involve a fair amount of work.

What do the other algorithms do?

FerreolS commented 2 years ago

This issue was also discussed in #5 : In this issue @emmt said :

Ok that seems to be a better idea to keep all input data.

For the model of unused data, using NULL is the more logical (and then omitting the column if all elements are NULL). Using the current image it is however possible to compute a model for all input data. It could then be an option to do that. On output, there could be an optional column of flags indicating for each data whether: data was used and model provided, data was not used but model provided, and neither data was used nor model provided. I just mention this possibility for completeness but it may add to much complexity for the image reconstruction software. So your solution has my personnal preference as it is consistent and the simplest (we could add the optional column of flags later if it appears to be really needed).