MWATelescope / mwa_hyperdrive

Calibration software for the Murchison Widefield Array (MWA) radio telescope
https://MWATelescope.github.io/mwa_hyperdrive
Mozilla Public License 2.0

Feature request: write simulated visibilities to MODEL_DATA in an ms for use by other calibration software #35

Open devojyoti96 opened 1 month ago

devojyoti96 commented 1 month ago

Would it be possible to add a feature that saves simulated visibilities to the MODEL_DATA column of an input ms? Right now one needs to create a simulated ms and then copy its DATA column into the original ms's MODEL_DATA column. This feature would make it easy to use other calibration software when needed, while still doing the visibility modeling with hyperdrive.

d3v-null commented 1 month ago

Thanks @devojyoti96, can you explain in more detail which software uses the MODEL_DATA column and how it is used? This will help us test this feature if it is eventually developed. It would also be good to explore how the command-line interface would work for this, e.g. would the following work?

hyperdrive vis-simulate <existing.ms> --model-data-column=MODEL_DATA

We are currently working on measurement set I/O optimization, so this feature may have to wait until after that.

Also tagging this related issue: https://github.com/MWATelescope/mwa_hyperdrive/issues/3

devojyoti96 commented 1 month ago

@d3v-null thanks. I am simply thinking of CASA, the standard package. CASA calibration tasks read the DATA column as the observed visibilities and the MODEL_DATA column as the visibilities of the sky model, and solve the measurement equation. I believe the command line you suggested would work.
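To make the DATA/MODEL_DATA relationship concrete, here is a minimal sketch of a per-antenna scalar gain solve. It assumes the simplest form of the measurement equation, V_pq = g_p conj(g_q) M_pq, with V taken from DATA and M from MODEL_DATA, and uses StefCal-style alternating least squares in plain Python. It is not CASA's or hyperdrive's implementation; `solve_gains`, the dict-of-baselines layout, and the example gains are hypothetical.

```python
import cmath


def solve_gains(vis, model, n_ant, n_iter=500, tol=1e-12):
    """Solve V_pq = g_p * conj(g_q) * M_pq for scalar per-antenna gains g.

    vis, model: dicts keyed by (p, q) with p != q; both orientations must be
    present and Hermitian, i.e. vis[(q, p)] == vis[(p, q)].conjugate().
    """
    gains = [1 + 0j] * n_ant
    for _ in range(n_iter):
        new = []
        for p in range(n_ant):
            num, den = 0j, 0.0
            for q in range(n_ant):
                if q == p:
                    continue
                # with the other gains held fixed, V_pq = g_p * z is linear in g_p
                z = gains[q].conjugate() * model[(p, q)]
                num += z.conjugate() * vis[(p, q)]
                den += abs(z) ** 2
            new.append(num / den)
        # average with the previous estimate to damp oscillations
        new = [(a + b) / 2 for a, b in zip(new, gains)]
        delta = max(abs(a - b) for a, b in zip(new, gains))
        gains = new
        if delta < tol:
            break
    return gains


# hypothetical example: 5 antennas, a Hermitian model, noise-free "observed" data
n_ant = 5
true_gains = [cmath.rect(1 + 0.1 * k, 0.3 * k) for k in range(n_ant)]
model = {(p, q): cmath.exp(0.5j * (p - q))
         for p in range(n_ant) for q in range(n_ant) if p != q}
vis = {(p, q): true_gains[p] * true_gains[q].conjugate() * m
       for (p, q), m in model.items()}

gains = solve_gains(vis, model, n_ant)
worst = max(abs(gains[p] * gains[q].conjugate() * model[(p, q)] - v)
            for (p, q), v in vis.items())
print(f"max residual: {worst:.2e}")
```

The gains are only determined up to a common phase, which is why the check compares the predicted products g_p conj(g_q) M_pq against the data rather than comparing the gains themselves.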

As well as this feature, hyperdrive di-calibrate should also be able to read model visibilities from MODEL_DATA and calibrate against them. This would allow hyperdrive to be used for self-calibration as well: if one uses CASA or WSClean for imaging, the model image is Fourier transformed and written to the MODEL_DATA column of the ms, so if di-calibrate could use the model visibilities that CASA and WSClean store in MODEL_DATA, one could do self-calibration.

d3v-null commented 1 month ago

Thanks @devojyoti96 I think we've touched on two things here:

di-calibrate reading from MODEL_DATA has clear value, in that you could apply hyperdrive's fast DI calibration using model visibilities simulated with other tools such as WODEN or WSClean.

As you've said, writing to MODEL_DATA would allow you to use CASA for calibration. I'd just like to know what value there is in doing the calibration in CASA when you could do it in hyperdrive. Does CASA offer a different algorithm to hyperdrive's antsol-based DI cal? Does CASA give you more control over calibration parameters? Does CASA write out solutions in a more convenient format?

Thanks

devojyoti96 commented 1 month ago

@d3v-null, I only gave CASA as an example. Different calibration packages have their own advantages and disadvantages; if this MODEL_DATA feature existed, users could use whichever calibration routine they want and take advantage of the strengths of each package.

What CASA and some other packages, such as QuartiCal, offer is parameterized Jones-term solvers, whereas in hyperdrive the solution is always a full 2x2 complex matrix. One might expect that solving for a 2x2 complex matrix yields uniquely determined antenna Jones terms; however, that is not the case, because the norm minimized in antsol is the Frobenius norm (an L2 norm), which is invariant under unitary transformations. One such unitary transformation is the crosshand-phase Jones term. One way to break this degeneracy is to write the Jones matrix explicitly in that form with a free parameter and solve for that parameter: here the matrix is [[e^(iθ), 0], [0, 1]], and the free parameter θ is the crosshand phase. Ellipticity error and absolute polarization angle are two other such examples, and CASA and QuartiCal provide parameterized solvers for them. That is the advantage.

CASA and QuartiCal also offer other flexibility, e.g. different solution intervals in time and frequency for different Jones terms. Off-diagonal terms generally have low SNR and give better solutions when averaged over longer time and frequency chunks, while the diagonal terms can still be solved on smaller chunks. Without the ability to factor the Jones matrix into a Jones chain, this is not possible: one must use the same time and frequency chunks for both the diagonal and off-diagonal terms.
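The crosshand-phase argument can be checked numerically. This sketch assumes per-antenna diagonal gains G_p and a crosshand-phase term X = [[e^(iθ), 0], [0, 1]] common to all antennas, so that V_pq = G_p X M X^H G_q^H with a linear-feed sky brightness matrix M. When M is diagonal (Stokes I and Q only), X M X^H = M and the Frobenius-norm cost is the same for every θ; once M has off-diagonal terms (nonzero Stokes U or V), θ can be recovered by fitting that single parameter, here with a brute-force grid search. All helper names, gains, and sky values are hypothetical, and this is not CASA's or QuartiCal's solver.

```python
import cmath
import math


def matmul(a, b):
    # product of two 2x2 complex matrices stored as [[a, b], [c, d]]
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def herm(a):
    # conjugate transpose
    return [[a[0][0].conjugate(), a[1][0].conjugate()],
            [a[0][1].conjugate(), a[1][1].conjugate()]]


def frob2(a, b):
    # squared Frobenius norm of (a - b)
    return sum(abs(a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2))


def crosshand(theta):
    # the parameterized crosshand-phase Jones term
    return [[cmath.exp(1j * theta), 0j], [0j, 1 + 0j]]


def sky_matrix(i, q, u, v):
    # brightness matrix for linear feeds
    return [[i + q + 0j, u + 1j * v], [u - 1j * v, i - q + 0j]]


def predict(gains, sky, theta):
    x = crosshand(theta)
    corrupted = matmul(matmul(x, sky), herm(x))
    n = len(gains)
    return {(p, q): matmul(matmul(gains[p], corrupted), herm(gains[q]))
            for p in range(n) for q in range(p + 1, n)}


def cost(obs, gains, sky, theta):
    pred = predict(gains, sky, theta)
    return sum(frob2(obs[k], pred[k]) for k in obs)


def fit_theta(obs, gains, sky, steps=3600):
    # one free parameter: brute-force search over [-pi, pi)
    grid = [-math.pi + 2 * math.pi * k / steps for k in range(steps)]
    return min(grid, key=lambda t: cost(obs, gains, sky, t))


# hypothetical diagonal gains for 4 antennas and a true crosshand phase of 0.7 rad
gains = [[[cmath.rect(1 + 0.05 * p, 0.2 * p), 0j],
          [0j, cmath.rect(1 - 0.03 * p, -0.1 * p)]] for p in range(4)]
true_theta = 0.7

diag_sky = sky_matrix(1.0, 0.1, 0.0, 0.0)  # Stokes I and Q only: diagonal
pol_sky = sky_matrix(1.0, 0.1, 0.2, 0.0)   # nonzero Stokes U: off-diagonal terms

obs_diag = predict(gains, diag_sky, true_theta)
obs_pol = predict(gains, pol_sky, true_theta)

# diagonal sky: the cost does not depend on theta at all
print([cost(obs_diag, gains, diag_sky, t) for t in (0.0, 0.7, 2.0)])
# polarized sky: theta is constrained and can be fitted
best = fit_theta(obs_pol, gains, pol_sky)
print(best)
```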

Also, CASA and QuartiCal provide reference-antenna (refant) options and make it easy to interpolate solutions across time. Many of these may not be needed for every science case, but being able to store simulated models in MODEL_DATA would give users that freedom. That is the reason behind this feature request.
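Two of the conveniences mentioned, reference-antenna selection and interpolation of solutions in time, are simple to express for scalar gains. In this minimal sketch (hypothetical helper names), referencing rotates every gain by the same unit-modulus factor so the reference antenna has zero phase, which leaves every product g_p conj(g_q), and hence the predicted visibilities, unchanged; interpolation is linear in amplitude and in phase, taking the shorter way around the phase wrap. CASA's actual refant handling and interpolation modes are more elaborate.

```python
import cmath


def reference_to(gains, refant):
    """Rotate all scalar gains so that gains[refant] has zero phase."""
    ref = gains[refant]
    rot = ref.conjugate() / abs(ref)  # e^(-i * phase of the reference gain)
    return [g * rot for g in gains]


def interp_gain(t, t0, g0, t1, g1):
    """Interpolate a gain between solutions g0 at t0 and g1 at t1 (amplitude and phase)."""
    f = (t - t0) / (t1 - t0)
    amp = abs(g0) + f * (abs(g1) - abs(g0))
    dphi = cmath.phase(g1 / g0)  # phase step wrapped into (-pi, pi]
    return cmath.rect(amp, cmath.phase(g0) + f * dphi)


gains = [cmath.rect(1.1, 0.4), cmath.rect(0.9, -1.0), cmath.rect(1.0, 2.5)]
ref = reference_to(gains, refant=0)
print([round(cmath.phase(g), 3) for g in ref])

# solutions either side of the +/-pi wrap: the midpoint lands near phase pi, not 0
mid = interp_gain(5.0, 0.0, cmath.rect(1.0, 3.0), 10.0, cmath.rect(1.0, -3.0))
print(abs(mid), cmath.phase(mid))
```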

As for the solution format, I do not think that is an issue, and FITS is more convenient anyway.

Thanks

d3v-null commented 1 month ago

Thanks @devojyoti96, that's really useful to know.

I won't have time to implement this for a while, but pull requests are welcome.

I tried writing a workaround for you, but couldn't come up with something elegant because:

but if one of these features were implemented, then the following would work.

export obsid=1297526432
export outdir="${MYSCRATCH}/${obsid}"
mkdir -p "$outdir"
export srclist="${outdir}/srclist_pumav3_EoR0LoBES_EoR1pietro_CenA-GP_2023-11-07.fits"
[[ $srclist =~ srclist_puma && ! -f "$srclist" ]] && wget -O "$srclist" "https://github.com/JLBLine/srclists/raw/master/${srclist##*/}"
export MWA_BEAM_FILE="${MWA_BEAM_FILE:=$MYSOFTWARE/mwa_full_embedded_element_pattern.h5}"
[ -f "$MWA_BEAM_FILE" ] || wget -O "$MWA_BEAM_FILE" "http://ws.mwatelescope.org/static/${MWA_BEAM_FILE##*/}"
export metafits="${outdir}/${obsid}.metafits"
[ -f "$metafits" ] || wget -O "$metafits" "https://github.com/MWATelescope/Birli/raw/main/tests/data/1297526432_mwax/1297526432.metafits"
export raw="${outdir}/${obsid}_20210216160014_ch117_000.fits"
[ -f "$raw" ] || wget -O "$raw" "https://github.com/MWATelescope/Birli/raw/main/tests/data/1297526432_mwax/1297526432_20210216160014_ch117_000.fits"

# generate measurement set with only DATA column
export ms="${outdir}/${obsid}.ms"
[ -d "$ms" ] || birli -m "$metafits" -M "$ms" "$raw" --sel-chan-ranges 0-0 # --no-sel-autos

# simulate visibilities into the DATA column of a separate ms with the same shape.
# Even though we don't care about the solutions, `vis-sim` doesn't let us exactly match an input ms, so we use `di-cal --model-filenames ...` instead.
export ms_sim="${outdir}/${obsid}_sim.ms"
[ -d "$ms_sim" ] || hyperdrive di-cal --data "$metafits" "$raw" --model-filenames "$ms_sim" --source-list "$srclist" --beam-file "$MWA_BEAM_FILE" --num-sources 1 --use-all-timesteps

cat <<EoF > put_model_column.py
from casacore.tables import table, makecoldesc

# the shell substitutes the two paths below when this file is written
data = table('$ms', readonly=False)
model = table('$ms_sim', readonly=True)
print("data baselines", [*zip(data.getcol('ANTENNA1'), data.getcol('ANTENNA2'))])
print("model baselines", [*zip(model.getcol('ANTENNA1'), model.getcol('ANTENNA2'))])
# check row counts first, otherwise a row mismatch shows up as a confusing shape error
assert data.nrows() == model.nrows(), "DATA columns should have same number of rows"
assert data.getcolshapestring('DATA') == model.getcolshapestring('DATA'), "DATA columns should have same shape"
# create MODEL_DATA with the same description as the simulated DATA column, if needed
if 'MODEL_DATA' not in data.colnames():
    data.addcols(makecoldesc('MODEL_DATA', model.getcoldesc('DATA')))
data.putcol('MODEL_DATA', model.getcol('DATA'))
data.close()
model.close()
EoF
python put_model_column.py

However, $ms contains autocorrelations and $ms_sim does not, so the two tables have different numbers of rows and the assertions fail.
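One way around the autocorrelation mismatch would be to align rows by key instead of by position: look up each row of the observed ms by (TIME, ANTENNA1, ANTENNA2) in the simulated ms, and fill rows that have no model counterpart, such as the autocorrelations, with zeros. This is a minimal, hypothetical sketch in plain Python, not a hyperdrive or Birli feature. It assumes both measurement sets share the same timestamps (matched after rounding to `time_decimals` places) and the same baseline ordering convention, and that the key columns are read with python-casacore's `table.getcol`.

```python
def align_model_rows(data_keys, model_keys, model_vis, fill, time_decimals=3):
    """Reorder model visibilities to match the rows of the observed ms.

    data_keys, model_keys: iterables of (TIME, ANTENNA1, ANTENNA2), one per row.
    model_vis: indexable by model row (e.g. the array from model.getcol('DATA')).
    fill: value used for data rows with no model row (e.g. autocorrelations).
    Returns (aligned rows, number of rows that were filled).
    """
    def key(row):
        t, a1, a2 = row
        # round TIME so tiny floating-point differences still match
        return (round(float(t), time_decimals), int(a1), int(a2))

    lookup = {key(k): i for i, k in enumerate(model_keys)}
    aligned, n_filled = [], 0
    for row in data_keys:
        i = lookup.get(key(row))
        if i is None:
            aligned.append(fill)
            n_filled += 1
        else:
            aligned.append(model_vis[i])
    return aligned, n_filled


# toy example: the data ms has autocorrelations (0,0) and (1,1); the model ms does not
data_keys = [(10.0, 0, 0), (10.0, 0, 1), (10.0, 1, 1), (20.0, 0, 1)]
model_keys = [(20.0, 0, 1), (10.0, 0, 1)]
model_vis = ["vis@20s", "vis@10s"]
aligned, n_filled = align_model_rows(data_keys, model_keys, model_vis, fill="zeros")
print(aligned, n_filled)
```

In put_model_column.py, the keys could come from `zip(t.getcol('TIME'), t.getcol('ANTENNA1'), t.getcol('ANTENNA2'))` for each table, `model_vis` from `model.getcol('DATA')`, and `fill` from a zero array with the shape of one row; the aligned rows would then be stacked (e.g. with `numpy.stack`) before `putcol`.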