caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0
28 stars 6 forks source link

Cubical transfer of gain solutions introduces 0's in corrected visibilities. #627

Closed paoloserra closed 10 months ago

paoloserra commented 4 years ago

Following the calibration run:

Running: gocubical --sol-jones G --g-clip-high 2.0 --weight-column WEIGHT --out-overwrite 1 --dist-ncpu 8 --data-time-chunk 128 --madmax-threshold 0,10 --g-time-int 120 --data-ms /home/pserra/msdir/jw00_mk64_4k-target_crosscalavg.ms --out-casa-gaintables 1 --model-list MODEL_DATA --data-column DATA --g-solvable 1 --g-clip-low 0.5 --model-ddes auto --dist-max-chunks 0 --madmax-enable 0 --dist-nworker 0 --bbc-save-to /home/pserra/output/continuum/selfcal_products/bbc-gains-2-jw00_mk64_4k-target_crosscalavg.parmdb --madmax-estimate corr --g-save-to /home/pserra/output/continuum/selfcal_products/g-gains-2-jw00_mk64_4k-target_crosscalavg.parmdb --out-plots 1 --madmax-plot 0 --out-name /home/pserra/output/continuum/selfcal_products/kh_jw00_mk64_4k-target_crosscalavg_2_cubical --montblanc-dtype float --g-type phase-diag --sol-term-iters 5,0 --g-freq-int 0 --out-mode sc

I transfer and apply the gains to the fine channel dataset with:

Running: gocubical --sol-jones G --montblanc-dtype float --dist-nworker 0 --bbc-save-to file --weight-column WEIGHT --out-mode ac --sol-term-iters 5,0 --out-plots 1 --out-overwrite 1 --data-ms /home/pserra/msdir/jw00_mk64_4k-target_crosscal.ms --g-xfer-from /home/pserra/output/continuum/selfcal_products/g-gains-2-jw00_mk64_4k-target_crosscalavg.parmdb --data-column DATA --data-time-chunk 128 --out-name /home/pserra/output/continuum/selfcal_products/jw00_mk64_4k-jw00_mk64_4k-target_crosscal.ms_2_cubical --dist-ncpu 8 --out-casa-gaintables 0 --model-ddes auto --dist-max-chunks 0

All this is using the standard MeerKATHI settings.

The resulting CORRECTED_DATA is padded with zeros both in time and frequency while DATA was not (see attached images). I don't know why. Any idea? Thanks!

data corrected

PeterKamphuis commented 4 years ago

Are you running through cubical or meerkathi? I don't even understand why a --g-xfer-from works for you when it crashes for me.

PeterKamphuis commented 4 years ago

Also, your --g-freq-int 0 means all frequencies are calibrated as a single gain but doesn't that collide with the default data-freq-chunk?

paoloserra commented 4 years ago

Are you running through cubical or meerkathi? I don't even understand why a --g-xfer-from works for you when it crashes for me.

Within meerkathi. Does it mean you've never been able to run transfer_apply_gains ?

Also, your --g-freq-int 0 means all frequencies are calibrated as a single gain but doesn't that collide with the default data-freq-chunk?

Yes, I've solved for freq-independent gains. What is the default data-freq-chunk? Maybe cubical can handle conflicting settings there?

paoloserra commented 4 years ago

BTW calibrating with a shorter --g-time-int reduced the padding along the time axis in the fine channel dataset. So I suspect that those zeros are the first and last solution interval, although I'm not sure why the visibilities should be set to zero there.

paoloserra commented 4 years ago

@PeterKamphuis

$ stimela cabs -i cubical
...
  Name         data-freq-chunk
  Description  Chunk data by this number of channels. See time-chunk for info. 0 means full frequency axis.
  Type         int
  Default      None
PeterKamphuis commented 4 years ago

Within meerkathi. Does it mean you've never been able to run transfer_apply_gains ?

No I haven't but then I have 512 channels typically. With the latest stimela it is not working, that is actually what is currently holding up the merge of the branch Two_Scale_Clean with the overhaul of Jones matrices.

Yes, I've solved for freq-independent gains. What is the default data-freq-chunk? Maybe cubical can handle conflicting settings there?

Well is None == 0? Because the cubical docs say

--data-freq-chunk=string

  | Data will be cut up into blocks containing this many channels. This limits the amount of data >processed at once. Smaller chunks allow for a smaller RAM footprint and greater parallelism, but >this sets an upper limit on the solution intervals that may be employed. Specify as an integer >number of channels, or a value with a unit (e.g. ‘128MHz’). 0 means use full frequency axis. >Default: 32.

In my branch I set the freq-chunk explicitly and I'm looking at very blocked corrected data column. But clearly from your images that issue is not in your version of cubical, I'm looking in to it but for now I'm just mostly confused.

paoloserra commented 4 years ago

@PeterKamphuis judging from the cubical log messages you are right, the default frequency chunk is 32.

In case that makes you feel better, my last run of transfer_apply_gain gave me an error. No clue why since I have not changed the way I'm running it... (all good, not sure what I had done wrong but I've rerun this multiple times and always successfully; so it would still be good to understand why xfer is not working for you @PeterKamphuis )

paoloserra commented 4 years ago

@PeterKamphuis I last pulled and built Stimela master (1.2.3) on 27 Sep. As far as I can see there's been no changes in Stimela's cubical since then.

PeterKamphuis commented 4 years ago

@paoloserra Apparently I have an issue there then cause apparently I am running version 1.2.2 installed on 11 Sep. Maybe I am running the old version? Let me update everything.

PeterKamphuis commented 4 years ago

@paoloserra @SpheMakh on github the latest release is 1.2.2 so 1.2.3 is not the master? Are you running python 2 or 3 BTW cause I'm running 3.6. Does that matter for stimela?

SpheMakh commented 4 years ago

@paoloserra @SpheMakh on github the latest release is 1.2.2 so 1.2.3 is not the master?

1.2.3 is the master version (it will be the next release). But I did not update cubical in that version

PeterKamphuis commented 4 years ago

@SpheMakh so I need to do pip install and then pull and build to get that right?

SpheMakh commented 4 years ago

@SpheMakh so I need to do pip install and then pull and build to get that right?

Yes. But pip install from github

bennahugo commented 4 years ago

If I'm not mistaken the gain-transfer patch is not yet in any cubical release: https://github.com/ratt-ru/CubiCal/pull/316. If you can confirm that this works for you it can be merged into the next release.

PeterKamphuis commented 4 years ago

@bennahugo yes but for @paoloserra it works so first figuring out why and whether this is a still existing problem. That patch does not work as it should as it does not seem to apply the gains.

bennahugo commented 4 years ago

Can you elaborate? I think what you are seeing above could be a mismatch in gain settings between the two rounds of calibration. Are you sure that your gain solutions (and chunks) are the same between the solve and apply steps?

PeterKamphuis commented 4 years ago

@bennahugo In my case the xfer-from just does not work, i.e. stimela crashes as explained in the Issue https://github.com/ratt-ru/CubiCal/issues/315. I am sure that I have used the same settings. For @paoloserra this does not seem to happen and I am trying to figure out why. I do not know whether @paoloserra has explicitly set all chunks and intervals to be the same.

paoloserra commented 4 years ago

I'm looking into that now. I think the meerkathi way of setting up cubical can be improved. I'll soon make a PR with some proposed changes.

PeterKamphuis commented 4 years ago

@paoloserra yes I have done that but waiting to solve this issue before merging that branch.

paoloserra commented 4 years ago

That's OK, but I may make some minimal changes anyway just to allow sensible settings. Will ask for your review.

paoloserra commented 4 years ago

So am I correct in thinking that time-chunk should be a multiple of time-int? And that time-int = 0 is the same as setting time-int = time-chunk?

(And the same for the frequency axis.)

PeterKamphuis commented 4 years ago

That is my understanding as well.

bennahugo commented 4 years ago

Alright I will have a look at the transfer code in cubical - my newest slope + phase selfcal chaining also use it because you cannot solve for a parameterized solution and a normal gain in the same chain unfortunately. As for the time tile size - ideally it is a multiple of the interval yes, ultimately the tile bounds the solution interval. However be careful (and consistent) with your max-chunks and min-chunks settings. Cubical by default is greedy and will adjust the tiles based on the size of the process pool, so your chunk size could be different.

SpheMakh commented 4 years ago

So am I correct in thinking that time-chunk should be a multiple of time-int? And that time-int = 0 is the same as setting time-int = time-chunk?

(And the same for the frequency axis.)

Yes. @o-smirnov can you confirm

o-smirnov commented 4 years ago

I confirm.

Cheers, Oleg


Sent from my phone. Quality of spelling inversely proportional to finger size.

On Tue, 22 Oct 2019, 21:23 Sphesihle Makhathini, notifications@github.com wrote:

So am I correct in thinking that time-chunk should be a multiple of time-int? And that time-int = 0 is the same as setting time-int = time-chunk?

(And the same for the frequency axis.)

Yes. @o-smirnov https://github.com/o-smirnov can you confirm

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ska-sa/meerkathi/issues/627?email_source=notifications&email_token=ABRLTP2OAHRKY7NM7EWOITTQP5HKLA5CNFSM4JBME3BKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEB64VFQ#issuecomment-545114774, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRLTP6EYXJOZZZ7PNCUVK3QP5HKLANCNFSM4JBME3BA .

paoloserra commented 4 years ago

Following https://github.com/ska-sa/meerkathi/pull/636 I have self-calibrated a 95 fat-channels .MS with;

time-chunk ........................................ = 128
freq-chunk ........................................ = 0
time-int .......................................... = 16
freq-int .......................................... = 0

So I solved for frequency-independent gains every 2m8s, with 8 independent solutions per time chunk.

I have then successfully transferred and applied the gains to a 478 fine-channels .MS with

time-chunk ........................................ = 16
freq-chunk ........................................ = 0
time-int .......................................... = 1
freq-int .......................................... = 1

where I reduced time-chunk from 128 to 16 not to run out of memory.

I've noticed that for the cubical gain-transfer run, time-int and freq-int are not copied from the calibration settings, but are left unset in MeerKATHI and take a value of 1. See https://github.com/ska-sa/meerkathi/blob/eeb5a491fc0bdf607be2fef7bc74b89264a7ae4e/meerkathi/workers/self_cal_worker.py#L1084 and following lines. I would have thought that time-int = 16 and freq-int = 0 (same as in the calibration run) would have made more sense. Can anybody comment?

Anyway, I no longer see the zero-ed edges I reported above. So that's good news. Maybe my problem happened because of a mismatch as suggested by @bennahugo . In that case, I think MeerKATHI should be improved not to allow such mismatch.

@PeterKamphuis problem with xfer remains to be solved. Hopefully it goes away with Stimela 1.2.3.

KshitijT commented 4 years ago

I would have thought that time-int = 16 and freq-int = 0 (same as in the calibration run) would have made more sense. Can anybody comment?

In general should be fine (as long as there is no time averaging involved?). Do we need to set these two parameters when transferring the gains at all, though? @bennahugo , @SpheMakh , @o-smirnov could you please confirm?

paoloserra commented 4 years ago

as long as there is no time averaging involved

in my case I only averaged in freq, and then transferred the gains to the non-freq-averaged .MS

IanHeywood commented 4 years ago

When running in apply (xfer-from) mode the time-int and freq-int will be the "resolution" at which the pre-computed gains are interpolated. I had good results with 1 and 1 for both these parameters when loading coarse channel gains onto fine channel MeerKAT-16 data, but that was a long time ago and I haven't tried it since. I would guess that ultimately the performance of this trick will depend on the SNR of your gain solutions.

I was also using ar not ac to subtract the continuum at the same time.

PeterKamphuis commented 4 years ago

@paoloserra Well the error I get must be something in my settings as the master runs fine when I transfer_apply_gains.

paoloserra commented 4 years ago

@PeterKamphuis that's good news.

paoloserra commented 4 years ago

When running in apply (xfer-from) mode the time-int and freq-int will be the "resolution" at which the pre-computed gains are interpolated.

OK, so in case I don't want to interpolate but simply copy and apply the frequency-independent gains from a fat-channel to a fine-channel .MS using 1,1 is wrong.

PeterKamphuis commented 4 years ago

@paoloserra If the interpolation is done correctly it should not make a difference right? The interpolation of a single number would be the same for all frequencies. Or am I missing something?

paoloserra commented 4 years ago

It will still interpolate in time. That might be OK, but I think we should have control over that.

PeterKamphuis commented 4 years ago

Ah yes, but the time intervals are the same between the two sets so just setting that to the int of the calculated solution should not be an issue.

paoloserra commented 4 years ago

Not sure I understand what you mean. I've solved with time-int 16. If I transfer with time-int 1 (current hardcoded behaviour in MeerKATHI) I think cubical will interpolate in time. I see no reason to do this, and anyway the user should be in control.

IanHeywood commented 4 years ago

It will still interpolate in time.

At the risk of stating the obvious, freq interpolation is I think also on by default, although I think with your settings it will not do anything as you've got a single value for the entire frequency band.

That might be OK, but I think we should have control over that.

I agree. If you're doing time/freq re-binning then these parameters also need exposing to the user, as in that mode it has to be a solve-and-save followed by a load-and-apply operation.

PeterKamphuis commented 4 years ago

@paoloserra Yes I meant to set the time-int to something else than 1. So user controlled. I can work this in on Monday with a default setting of 1,1.

paoloserra commented 4 years ago

@PeterKamphuis please check with @KshitijT as I think he was also planning to make some changes there.

KshitijT commented 4 years ago

@PeterKamphuis please check with @KshitijT as I think he was also planning to make some changes there.

@PeterKamphuis , I got this, I anyway want to add a few more things in the selfcal options - unless you want to implement some other stuff as well.

o-smirnov commented 4 years ago

Not sure I understand what you mean. I've solved with time-int 16. If I transfer with time-int 1 (current hardcoded behaviour in MeerKATHI) I think cubical will interpolate in time.

Correct.

If you want to apply exactly the same gains (i.e. "stepwise" in time/freq, same gain value over each original solution interval), then the chunks/intervals in the apply step must exactly match those at the solve step. If you're trying to transfer solutions from "fat channels" to "thin channels", you'll need to multiply freq-int by whatever the "fattening factor" is to ensure that exact match.

In this scenario, I would also recommend using load-from and not xfer-from. The difference is that load-from never interpolates, and complains if there is a mismatch in the grids. Which is I think what you want here -- if you get it wrong, you want to see an error message, not quiet behind-the-scenes interpolation.

PeterKamphuis commented 4 years ago

@o-smirnov Are you saying that load-from is still preferred when the datasets (used for calibration vs applying to) have different resolutions? I thought that would not work?

And when we just apply does that still mean the chunks need to be larger than the intervals? That could lead to memory problems when we have large intervals on averaged datasets to create the calibration and that are subsequently applied to the full data set. Or not?

o-smirnov commented 4 years ago

Are you saying that load-from is still preferred when the datasets (used for calibration vs applying to) have different resolutions? I thought that would not work?

It will work if the resolutions are exact multiples of each other, and you incorporate the multiples into your intervals accordingly (or use physical units -- i.e. seconds and MHz -- to specify intervals and chunks).

The exact logic is as follows:

  1. Split the data into chunks as per time-chunk and freq-chunk (and chunk-by)

  2. Split each chunk into intervals as per time-int and freq-int

  3. Compute the time/freq midpoint of each interval

  4. Lookup this time/freq point in the solution table:

So as long as you end up with the identical set of intervals after steps 1 and 2, load-from should work. As long as the chunks are multiples (or equal to) the original intervals, it should work. But if the chunks get smaller, then of course it stops working.

PeterKamphuis commented 4 years ago

@KshitijT Well I was working on a restore function that also does the gains. In anycase if you already have it I'll wait for that and incorporate it.

PeterKamphuis commented 4 years ago

@paoloserra @KshitijT @o-smirnov So I traced my error back to having the matrix-type set to phase-diag, which is what it was set to when creating the solutions but apparently that does not work when applying them.

paoloserra commented 4 years ago

@KshitijT as part of your changes can you please make the transfer_apply_gains settings mentioned in @o-smirnov 's posts available to users? One option would be to have something like;

transfer_apply_gains:
  enable: true
  transfer_to_label: corr
  time-chunk: 128
  freq_chunk: 0
  interpolate:
    enable: true
    time-int: 16
    freq-int: 0

where cubical runs with load-from or xfer-from depending on whether interpolate is enabled. In case it is disabled, time-int and freq-int should be automatically set to be consistent with the values used when calculating the gains.

KshitijT commented 4 years ago

Yup, will do.

PeterKamphuis commented 4 years ago

@o-smirnov I was wondering, since we do not transfer the cubical flags to the high resolution datasets should we not always use xfer-from when transferring the gains to avoid have blocks that are not self called?

o-smirnov commented 4 years ago

Good point @PeterKamphuis. If you have fully flagged time/freq intervals for some antennas in the low-res dataset, there will be no gain solutions for those intervals, and --load-from will load nothing (leaving you with a unity gain). --xfer-from will at least try to interpolate something sensible from the surrounding valid blocks.

So conceptually, if the high-res flags are allowed to be a subset of the low-res flags, then there's no way to avoid interpolation entirely... not sure what the best way forward then is. Opinions?

KshitijT commented 4 years ago

I don't know if we should use on Cubical flags transfer to high-res data (since for example it would depend on low SNR, solution intervals etc).

The original idea behind transfer-gains was to estimate gains from (possibly broadband) continuum data and apply it to the "HI" data. In cases of strong HI sources this would imply flagging them out in the "continuum" data so that the HI emission is not absorbed in the gains and then safely apply the estimated gains to the hi-res data (with HI not flagged, and not necessarily broadband - of course this depends on knowing a-priori where the HI source(s) is) and go happily to make an HI image.

I wonder what will happen in such a case if the solution interval in frequency is small enough to be comparable to the HI source width - we'll end up with unity gains if we use --load-from, right? Then I would say --xfer-from is better.