caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

BP/Flux cal in absence of BP/Flux scan #107

Closed paoloserra closed 10 months ago

paoloserra commented 6 years ago

An observation may be broken into multiple .h5 files and it may happen that not all files contain a BP calibrator scan. See, for example, /scratch2/paolo/fornaxa/output/meerkathi-15048*obsinfo.txt , where the last file has no BP calibrator scan.

If I got this right, in such a situation MeerKATHI is not able to BP-calibrate the last file because it only ever applies calibration tables obtained from the file itself.

One solution would be to apply all calibration tables of the .ms files being processed to all files being processed (with appropriate time tolerance and interpolation/extrapolation rules).

However, at the moment, this cannot be done because we process files sequentially, running all calibration tasks on file 0, then all calibration tasks on file 1, etc. Therefore, if only the last file has a BP calibrator scan (e.g., because the calibrator wasn't up at the beginning of the observation) then all files before it cannot be BP calibrated.

SpheMakh commented 6 years ago

H5TOMS should be able to create a single MS from a list of h5 files. I'll test this and make changes to meerkathi.

Also, as you suggest, it would be nice to be able to apply calibration tables across MSs.

paoloserra commented 6 years ago

Yes that would be a good solution. We may need some extra parameters in the config file. MeerKATHI must be told that it shouldn't process the MS files with the individual IDs given by the user or returned by the archive query. Instead, it should process the concatenated file -- whatever name we decide to give it.

SpheMakh commented 6 years ago

This feature is broken in H5TOMS, we'll have to use the CASA concat task to combine the datasets instead.

paoloserra commented 6 years ago

The NGC 3621 commissioning dataset has this problem, too. It is broken in two h5 files, of which only one includes 1934 scans. The other h5 file has 0407 as primary calibrator, but MeerKATHI crashes because

The flux calibrator field "0407-658" could not be found in our database or in the CASA NRAO database

(I realise that in this case the issue is with flux calibration and not bandpass calibration, but the problem is the same.)

I wonder whether a possible solution would be to add an option called something like 'use_other_ms' for each crosscal task in the crosscal worker, which could be set to 'null' or to a valid .ms file name which contains the relevant calibration scan. This could then be passed on to the relevant CASA task.

E.g., when processing file1.ms with no bandpass calibrator, we could set bp_cal: use_other_ms: file2.ms, and this would make casa_bandpass run on file2.ms and create the usual table (i.e., with a table name as if the calibrator scan had been included in file1.ms).
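A minimal sketch of how such an option could be resolved, assuming a hypothetical per-task `use_other_ms` key in the crosscal config. The function name, the dictionary layout and the table-prefix rule are illustrative only and are not existing MeerKATHI code.

```python
import os


def resolve_cal_source(msname, task_cfg):
    """Return (ms_to_solve_on, caltable_prefix) for one crosscal task.

    `task_cfg` is a hypothetical dict for a task such as bp_cal.
    `use_other_ms` is None, missing, or the string 'null' when the
    calibrator scan is in `msname` itself.
    """
    other = task_cfg.get("use_other_ms")
    if other in (None, "", "null"):
        source = msname
    else:
        source = other
    # The table is always named after the MS being calibrated, as if the
    # calibrator scan had been included in it.
    prefix = os.path.splitext(os.path.basename(msname))[0]
    return source, prefix


# file1.ms has no bandpass calibrator, so solve on file2.ms instead.
print(resolve_cal_source("file1.ms", {"use_other_ms": "file2.ms"}))
# file2.ms contains its own calibrator scan.
print(resolve_cal_source("file2.ms", {"use_other_ms": None}))
```

Keeping the table prefix tied to the MS being calibrated means the later apply steps would not need to know where the solutions came from.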

ratt-priv-ci commented 6 years ago

You mean 0408-65? The only two viable options for bandpass are 0408-65 and 1934-638. It is probably labelled wrong, in which case you need to manually relabel it after joining the measurement sets. The normal operating mode of the imaging script in the SDP observation scripts is to observe bp, gain and target. Anything else would require some manual intervention.

ratt-priv-ci commented 6 years ago

But I agree there should be a join-observation option to join measurement sets.

paoloserra commented 6 years ago

OK, labelled wrong in that case, but the issue remains. There are and there will be .MS files with no bandpass/flux/delay calibrator scan, and we should be able to use calibrator scans from other .MS files. I don't think we should physically join measurement sets, there's really no need to do that. We should just let file1.ms use the calibration from file2.ms (either use existing calibration tables or derive them). Agree? If so I can give it a go.

ratt-priv-ci commented 6 years ago

https://pypi.python.org/pypi/meerkathi/0.1.0

Install instructions - will add this to the readme soon - and make it appear on the PyPI page:

virtualenv meerkathi-venv
source meerkathi-venv/bin/activate
pip install setuptools wheel pip -U
pip install meerkathi
stimela pull
stimela build
meerkathi --help

SpheMakh commented 6 years ago

OK, labelled wrong in that case, but the issue remains. There are and there will be .MS files with no bandpass/flux/delay calibrator scan, and we should be able to use calibrator scans from other .MS files. I don't think we should physically join measurement sets, there's really no need to do that. We should just let file1.ms use the calibration from file2.ms (either use existing calibration tables or derive them). Agree? If so I can give it a go.

Agreed.

paoloserra commented 6 years ago

I've got this to work for all calibration steps except one, the fluxscale bootstrapping, which is where I'm stuck at the moment. Any suggestions would be appreciated.

This step is tricky:

We want to scale the GainCalibrator's gain amplitudes in table file2.G0 to the FluxCalibrator's gain amplitudes in table file1.G0.

As far as I can see CASA cannot do this because the task FLUXSCALE takes a single table as input. So we cannot give as input caltable=[file1.G0,file2.G0].

(When working on a single .MS this is not a problem because the gains from both the FluxCalibrator and the GainCalibrator are in a single table.)

I have thought of concatenating file1.G0 and file2.G0 but cannot find how to do it in CASA. One way would be to append the GainCalibrator's gains derived from file2.ms to file1.G0 by running GAINCAL with append=True. However, this is not allowed because

Appended solutions must be derived from the same MS as the existing caltable [...] (CASA docs)

Is there maybe a gain table concatenating task which I have overlooked? Any other ideas?
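For reference, the scaling that the bootstrapping step performs can be sketched outside CASA. This assumes the usual convention that the gain calibrator was solved against a 1 Jy point-source model, so that its gain amplitudes are too large by the square root of its true flux density, and that the true gains are stable between the two files. The function names and input lists are hypothetical, reading the amplitudes out of file1.G0 and file2.G0 is not shown, and this is not how FLUXSCALE itself is implemented.

```python
from statistics import median


def bootstrap_flux(flux_cal_amps, gain_cal_amps):
    """Estimate the gain calibrator flux density (Jy) from gain amplitudes.

    flux_cal_amps: gain amplitudes solved on the flux calibrator
                   (true flux model).
    gain_cal_amps: gain amplitudes solved on the gain calibrator with a
                   1 Jy point-source model.
    """
    ratio = median(gain_cal_amps) / median(flux_cal_amps)
    return ratio ** 2


def rescale_gains(gain_cal_amps, flux_jy):
    """Divide out sqrt(flux) to put the gains on the flux calibrator scale."""
    return [amp / flux_jy ** 0.5 for amp in gain_cal_amps]


flux = bootstrap_flux([1.0, 1.02, 0.98], [2.0, 2.04, 1.96])
print(flux)
print(rescale_gains([2.0, 2.04, 1.96], flux))
```

A median is used here only to keep the sketch robust to a few bad solutions; any per-antenna or per-polarisation treatment is left out.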

bennahugo commented 6 years ago

I think the safest thing to do is to join the observations into one measurement set -- use "concat" in CASA to join observations together. This is needed when the subarray changes as is often the case these days. It is very wasteful in terms of disk space - there is also a virtual concat task. I do the first to do polarization calibration when one source is down and another one is up - works well. You need to make a cab for it in Stimela though.

The alternative is to tell CASA to append solutions to the table - but then you better be sure you're using the same subarray configuration in both observations.

While we're at it we need to change the way bandpass is computed - there should be one bandpass solution (combine scans, solint 'inf'), obtained by first running gaincal to take out the DC of the bandpass before solving for the bandpass. The bandpass remains stable at the -26 to -27 dB level, and solving for it more often, as we do now, simply increases the noise on the solutions. The solution interval must be determined from the desired dB noise level on the visibilities: (SEFD * 10 ** (dB / 10)) ** 2 / (Sflux ** 2 x 2 x channelwidth x (nants - 1)) in seconds.
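A small sketch of that solution-interval estimate, assuming the expression is read as (SEFD * 10 ** (dB / 10)) ** 2 / (Sflux ** 2 x 2 x channelwidth x (nants - 1)), with SEFD and Sflux in Jy and the channel width in Hz. The function name and the example numbers are illustrative assumptions, not MeerKAT specifications.

```python
def bandpass_solint(sefd_jy, db_level, flux_jy, chan_width_hz, nants):
    """Solution interval in seconds for a desired dB noise level.

    sefd_jy:       system equivalent flux density per antenna (Jy)
    db_level:      desired noise level relative to the calibrator (dB)
    flux_jy:       calibrator flux density (Jy)
    chan_width_hz: channel width (Hz)
    nants:         number of antennas in the subarray
    """
    if nants < 2:
        raise ValueError("need at least two antennas")
    numerator = (sefd_jy * 10 ** (db_level / 10.0)) ** 2
    denominator = flux_jy ** 2 * 2 * chan_width_hz * (nants - 1)
    return numerator / denominator


# Illustrative values only.
print(bandpass_solint(sefd_jy=400.0, db_level=26.0, flux_jy=15.0,
                      chan_width_hz=26e3, nants=16))
```

The interval scales with the inverse square of the calibrator flux, so a calibrator twice as bright needs a quarter of the time for the same noise level.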

bennahugo commented 6 years ago

(the instabilities - notches etc. cannot be solved with bandpass calibration - they change on timescales faster than what can be solved for, so there is no point to try and solve for them)

bennahugo commented 6 years ago

Yet another alternative is to not use fluxscaling, since it is an ugly hack any day of the week. The more correct thing is to apply bandpass, use a model for the gain calibrators (with their own spectral curvature) and solve for gains that way. This should improve the correctness of the gain average. Unfortunately @ifornax and Tony are still making models for all the gain calibrator fields.

paoloserra commented 6 years ago

Thanks @bennahugo . I actually think that joining MS's is a waste of time and disk space. If there is an alternative we should use it. Allowing an .MS to use cal tables from another .MS is standard in other software. Subarrays need to be the same of course, but a user who chooses this option will be aware of it.

Subarray changes will always have their own BP/Flux calibrator scans. The issue is when a single track (with no subarray change) is split in several files. We already have such cases, and this will only get worse as more antennas come online.

You say "tell CASA to append solutions to the table". As I mentioned, I'm having problems with it. Do you know a solution?

(Concerning the bandpass solution interval, combining scans is already possible in the pipeline. What do you propose to change?)

bennahugo commented 6 years ago

There needs to be a gain calibration step where we solve for gains prior to doing a bandpass, so that we can simply average the entire bandpass observation together.

Okay, if you can't append to an existing table then we need to virtualconcat the observations and try again.

paoloserra commented 6 years ago

Sorry I don't understand

paoloserra commented 6 years ago

Ah, seen this https://casacore.github.io/python-casacore/casacore_tables.html#casacore.tables.msconcat

paoloserra commented 6 years ago

I think I'll explore the possibility of non-virtual table concatenation with python-casacore.

ratt-priv-ci commented 6 years ago

Hi all

I've installed a self-signed certificate for the jenkins instance running on stevie.kat.ac.za so we can use SSL for logins

Please add an exception and check that your certificate matches the following modulus

                Public-Key: (2048 bit)
                Modulus:
                    00:b3:08:78:ed:41:a9:6d:c4:0a:7b:c2:39:0a:98:
                    45:06:da:bb:0c:82:b0:bc:87:bc:ad:46:87:a0:73:
                    2b:7c:bb:99:cf:6d:31:4f:7a:1a:fd:e7:27:c5:f4:
                    9f:cc:80:5e:dd:2e:fb:b1:0d:ca:c8:21:54:cb:c7:
                    e0:75:02:34:70:c7:6c:85:47:d6:5a:3d:9e:7d:b2:
                    e0:39:99:07:f1:ad:d9:48:d9:5b:fe:ee:6c:45:6e:
                    3b:52:1e:4b:55:03:af:6d:67:fa:dc:a9:dc:49:ac:
                    d2:6c:d6:7c:53:1e:f6:68:2d:a1:c9:4b:db:cd:b7:
                    96:2c:80:30:4e:da:a5:b8:ab:dd:37:f3:3b:0f:78:
                    15:f6:0a:95:bf:e6:05:96:83:72:2b:c1:5d:d0:b6:
                    55:bb:eb:4a:6c:b6:cf:0b:44:7e:ea:3b:d1:38:3b:
                    89:fb:c5:cd:34:c7:9d:06:21:bb:5a:6e:bc:e4:20:
                    5d:d1:09:f2:c3:8c:5f:3c:3c:71:a0:68:47:2b:ff:
                    3e:5e:4c:54:e1:45:9c:9c:db:f7:c9:47:51:2d:9e:
                    ab:8c:1f:e2:6b:f1:11:30:b9:f7:47:6e:b8:6c:e3:
                    4a:2d:e5:d8:9a:24:be:e3:e0:ea:83:bc:f6:35:b5:
                    a5:9a:35:1b:bc:c3:e2:79:65:ba:69:7d:ab:e0:d5:
                    52:99
                Exponent: 65537 (0x10001)

paoloserra commented 6 years ago

@bennahugo thanks for enlightening me about virtual concat. It seems like the easiest and most elegant way of solving this problem without unnecessarily writing to disk. It is also a lot less hacky than my solution (and should avoid my still-unsolved problem with the flux scale bootstrapping).

We're still not there though.

MSUtils is not able to create a .JSON file for the concatenated .MS. I guess it does not understand that this is not a "real" .MS, and that it should look for the actual tables in the original files. See error below:

File "/code/run.py", line 68, in <module>
    run_func(**args)
File "/usr/local/lib/python2.7/dist-packages/MSUtils/msutils.py", line 34, in summary
    'FIELD' : pyrap.tables.table(msname+'/FIELD'),
File "/usr/lib/python2.7/dist-packages/casacore/tables/table.py", line 363, in __init__
    Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table /home/pserra/msdir/testconcat.ms/FIELD does not exist

I'm currently looking into combining the individual .JSON files into a concatenated .JSON file, but improving this aspect of MSUtils might be better in the long term.

bennahugo commented 6 years ago

This is because it is using /FIELD which is not the correct way to address subtables - should be "::FIELD"
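A minimal sketch of the difference, assuming python-casacore is the table library in use. The helper names are hypothetical and this is not the actual MSUtils code.

```python
def subtable_name(msname, subtable):
    """Build a casacore subtable name, e.g. 'obs.ms::FIELD'.

    The '::' form asks casacore to resolve the subtable through the main
    table's keywords instead of assuming a directory on disk, which is
    what a virtually concatenated MS needs.
    """
    return "{}::{}".format(msname.rstrip("/"), subtable)


def open_subtable(msname, subtable):
    """Open a subtable read-only using the '::' addressing."""
    # Imported here so subtable_name() can be used without casacore installed.
    from casacore.tables import table
    return table(subtable_name(msname, subtable), ack=False)


print(subtable_name("testconcat.ms", "FIELD"))
```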

bennahugo commented 6 years ago

thanks for digging this out. we will need changes in msutils. @SpheMakh is on leave and I don't have access to that repo. I'll see if I can swap things out in Stimela to build from my repo instead sometime between Christmas and new year.

paoloserra commented 6 years ago

Got it, it might be an easier fix than I thought then.

bennahugo commented 6 years ago

@paoloserra actually you do know about this right?

Cab      h5toms
Info     Convert HDF5 file(s) to MeasurementSet
Base Image       stimela/katdal:0.3.1

Parameters:
  Name         hdf5files
  Description  HDF5 file(s)
  Type         list:file
  Default      None

You can give it a list of observations to concatenate when it writes out the measurement set

paoloserra commented 6 years ago

see comment from Sphe above

This feature is broken in H5TOMS, we'll have to use the CASA concat task to combine the datasets instead.

paoloserra commented 6 years ago

(Also, this may not be a practical way forward once files get very large)

SpheMakh commented 6 years ago

This is because it is using /FIELD which is not the correct way to address subtables - should be "::FIELD"

Fixed. In msutils, to get these changes you need to pull the msutils base image and re-build the cab

docker pull stimela/msutils:0.3.1
stimela build --no-cache --us-only msutils

paoloserra commented 6 years ago

The option get_data: combine: reset is a little dangerous.

Virtualconcat moves the input .MS files inside concatenated_file.ms/SUBMSS/ . If the pipeline is run again with get_data: combine: reset: True the first thing that happens is that concatenated_file.ms is deleted, and so are all the input .MS files previously moved to concatenated_file.ms/SUBMSS/ .

This could cause quite some pain. I suggest we change it.

I've tried the keepcopy option of CASA/VIRTUALCONCAT but it gives an error. Anyway I don't think that's a good idea because it creates a new copy of all input .MS files, which I wouldn't want.

Maybe in case a user enables resetting we could first move the content of concatenated_file.ms/SUBMSS/ back to msdir ? (Without overwriting existing files that have the same name?)

It's not going to be very clean, but I don't think we should keep the current reset behaviour.
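The safer reset suggested above could look roughly like the sketch below. It assumes the layout described here (input .MS files sitting under concatenated_file.ms/SUBMSS/); the function name and return values are hypothetical and this is not existing pipeline code.

```python
import shutil
from pathlib import Path


def restore_sub_ms(concat_ms, msdir):
    """Move every entry of <concat_ms>/SUBMSS back into msdir.

    Entries whose name already exists in msdir are left where they are
    and reported in `skipped`, so nothing is overwritten.
    Returns (moved, skipped) as lists of names.
    """
    submss = Path(concat_ms) / "SUBMSS"
    moved, skipped = [], []
    if not submss.is_dir():
        return moved, skipped
    for entry in sorted(submss.iterdir()):
        target = Path(msdir) / entry.name
        if target.exists():
            skipped.append(entry.name)
        else:
            shutil.move(str(entry), str(target))
            moved.append(entry.name)
    return moved, skipped
```

A reset step could then call this first and delete the concatenated file only when `skipped` comes back empty, leaving the decision to the user otherwise.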

gigjozsa commented 4 years ago

@paoloserra Can this be closed?

paoloserra commented 10 months ago

This is now a rare problem compared to early MeerKAT commissioning days. In the rare case that an observation is split into multiple MS files and only one has the BP/flux calibrator, I think the user will just have to concatenate the MS's before calibrating. Since this happens rarely (if at all), it is not worth developing a solution within CARACal.