Optimal way to implement pybdsm_vis?

PeterKamphuis commented 5 years ago

In self_cal, the pybdsm_vis mode does not combine both models when using cubical; it merely uses the pybdsm model. I tried to fix this but cannot find the right syntax to get the correct value through to cubical, that is, 'MODEL_DATA+outputdir/lsm.html'. Because of how the parsing from stimela to cubical works, either an error is raised or only the lsm is used, without a directory.

PeterKamphuis commented 5 years ago

So it does actually work, but it only uses the clean components in the final iteration and only the pybdsm model for the rest. Is this desirable? The same is actually true of the meqtrees calibration.

KshitijT commented 5 years ago

So the idea behind pybdsm_vis is:
1) Use the pybdsm lsms for the first n-1 iterations, where n is the total number of calibration iterations.
2) Combine this model with the clean components from the lsm-subtracted residuals for the final loop.

So the final loop would have lsm+clean components as model. If it just has clean components, it is not working correctly. @PeterKamphuis , could you post the cubical options from the log file ?
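The intended behaviour can be written down as a minimal sketch. The helper `models_for_iteration` is hypothetical (it is not a caracal function), iterations are assumed to be 1-based, and the clean components are assumed to live in the MODEL_DATA column, as the cubical options later in this thread suggest.

```python
def models_for_iteration(iteration, n_iterations, lsm, clean_column="MODEL_DATA"):
    """Return the model components to calibrate against in a given loop.

    Hypothetical helper, not part of caracal. Iterations are 1-based.
    """
    if not 1 <= iteration <= n_iterations:
        raise ValueError("iteration must lie in 1..n_iterations")
    if iteration < n_iterations:
        # Loops 1..n-1: pybdsm sky model only.
        return [lsm]
    # Final loop: pybdsm sky model plus the clean components of the
    # lsm-subtracted residuals (assumed to be stored in clean_column).
    return [lsm, clean_column]


# Example: a three-iteration self-cal run.
for i in range(1, 4):
    print(i, models_for_iteration(i, 3, "pybdsm.lsm.html"))
```

Note that if the run stops before iteration n, this sketch never returns the clean components, which is the truncation case raised in this thread.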

PeterKamphuis commented 5 years ago

@KshitijT I was actually thinking it wasn't working because I was missing the clean components in the first iteration. The problem with the cubical version, however, is that this is indeed not parsed properly. I'll look up the options.

I wanted to add two bright sources in the first iteration that I couldn't get with pybdsm (in the end I got them with local_rms = False). So the pipeline actually does what you say it should, but when we do not run the full set of iterations, either due to aimfast or for other reasons, I find the name somewhat misleading?

Also, is it really a good idea to significantly change the model by adding clean components in the final iteration? I do understand we typically don't want the clean components in the first iteration.

KshitijT commented 5 years ago

@PeterKamphuis , the idea is to get all the non-extended sources with pybdsm first and subtract them out. Pybdsm doesn't work very well with diffuse emission, so we try to pick that up with clean components; this means that your total sky model is more complete. Note that this is a good idea only if you have already subtracted out all the lsm sources, otherwise it's going to "double-count".

Yeah, if the calibration is truncated early due to aimfast or other issues, then it is not going to use the clean components and the name is a bit misleading. We can change it to pybdsm-optionally-vis if you like. :) On a more serious note, we could start using both the lsm and the clean components as soon as the other sources are subtracted out, rather than just in the last iteration. We still need to think about how we would implement it.

PeterKamphuis commented 5 years ago

@KshitijT Is it made very clear somewhere that this mode should only be used when subtracting the model out? As for when to start including the clean components, maybe after the first self cal we could keep track of the flux in singular sources (already subtracted out) vs clusters or flux left in the image?

PeterKamphuis commented 5 years ago

@KshitijT These are the cubical options in the final iteration

gocubical --data-ms /msdir/GMRT_Original-corr.ms --data-column DATA --data-time-chunk 128 --model-list /output/mkathi-pipeline_1-pybdsm.lsm.html:MODEL_DATA --montblanc-dtype float --weight-column WEIGHT --madmax-enable 1 --madmax-estimate corr --madmax-plot 0 --madmax-threshold 0,10 --sol-jones "G" --sol-term-iters 50 --bbc-save-to file --dist-ncpu 4 --out-name output/GMRT_Original-1_cubical --out-mode sc --out-casa-gaintables 0 --out-plots 1 --g-time-int 50 --g-freq-int 0 --g-clip-low 0.5 --g-clip-high 1.5 --g-solvable 1 --g-type phase-diag --g-save-to /output/g-gains-1-GMRT_Original-corr.parmdb

Which leads to:

10:17:51 - main [0.1/0.1 0.9/0.9 0.4Gb] - list .............................................. = output/mkathi-pipeline_1-pybdsm.lsm.html:MODEL_DATA

I thought the colon had to be a + sign but this line:

10:17:55 - data_handler [0.1/0.1 0.9/0.9 0.4Gb] direction 0: /home/peter/output/mkathi-pipeline_1-pybdsm.lsm.html + MODEL_DATA

indicates it is OK. My apologies. Should we close this, or do we want to discuss further when to include the clean components?
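One way to check which components end up in a direction is to read them back from the data_handler line quoted above. This is only a sketch: the helper `direction_components` is hypothetical, and the log format is assumed from the single line in this thread, so other cubical versions may print it differently.

```python
import re

# The data_handler line exactly as quoted in this thread.
LOG_LINE = (
    "10:17:55 - data_handler [0.1/0.1 0.9/0.9 0.4Gb] direction 0: "
    "/home/peter/output/mkathi-pipeline_1-pybdsm.lsm.html + MODEL_DATA"
)


def direction_components(line):
    """Return (direction, [components]) for a 'direction N: a + b' line.

    Returns None if the line does not match the assumed format.
    """
    match = re.search(r"direction (\d+): (.+)$", line)
    if match is None:
        return None
    # Components within one direction are assumed to be joined by '+'.
    components = [part.strip() for part in match.group(2).split("+")]
    return int(match.group(1)), components


print(direction_components(LOG_LINE))
```

If the returned list holds both the lsm file and MODEL_DATA for the same direction, the two models were combined in that run.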

KshitijT commented 5 years ago

@PeterKamphuis , no, I should make it clear somewhere. And we can keep track of lsm and clean components, it's just tricky (this essentially means either i) we don't run pybdsm after the first run or ii) we do run pybdsm, but then we have to take out the corresponding clean components and re-predict). So I would characterise the three modes like this:
1) Pybdsm_only: If you have only compact sources in the field
2) Vis_only: If you have significant fluffy emission in the field
3) Pybdsm_vis: If you have lots of compact sources but also diffuse emission in the field
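The three-way characterisation above can be expressed as a small decision helper. This is only an illustration: `suggest_model_mode` is a hypothetical function, and the lowercase mode strings are assumed spellings that may differ from what the caracal configuration actually accepts.

```python
def suggest_model_mode(has_compact, has_diffuse):
    """Suggest a calibration model mode from the content of the field.

    Hypothetical helper; the returned strings are assumed spellings.
    """
    if has_compact and has_diffuse:
        # Lots of compact sources but also diffuse emission.
        return "pybdsm_vis"
    if has_diffuse:
        # Significant fluffy emission in the field.
        return "vis_only"
    if has_compact:
        # Only compact sources in the field.
        return "pybdsm_only"
    raise ValueError("field has neither compact nor diffuse emission")


print(suggest_model_mode(has_compact=True, has_diffuse=True))
```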

PeterKamphuis commented 5 years ago

@KshitijT By the first run, do you mean after the first self_calibration or on the cross_calibrated image? I typically find that the images from the cross_calibration are not too great and there is a large danger of picking up artifacts in the model. Of course this also depends on the telescope/frequency, but from the GMRT data I would suggest not subtracting the initial model from the cross_calibration. After that, we could subtract the single-source lsm sources out of the data and remove all clusters from the lsm list?

I think the classification makes sense, but that would also mean that it is worth putting some effort into working out a good scheme for pybdsm_vis.

KshitijT commented 5 years ago

@PeterKamphuis , by first run I mean after the first phase cal round. I don't see us subtracting sources before that, unless you have a nice model at hand, so I think we agree on that one. The subtraction of the lsm sources from the data would be a product of the calibration process anyway (if you stick to CORR_RES). One way to go after that would be to make use of a mask for extended emission (that would come in through clean components) and use pybdsm on the rest of the image only. But let's think a bit more about the final scheme for pybdsm_vis before we implement this.

caracal-pipeline / caracal

Optimal way to implement pybdsm_vis? #363