e-merlin / eMERLIN_CASA_pipeline

This is the CASA eMERLIN pipeline to calibrate data from the e-MERLIN array. Please fork the repository before making any changes and read the Coding Practices page in the wiki. Please report problems with the pipeline in the issues tab.

Speed up calibration for unaveraged data #99

Open. jradcliffe5 opened this issue 6 years ago

jradcliffe5 commented 6 years ago

Hi @jmoldon,

I had a thought on a possible way to speed up calibration for unaveraged data. Could you derive the calibration tables on averaged data (which should make the solutions much quicker to compute) and then apply those solutions to the unaveraged data at the end? Of course the process would use extra disk space, since you would have to keep both the unaveraged and averaged data, and we would have to think about which steps (e.g. flagging) would need to be run on both data sets.
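For concreteness, here is a minimal sketch of the idea, written for CASA 6 style imports; all file, field and reference-antenna names are hypothetical placeholders, not the pipeline's actual defaults. The point is just: average, solve on the averaged copy, apply to the original.

```python
# Minimal sketch of the averaged-calibration idea. The casatasks imports are
# CASA 6 style; file/field/refant names below are hypothetical placeholders.
from casatasks import mstransform, gaincal, applycal

raw_ms = 'target_unaveraged.ms'   # full-resolution data (kept on disk)
avg_ms = 'target_averaged.ms'     # smaller averaged copy used only for solving

# 1. Make an averaged copy (this is the extra disk space mentioned above).
mstransform(vis=raw_ms, outputvis=avg_ms, datacolumn='data',
            chanaverage=True, chanbin=8,       # e.g. average 8 channels together
            timeaverage=True, timebin='8s')    # e.g. 8-second time averaging

# 2. Derive solutions on the averaged copy (fewer visibilities, so faster).
gaincal(vis=avg_ms, caltable='avg_phase.G',
        field='1331+305',                      # example calibrator field
        solint='int', refant='Mk2',            # example reference antenna
        gaintype='G', calmode='p')

# 3. Apply the same table to the unaveraged data at the end.
applycal(vis=raw_ms, gaintable=['avg_phase.G'],
         interp=['linear'], calwt=True)
```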

I'm sure @amsrichards will know more too!

jmoldon commented 6 years ago

Hi, these are my thoughts. It would probably speed things up; I think the VLA pipeline does something similar. Other benefits I see are:

On the other hand, the problems I see are:

So there are no fundamental problems apart from building robust logic for the process. I will think about it more, but I don't think I have time to implement a big change like this at the moment.

By the way, if you have some eMCP.log files, I would like to see them to identify bottlenecks compared to runs on other machines.

amsrichards commented 6 years ago

Hi... no, I don't really know! I think the first step would be an analysis of where our slowest steps are. The fact that we have fewer antennas but (usually) finer time sampling may make our case different from the VLA (or ALMA) one. In particular, if working with unaveraged data:

I have often used averaged ALMA continuum data to derive gain tables and applied them to un-averaged spectral line data, using two copies of the same data set, and that works fine. However, this is only with different amounts of spectral averaging, and I am careful to make sure that the channel averaging is the same for every spw (e.g. 512 input > 64 output), because CASA tends to do odd things with orphan channels. I have also noticed that when time-averaging in CASA you often get a mix of integration times, since input scans are often of variable length. mstransform is quite intelligent about not leaving orphan very short integrations: if you ask for 12-s intervals in a 27-s scan you end up with 3x9-s, whereas if the next scan is 36-s you get 3x12-s. This is handled fine by the weights, but it might cause problems in the case Javier mentioned, i.e. the second data set being flagged differently (which also applies to channels, I guess). In principle that is fixable with statwt, but you might end up with so many fixes that they eat up all the saved time...
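As a small illustration of the averaging and reweighting choices described above, here is a sketch with hypothetical file names and CASA 6 style imports; chanbin is chosen so each spw's channel count divides exactly (512 -> 64 means chanbin=8), which avoids the orphan-channel issue.

```python
# Sketch of even channel/time averaging followed by reweighting.
# The casatasks imports are CASA 6 style; file names are hypothetical.
from casatasks import mstransform, statwt

mstransform(vis='line_data.ms', outputvis='cont_avg.ms', datacolumn='data',
            chanaverage=True, chanbin=8,      # 512 input channels -> 64 output, no orphans
            timeaverage=True, timebin='12s')  # a 27-s scan becomes 3 x 9-s integrations

# If the averaged and unaveraged copies end up flagged differently, the
# weights can be recomputed afterwards, at some cost in extra run time.
statwt(vis='cont_avg.ms')
```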

Probably worth talking to the VLA/ALMA pipeline people (I am sure you have already)...