Open cyriltasse opened 1 year ago
We need to do tests on one of the nodes the data would be reduced on, but if DynSpecMS turns out to be too slow, I can probably accelerate it...
Impressive. If I can help with the input catalog or anything...
Nice Cyril. What is the radius of the large green circle? 2.2 deg? We might want to make it a little larger since some stars are detected ~3 deg from the pointing centre but will not have their dynamic spectrum produced.
As I said in email, I will get you a modified target list and proper motion code tonight/tomorrow
Yeah - 2 degrees is what we have previously used. We could go to 3, but the computing might be 2-3x more expensive. Let's see what we can afford after we have tested and quantified the computing time?
Hey,
The left panel here is the right-hand image from above; when you apply the Jones clipping, you get the right panel below
It looks pretty good, it could really improve things...
Promising...
This is looking fantastic Cyril! This kind of "scruff" really limits some analyses and was producing false positives. However, I still don't understand what produces the "scruff" at the top right then? If the Jones clipping got rid of the stuff in the middle, why not that? Maybe it is lower SNR?
> However, I still don't understand what produces the "scruff" at the top right then? If the Jones clipping got rid of the stuff in the middle, why not that? Maybe it is lower SNR?
Good question, I don't really know. But clipping doesn't mean you're left with good solutions - just less extreme ones. There are also the visibilities themselves, which can go crazy (we have applied the DI Jones as the inverse - like the whole rest of the world - and not as the Hermitian transpose, so low amplitudes blow up the noise). And there is the fact that some time-freq cells have fewer visibilities stacked (because of flagging), and are therefore a noisier average. My process takes this into account: I reduce the spectra using the weights.
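A toy numpy sketch (not the actual kMS/DDFacet code) of the two effects described above: applying the DI Jones as an inverse blows up the noise where the gain amplitude is low, and weighting by the number of stacked visibilities keeps noisy cells from dominating an average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar-gain model: the corrected visibility is v_obs / g, so where
# |g| is small, dividing amplifies the noise on v_obs.
g = np.array([1.0, 0.5, 0.05])              # gain amplitudes, one very low
noise = rng.normal(0.0, 0.1, (3, 100000))   # thermal noise on the visibilities
v_corr = noise / g[:, None]                 # DI Jones applied as an inverse

# The per-gain noise now scales as 0.1 / |g|: the low-amplitude solution
# dominates an unweighted average.
print(v_corr.std(axis=1))

# Weighted averaging (weights ~ number of unflagged visibilities stacked
# in each time-freq cell) down-weights the noisier cells.
w = np.array([100.0, 100.0, 1.0])
mean_weighted = np.sum(w * v_corr.mean(axis=1)) / np.sum(w)
```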
It will be important to test on other fields before we decide whether we want to use Jones clipping (but I think we will want to). I want to test on fields that have very bad facets, for instance.
I'm now editing this reprocessing script as suggested by Tim. I hope we can run a first test with a few tens of fields by next week. I'd like to launch the reprocessing before Christmas.
Hey mate,
Working on it. Probably will not get it to you today but will definitely get it to you Monday afternoon.
@jrcallingham would be cool - then the first batch of tests can be run using the pm correction.
Hey Cyril - as I said in the general email to the group, here is the proper motion code. I don't want to implement it directly as you know your code best, but I have assumed you have already read in the table at some point (see comments etc):

```python
# Proper motion correction
from astropy.io import fits
from astropy.coordinates import SkyCoord, Distance
from astropy.time import Time
import astropy.units as u

# Assuming two things: you have read in the table and made the variables
# ra, dec, pmra etc., and you have the date of the observation (date_obs)
# in YYYY-MM-DD format.
# In terms of a pipeline I would have an if statement here: if pmra is a
# real value, then do the following, otherwise just extract at the
# reported ra,dec position.
c = SkyCoord(ra=ra * u.deg,
             dec=dec * u.deg,
             distance=Distance(parallax=parallax * u.mas, allow_negative=True),
             pm_ra_cosdec=pmra * u.mas/u.yr,
             pm_dec=pmdec * u.mas/u.yr,
             obstime=Time(ref_epoch, format='decimalyear'))
c_lotss = SkyCoord(ra=ra_lotss * u.deg,
                   dec=dec_lotss * u.deg,
                   obstime=Time(date_obs, format='iso'))
epoch_lotss = Time(date_obs, format='iso')
print('Performing proper motion corrections for ' + field)
c_gaia_to_lotss_epoch = c.apply_space_motion(epoch_lotss)
# Then to extract the correct ra and dec just use c_gaia_to_lotss_epoch.ra
# and c_gaia_to_lotss_epoch.dec
```
Note that `ra_lotss` and `dec_lotss` are just the centre of the field. You can probably also not set them.
@jrcallingham Cool - thanks!
@mhardcastle @tim
I see some weird things happening with DDF/kMS reprocessing the data... I'm working with P236+53/L470106
Looks like the DicoModel was made with `--Beam-At facet`, but when I do that the residuals look ugly, so I tried using `--Beam-At tessel` and now they look good. Also the option was not present in the kMS parset at the time (it was `tessel` by default)... The DDFacet log says version "dev-c563ecb". I tried to `git checkout` to investigate but it doesn't match a tag/hash.
Do you have any clues what's going on? How I could dig in the issue?
Perhaps remove the "dev-" bit. I see it's from 11th June 2019 - https://github.com/cyriltasse/DDFacet/commit/c563ecb0eafbf16d18b74057f8c040a43bc75136
Yeah, perfect, that works! Great inspiration!!! Thanks!
Ok I think I found the issue. In the version that was used for the pointing I'm doing my tests on, `--Beam-At facet` actually triggers the same behaviour as `tessel`:
https://github.com/cyriltasse/DDFacet/blob/c563ecb0eafbf16d18b74057f8c040a43bc75136/DDFacet/Data/ClassJones.py#L190
I was fine because the beam was computed at the tessel by default in the kMS used at the time. But since the bug has since been fixed, we now have to specify `tessel`.
Now the problem is that we need to be able to know which DicoModel was processed using which beam mesh, in both kMS and DDF.
@mhardcastle @twshimwell Is there a simple way to get the versions of DDF/kMS that were used for the processing? With a bit of luck there are just a few of them and it would be easy to figure out which beam mesh we should use to compute the residuals.
I could automatically scan the logs and build a database of software versions... Maybe something already exists?
For each archived dataset there should be a summary.txt file in misc.tar that has the version numbers of all the software. But it's not in the database, unfortunately...
Fantastic... I'll batch read this!
Ok - I'm downloading all the misc.tar files and extracting the summaries... It might take a few days before I get a representative sample and an understanding of the various DDF/kMS versions that have been used.
A few days because, from my rough calculation, getting a misc.tar takes a while, and downloading all of them will take 6 days. I understand they are stored on tape, so the copy is pretty slow...
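Batch-reading the summaries could look like this sketch; the archive layout (one `<field>/misc.tar` containing a summary.txt) and the line format inside summary.txt are assumptions here:

```python
import re
import tarfile
from pathlib import Path

# Assumed line format inside summary.txt, e.g. "DDFacet version: 0.6.0"
# or "killMS: dev-c563ecb"; adjust the pattern to the real files.
VERSION_RE = re.compile(
    r'(DDFacet|killMS|ddf-pipeline|DynSpecMS)'
    r'(?:\s+version)?\s*[:=]?\s*'
    r'((?:dev-)?\d[\w.\-]*|dev-[0-9a-f]+)', re.I)

def versions_from_summary(text):
    """Extract {software: version} pairs from the text of a summary.txt."""
    return {m.group(1): m.group(2) for m in VERSION_RE.finditer(text)}

def scan_misc_tars(root):
    """Build a per-field database of software versions from the misc.tar
    archives found under root."""
    db = {}
    for tar_path in Path(root).glob('*/misc.tar'):
        with tarfile.open(tar_path) as tar:
            for member in tar.getmembers():
                if member.isfile() and member.name.endswith('summary.txt'):
                    text = tar.extractfile(member).read().decode(errors='replace')
                    db[tar_path.parent.name] = versions_from_summary(text)
    return db
```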
Hi, this is a first look at the various versions of ddf-pipeline, DDFacet, kMS, and DynSpecMS that have been used:
As far as I've analysed, only DDFacet version 0.6.0 computes the beam with the actual `--Beam-At facet` option. The other versions either don't have the option, or have it but it's bugged (and actually do `--Beam-At tessel`).
This is very cool to see. The solid lines in the last ~ 1 yr correspond to me using a static singularity image for processing, post-DR2. All DR2 data will have been processed with older DDF versions.
4 years of processing... :)
@twshimwell @mhardcastle this probably means that the reprocessing script, when used with a more recent version of DDFacet (version > 0.6.0) on a DicoModel that was produced with an older version (< 0.6.0), will give wrong results. This is a comparison of how wrong the results are in the image plane (`--Beam-At facet` on the left vs `--Beam-At tessel` on the right), and in the dynamic spectra at the location of the blue circle.
So it's quite critical to actually take this into account...
This probably mostly affects the source subtraction that's done in the reprocessing scripts. But also will affect us potentially if we try to re-image non-DR2 data... we can probably pretty easily build in a check for the original DDFacet version.
> we can probably pretty easily build in a check for the original DDFacet version.
Yes - for the DynSpecMS reprocessing I'm editing `reprocessing_utils.py`; I will add a version check there and modify the call to `DDF.py ... --Beam-At` accordingly...
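A minimal sketch of what that check could look like (hypothetical helper names, not the actual `reprocessing_utils.py` code): versions before 0.6.0 effectively computed the beam per tessel, so the predict has to use `tessel` to match the DicoModels they produced.

```python
def parse_version(v):
    """Turn a DDFacet version string like '0.5.3' into a comparable tuple;
    treat non-numeric dev versions (e.g. 'dev-c563ecb') as pre-0.6.0."""
    try:
        return tuple(int(x) for x in v.split('.')[:3])
    except ValueError:
        return (0,)

def beam_at_for(dicomodel_ddf_version):
    """Pick the --Beam-At value matching the DDFacet version that
    produced the DicoModel."""
    if parse_version(dicomodel_ddf_version) >= (0, 6, 0):
        return 'facet'
    return 'tessel'

# The DDF.py call can then be built with the matching option, e.g.
# "DDF.py ... --Beam-At " + beam_at_for(version)
```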
@twshimwell ok it seems I have a working version, probably you can have a go on just one or two pointings?
My current branches are:
DynSpecMS: MTilde_ASync_weights
ddf-pipeline: reprocDynSpec
@mhardcastle @twshimwell it's weird: the `run_full_field_reprocessing_pipeline` that I've modified returns `RuntimeError('Requested field is not in database')` for about 50% of the fields here. I see it's testing based on this. Are you sure that checking against `full_field_reprocessing` is correct?
I'll try to accelerate DynSpecMS a bit, and include @jrcallingham's code for the pm correction.
> Hey Cyril - as I said in the general email to the group, here is the proper motion code. I don't want to implement it directly as you know your code best but I have assumed you have already read in the table at some point (see comments etc)
@jrcallingham I was reading your code, and I can see in your comment that I need to read a table to get the pmra/pmdec. I'm not sure what you mean - could you include and test that part in your code?
The current code is: downloading, doing the predict with the right `--Beam-At` depending on the DDF version number of the DicoModel, using Jones clipping, saving the facet/tessel information in the fits header, and doing 3 random offs per facet.
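Saving the facet/tessel information in the fits header could be done along these lines (a sketch with illustrative keyword names, not the actual DynSpecMS code):

```python
import os
import tempfile
import numpy as np
from astropy.io import fits

# A dummy time-frequency dynamic spectrum standing in for a real one.
dynspec = np.zeros((10, 20), dtype=np.float32)

hdu = fits.PrimaryHDU(dynspec)
hdu.header['FACET'] = (12, 'Facet index the target falls in')        # illustrative keyword
hdu.header['BEAMAT'] = ('tessel', 'Beam mesh used for the predict')  # illustrative keyword

path = os.path.join(tempfile.gettempdir(), 'dynspec_example.fits')
hdu.writeto(path, overwrite=True)

# Downstream code can then recover which beam mesh produced the spectrum.
h = fits.getheader(path)
```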
Database now exists for all fields.
We do need to create a new column in the database though. @mhardcastle is it possible to add a column called "DynSpecMS" to the full_field_reprocessing database?
Ah sorry I could do that. So now the full_field_reprocessing has a DynSpecMS column that we can populate.
Ok I've got a working singularity image! It was quite an adventure... I'm now in touch with @mhardcastle and @twshimwell to start running; I will let you know.
> @jrcallingham I was reading your code, and I can see also in your comment I need to read a table to get the pmra pmdec. Not sure what you mean could you include and test that in your code?
Sorry Cyril. Just saw this. I just mean you have to read in the DynSpec catalogue table I sent. I don’t know how you are doing that in the pipeline, but you can read it into variable names via astropy.io
The table you sent on Monday at 16:12 CET? There have been discussions with Philippe since - is everyone happy with this target list?
It’s the updated table I sent on Wednesday which had some additions from Philippe (can find here https://www.dropbox.com/s/vgv30hjd1zzi65c/dyn_spec_catalogue_addedexo.fits?dl=0)
@jrcallingham thanks I'm reading the fits table and I see 30 NaNs in ra/dec and 23000 in pmra/pmdec. Is that all normal/understood?
Nope. That does not sound normal. I will investigate in an hour
Also, in the catalog I only see a `DESIGNATION` column with the name of the object - but the former catalog also had a `Type` key (exoplanet, gaia, brown dwarf, etc.) tracing the source/selection criteria - could you add this if you have the time?
Might be hard to add now. Still in meetings. Will see what I can do.
Hi Cyril,
The ~30 NaNs in ra/dec were due to a bad/rushed concatenation of V-LoTSS. I'm fixing that now and adding a Type column. However, I do not see the 23000 in pmra/pmdec - I see pmra/pmdec of ~3000-10000 mas/yr (keep in mind this is in milliarcsec).
Hi Cyril,
The most up-to-date version of the catalogue can be downloaded here: https://www.dropbox.com/s/y9yx9rnqs44yqiv/dyn_spec_catalogue_addedexo_addvlotss.fits?dl=0. The main difference from the previous catalogue is that I fixed the NaN ra/dec for the 30-odd sources and added a Type column. Also, this time I only removed duplicates if they were in a stellar catalogue (e.g. Gaia and Active stars). If a source is in both the exoplanet and Gaia catalogues I kept both entries (to be safe and clear about where the extraction is occurring).
Ok cool, thanks
> However, I do not see the 23000 in pmra/pmdec? I see pmra/pmdec of ~3000-10000 mas/yr (keep in mind this is milliarcsec).
I was saying that the fields `pmra`/`pmdec` have many NaNs:

```
In [8]: np.count_nonzero(np.isnan(c["pmra"]))
Out[8]: 23563
```

(the new catalog still has NaNs for these fields)
Yes - that is fine. Not everything has a measured proper motion (e.g. ultracool dwarfs that are too faint in Gaia, or Active stars that do not have it reported. I have tried to supplement with Gaia where possible, but it was not possible for all).
Ok cool thanks!
We've just talked with @twshimwell and @mhardcastle about the details of how we do things in practice on the herts cluster. We'll let you know when we have something running...
> Hey Cyril - as I said in the general email to the group, here is the proper motion code. [...]
@jrcallingham in your code, are `ref_epoch` and `date_obs` the same thing?
I'm stupid - sorry, I didn't see this was in the table...
For info, these are the pm corrections for the objects in the catalog (log scale)
Some targets really move! :)
Hi-
Ok - so I've been implementing various changes to DynSpecMS, in branch `MTilde_ASync`. Among them is a new `--nMinOffPerFacet` option (default 5). The first two changes will greatly improve the robustness of the statistics, and the potential to detect and reject stuff. The last step should improve the results and give better spectra. In the few tests I did I see some improvement, but nothing dramatic.
This is what the off/on (blue/green) positions now look like. Only the facets covered by the search radius are concerned, with the offs being scattered across them.
This is the comparison between before (left) and now (right):
The noise is lower by 2% in the newer version, with fewer artifacts (in a few places it's higher; I'll experiment quickly with clipping the data that have high gains, but I think it's not too bad given we will now have good off statistics).
Joe - time to add your proper motion correction? Give me a module and I can update the code.
The command line I've used was:

```
ms2dynspec.py --ms big-mslist.txt --data DATA --model DDF_PREDICT --sols [DDS3_full_smoothed,DDS3_full_slow_merged] --rad 2. --SolsDir SOLSDIR --BeamModel LOFAR --BeamNBand 1 --DicoFacet image_full_ampphase_di_m.NS3.DicoFacet --noff 100
```

The facet layout is given by the `--DicoFacet` argument.