Open zonca opened 1 year ago
This is a great idea!
I have a tool handy that does that , can post a notebook and we can discuss about it.
@giuspugl yes, please, that would be very helpful, no hurry
@giuspugl @xzackli could you please share the code that you have to generate maps from a catalog of point sources? @keskitalo do you have a pointer to the documentation of the new TOAST operator that handles point sources from a catalog?
I would like to gather all the code we already have, then come out with a plan on what we would like to port into PySM.
@zonca thanks for refreshing this issue, can surely point you to my code that does that, i can surely help with porting it into pysm3. The code assumes that all sources are unresolved and projected as point-sources with a gaussian profile.
The Simons Observatory operator for simulating TOD from a catalog of sources is tersely documented here: https://sotodlib.readthedocs.io/en/latest/toast.html#reference. There is no anchor so you need to search for SimCatalog
. The sources can be static, variable or even transient but their position must be held fixed to the Celestial sphere.
@xzackli 's code to generate maps from a catalog is https://github.com/WebSky-CITA/XGPaint.jl/blob/main/src/radio.jl#L254-L329, this puts all the flux for each source into a HEALPix pixel.
I think we can use pixell
for this, see https://www.zonca.dev/posts/2024-05-02-pysm-point-source-pixell for a demo.
The low level functionality is implemented in C, it can simulate 10000 sources in half a second on Jupyter@NERSC.
next we need to decide the catalog format.
TOAST uses a TOML file which also includes frequency scaling for sources:
@giuspugl I see websky has catalogs as a function of frequency: https://portal.nersc.gov/project/sobs/users/Radio_WebSky/matched_catalogs_2/, do you also have or it is possible to produce a single catalog that assumes a model (or some fit) for frequency dependence?
currently the catalogs have the columns: ['flux', 'phi', 'polarized flux', 'theta']
I think PySM should include a single Websky catalog for Radio Galaxies, already with a pre-defined flux cut to reduce size and with parameters for PySM to estimate on the fly the flux at each frequency. Then we can use pixell to produce beam-smoothed maps.
Yes, it's possible to generate that kind of catalog for the bright sources. I think the easiest thing to do is actually to fit the existing sources with a log polynomial, since they have some moderately complicated spectral behavior, and we didn't save the original parameters which generated those spectra.
thanks @xzackli, that would be appropriate, would you be able to create that? For testing purposes even having a catalog with just 10 sources would be enough.
the plan sounds fine to me. i think that 3rd or 4th order log-polynomial should encompass most of the fluxes we've sampled with websky.
I think I can give it a try and you can review it, should I use the catalogs in https://portal.nersc.gov/project/sobs/users/Radio_WebSky/matched_catalogs_2/?
@msyriac as you have experience with other simulations, do you think "coordinates + coefficients of 4th order polynomial in log(freq) for flux" is ok as a standard representation of a catalog? Or do you have a better suggestion?
What exactly do you mean by standard here? Existing radio catalogs from e.g. the Sehgal+ and Agora simulations are likely not in that format, but could be post-processed to be.
@msyriac we need to make a decision on the format of this catalog. do you agree with "coordinates + coefficients of 4th order polynomial in log(freq) for flux" or do you have something better in mind? I see that both Sehgal and Agora just give fluxes at some reference frequencies.
@msyriac @giuspugl @xzackli please review https://app.reviewnb.com/galsci/pysm-notebooks/pull/1/ and add comments on that pull request.
For now I took the union of the 20 brightest sources at each frequency for a total of 43 sources, most have a simple shape and fit well, but some of them have a more complicated frequency dependence. See at the end of the notebook the plots in linear and semilog axes.
Do you have suggestions on how to improve the analysis? or would you like to see more sources?
Btw, I fell in love with xarray
.
@brandonshensley @seclark
@zonca thanks a lot for drafting this notebook! I think the fits you're performing are pretty much good overall.
When the SED are almost constant, i.e. there ain't much excursion range , the fit suffer but i would say it is working as well .
would be great to make an histogram of the fitted polynomial coefficients, i thin with xarray
it is something easy to do.
thanks @giuspugl, see https://gist.github.com/zonca/0138734c3d2aade31b0f836dfb441b3c it looks like the brightest source is a big outlier, so I've created histograms with and without it.
@giuspugl what do you think about the histograms?
@zonca thanks for doing this test! i don't see anything wrong with the histograms. logpolynomial coefficients seem pretty reasonable , they span few orders of magnitude depending on the SED.
I've run your notebook making some edit by normalizing the flux wrt the reference frequency employed for the websky catalogs (100 GHz) and it seemed to improve the range of fitted logpoly coefficients, accounting also for the outlier (see the dashed step histograms below).
@giuspugl can you please share your notebook?
also, we have 280 million sources, they are too many to simulate all of them. How do we want to implement this? A background pre-smoothed with a tiny beam (to avoid sharp boundaries) linearly interpolated and point sources simulations on top?
I recommend pre-saving a map containing dim sources (e.g. < 1 mJy or some other threshold) convolved to the smallest beam (e.g. 280 GHz). When loading a map, you can re-convolve (with almxfl) to the target beam, and you can add in >1 mJy sources with your cat2map operation up to the threshold requested. (Some pipelines might not want sources above some threshold, e.g. 10 mJy.. so this flexibility is very nice to have)
@msyriac so for the dim sources, should we keep linearly interpolating between frequencies? or can we use 1 single fiducial frequency dependence? Like some average of the sources?
I think it would be good to keep linearly interpolating between frequencies; a single fiducial frequency dependence would be too far from realistic.
@giuspugl can you please share your notebook?
https://gist.github.com/giuspugl/079cd9d644c3ba37b225e596adfd2428
@giuspugl the units of flux in the catalogs are Jy
, right?
ok, for testing purposes I created a catalog with a cut at .1 Jy at 100 Ghz, https://gist.github.com/zonca/8bca58b20c6312fc2f6384f5d6f6027e
it has 2.5k sources
@giuspugl the units of flux in the catalogs are Jy, right? yes
Your comment makes a good point, we should have added metadata to the .h5 files! Sorry, I haven't been receiving email from this thread but your plan sounds very good.
complete implementation of the catalog feature is merged in #187 next I want to run websky using the catalog and I'm sure I'll find more bugs.
We should implement a generic component that can take a list of coordinates and flux from an input file and generate a map already smoothed with the instrument beam. This can for example be used for radio galaxies.
Looking for an existing implementation that we can port into PySM.