askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.
https://vast-survey.org/vast-pipeline/
MIT License
8 stars 3 forks source link

Add forced photometry #23

Closed dlakaplan closed 4 years ago

dlakaplan commented 4 years ago

When a source is detected in only some of the epochs, forced photometry (also called constrained fitting etc) should be used to determine robust constraints for each non-detection.

I have written a version of this code: https://github.com/dlakaplan/forced_phot.git

It assumes that ASKAPsoft images (including noise and background images) are available. Methods are present to do injection tests etc. It seems to work reasonable well and reasonable fast. For most sources it uses a definite linear algorithm:

Extract a sub-image for each source Create a beam kernel Determine the flux density by: F = sum (data * kernel/noise2) / sum(kernel2 / noise**2)

which doesn’t rely on any non-deterministic/non-linear fitting (so no convergence issues, etc). It seems to work fine for most sources, at least based on some testing with selavy catalogs and injected sources that I did.

However, if 2 or more sources are too close as determined using a KDTree algorithm, it will do a simultaneous fit with a non-linear fitter. This will typically only happen for a small fraction (<1%) of the total sources. I tested this as a function of separation for sources with similar flux densities, and determined a reasonable threshold of 1.5*BMAJ. Inside there cluster fitting gives good results but ignoring them fails. Note that this really need to consider all sources, not just those that are interesting.

I run this for all of the sources in a selavy catalog, and it takes ~10s for 5000 sources with some small fraction in clusters.

I didn’t do any explicit parallelization but I think it should be thread-safe.

I spent a little time updating the formatting (parameter documentation, blackening) but I would like to integrate it more with Sergio’s repository to use consistent standards for logging, error handling, etc. Any other standards or things to watch out for would be good too.

I’ve been working on testing. As I said, when I inject sources it performs very well. For comparison with the selavy catalog it does OK, but I believe that most of the errors are from non-Gaussian sources etc. For instance, the residuals correlate strongly with chi^2. If you have suggestions for other tests please let me know. If you can help with formal unit-testing, that would be good too.

github-actions[bot] commented 4 years ago

Congrats on your first issue and Welcome to Askap pipeline development: please assign the issue to a milestone if not auto-assigned

marxide commented 4 years ago

Based on a quick look at the codebase, it doesn't look like follow up of non-detected known sources has been implemented. Is that right, @srggrs @ajstewart?

If so, I suggest moving forward with the pipeline development and writing a placeholder for the forced photometry fitting, ideally as a function or whatever fits into the current design of the pipeline. Then we can work on replacing the placeholder with this code, adjusting David's function inputs/outputs if necessary to suit.

ajstewart commented 4 years ago

No it's not in yet, note there's also a small discussion warranted on the best and most useful way to implement this. I did start an issue #17 with that in mind.

srggrs commented 4 years ago

From Adam comment on #89:

A reminder of the logic:

  • The task is to fill in all the non-detection gaps in the light curve both past and present with forced fit extractions.

  • To do this we need to know which images, or rather sky regions, the source is in, so we can make sure we force extract at every possible time step. E.g. maybe there is a source that newly appears that is also overlapping in two or more sky regions, so we can extract the source from these images as well as its 'main line' sky region.

  • There might be a more efficient way to do it but in short it is:

    • Check what sky region each source belongs to.
    • Get images for said sky region(s).
    • Check which images have no extraction.
    • Run force extraction on required images (group by image).
    • Add extractions to measurements.
    • Recalculate the source statistics.

And just to keep in mind for the future, we may want to check before the forced extraction whether we should have actually seen the source in question in the past or future (maybe the images have a much higher RMS and are not sensitive enough). But we can get the above base level function going and then think about any further edits.

dlakaplan commented 4 years ago

I think that @marxide implemented a version of this locally for his use, so maybe check with how he did it.

On Apr 7, 2020, at 8:25 PM, Serg notifications@github.com wrote:

From Adam comment on #89:

A reminder of the logic:

• The task is to fill in all the non-detection gaps in the light curve both past and present with forced fit extractions.

• To do this we need to know which images, or rather sky regions, the source is in, so we can make sure we force extract at every possible time step. E.g. maybe there is a source that newly appears that is also overlapping in two or more sky regions, so we can extract the source from these images as well as its 'main line' sky region.

• There might be a more efficient way to do it but in short it is:

  • Check what sky region each source belongs to.
  • Get images for said sky region(s).
  • Check which images have no extraction.
  • Run force extraction on required images (group by image).
  • Add extractions to measurements.
  • Recalculate the source statistics.

And just to keep in mind for the future, we may want to check before the forced extraction whether we should have actually seen the source in question in the past or future (maybe the images have a much higher RMS and are not sensitive enough). But we can get the above base level function going and then think about any further edits.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

marxide commented 4 years ago

What I did basically follows what Adam outlined above, except I didn't have to worry about the sky regions as I run this process per-field and all the pilot fields match. If for some reason a source isn't covered by one of the epoch images, the forced photometry fit is NaN and doesn't end up in my final lightcurve.

ajstewart commented 4 years ago

Moving the conversation here from the PR:

David and I are using multi-order coverage (MOC) maps for the surveys plotting package. Maybe we could make use of these for sky regions? They are very small (tens of KB) and the mocpy package has a method to query if positions are covered by a MOC.

I wonder how fast this would be to query thousands of sources? The package might be useful down the line though for other queries.

@marxide do you know how the extraction code handles out of bounds, or NaN values at survey boundaries?

The fluxes returned will be NaN. Note that the background and rms images have a boundary of NaNs while the Stokes I images have a boundary of 0s, and the background and rms images are slightly smaller. For sources that are in the Stokes I image but in an area of NaNs in the background and rms images, the forced_phot flux will also be NaN.

I converted the Stokes I to NaNs I think? But ok that seems very easy to handle, just drop all NaNs.

dlakaplan commented 4 years ago

On Apr 8, 2020, at 6:06 PM, Adam Stewart notifications@github.com wrote:

Moving the conversation here from the PR:

David and I are using multi-order coverage (MOC) maps for the surveys plotting package. Maybe we could make use of these for sky regions? They are very small (tens of KB) and the mocpy package has a method to query if positions are covered by a MOC.

I wonder how fast this would be to query thousands of sources? The package might be useful down the line though for other queries.

@marxide do you know how the extraction code handles out of bounds, or NaN values at survey boundaries?

The fluxes returned will be NaN. Note that the background and rms images have a boundary of NaNs while the Stokes I images have a boundary of 0s, and the background and rms images are slightly smaller. For sources that are in the Stokes I image but in an area of NaNs in the background and rms images, the forced_phot flux will also be NaN.

I converted the Stokes I to NaNs I think? But ok that seems very easy to handle, just drop all NaNs.

It should be possible to convert the code to use nansum() and similar methods to ignore the NaNs. That will work for the singleton sources.

For the grouped islands I’d have to take a look. We could do a nan filter first, or convert the NaN to a float but set the variance to a high value (so as to ignore those points).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.