cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0
3 stars 16 forks source link

Fit TS map using galactic PsiChi #113

Closed Yong2Sheng closed 5 months ago

Yong2Sheng commented 7 months ago

Fit using galactic PsiChi

I updated the module fast_ts_fit.py to support the TS map fit using galactic Compton Data Space (CDS). The fit doesn't use the scatt map method, it uses the galactic response directly, thus avoiding the response conversion.

The fit in galactic CDS doesn't require orientation, so the coordinate conversion is also avoided, making the TS map fit in galactic faster than in local CDS.

The tutorial notebook is at cosipy/docs/tutorials/Parallel_TS_map_computation.ipynb. The tutorial notebook fits the GRB TS map using local CDS and fits Crab TS map using galactic CDS.

This PR also solves the issue in #108:

  1. The GRB orientation is read from an orientation file instead of hard-coded.
  2. Get rid of the warnings from both python and 3ML
  3. Add the option to use a specific number of cores to run parallel fit
  4. Plot "x" instead of an arrow to indicate the true location of GRB and Crab

Some other minor changes:

  1. Parallel fitting uses fork as the default start methods, so it will be consistent across different OS. (difference OS has different default start method for muiltiprocessing)

Special notes: To do list: add a plot method to plot the regions at different confidence levels. To fix: the ts values for Crab TS map looks way to large. I need to do some test to investigate it.

DC2

  1. The DC2 notebook is separated from the main testing notebook:
    • Parallel_TS_map_computation_DC2.ipynb is the one for CD2.
    • Parallel_TS_map_computation_test.ipynb is the main testing notebook, which should not be included in DC2.
  2. The documentation of the notebook is included in the DC2 notebook.
  3. Added a dimmer GRB to demonstrate how the TS values change as signal strength varies.
israelmcmc commented 7 months ago

Very nice @Yong2Sheng, it's working!!

I think the ring is likely due to some inaccuracies due to the pixelation while rotating the response. Some of these might go away with a finer resolution. In any case, compared to the maximum TS, I think it should be a small effect and it won't really affect the localization. The source is likely localize to within 1 pixel in any case (a Delta TS of 4.61 corresponds to a 90% containment confidence region). You could try to localize dimmer sources near threshold, and see how much this matters.

ckarwin commented 5 months ago

Hi @Yong2Sheng I'm going through your PR. I haven't directly tested the code yet, but I already have a number of comments that I think need to be addressed:

Comment 1: Please make a directory for the TS example notebook. Everything should be in docs/tutorials/ts_map.

Comment 2: I see you’ve added multiple files to cosipy/test_data, however, these files should not go there. The test_data directory should be kept lightweight, and it should only be for files that are used for unit testing. You can put the files needed for your example in the ts_map directory that you make (as mentioned above). If the files are too large for GitHub, please but them on wasabi. Let me know if you need help uploading the files to wasabi (and where to put them exactly).

Comment 3: I am confused about the difference between Parallel_TS_map_computation_DC2.ipynb and Parallel_TS_map_computation_test.ipynb. These two notebooks look pretty similar, and it’s not clear what the difference is. I think it would be better to just have the DC2 notebook for now. If the test notebook is just for development, then it shouldn’t be included in the main branch. For the rest of my comments, I will just focus on the DC2 notebook.

Comment 4: Your notebook starts with a long explanation of the CDS, measured data, background model, and response. However, this information is explained elsewhere, and is not needed here. The main goal of your current notebook is to explain how the TS calculation works. You should therefore try to keep it as simple as possible and only explain information that is directly pertinent to the TS calculations. Otherwise, people might get lost in the other details.

Comment 5:

Unlike the IR and optical regime, the detector receives gamma-ray photons one by one because of the low flux. This counting experiment makes Possion distribution the preferred statistical model to describe the gamma-ray photons.

I am not really sure what you’re talking about here. I would remove this section entirely.

Comment 6: You should specify which data files are being used in the example, and where people can find them. They should all be available on wasabi, or directly in your example directory.

Comment 7: At the very top of the notebook you discuss improvements in progress, but this should be at the end of the notebook, after you already discuss what’s currently available.

Comment 8: I took a quick look at your doc string in some of your code and they are not in proper format, and many are incomplete. For example, there are multiple methods with optional parameters that are not specified in the Parameters section. Please update this.

More to come after I go through the code.

Please let me know if something is not clear.

ckarwin commented 5 months ago

Hi @Yong2Sheng, I was able to successfully run your notebook. I only tested the first example. It took 54 minutes, using just 1 core (so no parallel computing). The fit finished ok, and I was able to reproduce your TS plots. As you already mentioned, the calculation in local coordinates can be sped up by using the precomputed point source response, instead of rotating the detector response on the fly. We've already discussed this, and I guess your plan is to do this in the future (after DC2), which will be great.

One thing to emphasize: I didn't check the details of the statistical calculation. Has anybody double checked this or validated the algorithm? If not, I think the algorithm should be validated at some point. One way to do this is to calculate the TS map for a source of known TS. But I think we can worry about that after merging your PR.

I have some additional comments, below.

Comment 9: For the GRB fit in local coordinates: I believe this is with the Compton Sphere. It is not clear what the simulated data is. Is this the GRB and BG that was simulated for mini-DC2? Please specify what you are using for the source and the background. The data can likely go here: https://s3.us-west-1.wasabisys.com/cosi-pipeline-public/ComptonSphere/mini-DC2/GalacticScan.inc1.id1.crab2hr.extracted.tra.gz. But we need to make sure it doesn’t conflict with any of the files that are already there. Please let me know.

Comment 10: For the Crab fit in Galactic coordinates:

“I've uploaded the binned data to wasabi at “COSI-SMEX/DC2/Data/Backgrounds”

What did you upload to wasabi? I don’t see the binned files.

Comment 11: The TS fit takes a long time. Is it possible to add a progress bar?

Comment 12: Has somebody checked that the statistics are all correct? The algorithm should be validated at some point. Maybe inject a source with a known significance and make sure that you can recover the TS. I can add this to the DC2 extra challenges for now.

Comment 13: It would be helpful to print the number of cores being used.

Comment 14: Please add '%matplotlib inline' before first plot.

Yong2Sheng commented 5 months ago

Hi @ckarwin, thank you for the detailed comments! Here are the replies and fixes to your comments. Please let me know if you have any questions or new suggestions.

Fixes and replies

Comment 1

Please make a directory for the TS example notebook. Everything should be in docs/tutorials/ts_map.

I created docs/tutorials/ts_map folder and moved the DC2 notebook and some small data files into this folder.

Comment 2

You can put the files needed for your example in the ts_map directory that you make (as mentioned above). If the files are too large for GitHub, please but them on wasabi. Let me know if you need help uploading the files to wasabi (and where to put them exactly).

I moved the data from test_data folder to docs/tutorials/ts_map. Larger files are uploaded to Wasabi.

Comment3

If the test notebook is just for development, then it shouldn’t be included in the main branch

The testing notebook is removed. It's kept only in my local repository.

Comment 4

You should therefore try to keep it as simple as possible and only explain information that is directly pertinent to the TS calculations.

I removed the unnecessary texts and only kept the portion directly relating to the TS calculation.

Comment 5

I am not really sure what you’re talking about here. I would remove this section entirely.

Removed.

Comment 6

You should specify which data files are being used in the example, and where people can find them. They should all be available on wasabi, or directly in your example directory.

The data are explained in the text now, including the availability (available from the same directory as the notebook or from Wasabi).

Comment 7

At the very top of the notebook you discuss improvements in progress, but this should be at the end of the notebook, after you already discuss what’s currently available.

It's moved to the end of the notebook.

Comment 8

I took a quick look at your doc string in some of your code and they are not in proper format, and many are incomplete. For example, there are multiple methods with optional parameters that are not specified in the Parameters section. Please update this.

I updated the doc strings following this guide. Could you please take a look again?

Comment 9

For the GRB fit in local coordinates: I believe this is with the Compton Sphere. It is not clear what the simulated data is. Is this the GRB and BG that was simulated for mini-DC2? Please specify what you are using for the source and the background.

Yes, the GRB was simulated with the Compton Sphere massmodel. I've included the binned GRB data in docs/tutorials/ts_map since they are small in size to be uploaded to GitHub. The explanation and availability of the data files are also added to the notebook documentation.

Comment 10

For the Crab fit in Galactic coordinates:

“I've uploaded the binned data to wasabi at “COSI-SMEX/DC2/Data/Backgrounds”

What did you upload to wasabi? I don’t see the binned files.

I realized that I uploaded the files to Wasabi in a wrong way (the file names are not correct). Now I reuploaded:

  1. Continuum_Flat_100to10000keV_10logEbins_HealPix03.binnedimaging.imagingresponse_nside8.area.h5.gz to cosi-pipeline-public/COSI-SMEX/DC2/Responses/PointSourceReponse
  2. Albedo_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/DC2/Data/Backgrounds
  3. Crab_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/DC2/Data/Sources

Other responses and unbinned data are already on Wasabi

Comment 11

The TS fit takes a long time. Is it possible to add a progress bar?

I think it's possible after searching the web. I am working on this, but more tests are needed before officially implementing this feature.

Comment 12

Has somebody checked that the statistics are all correct? The algorithm should be validated at some point. Maybe inject a source with a known significance and make sure that you can recover the TS. I can add this to the DC2 extra challenges for now.

I used bctools/fitting/fast_norm_fit.py from @israelmcmc to calculate the TS statistics. I agree that we should do a test on a source with a known significance.

Comment 13:

It would be helpful to print the number of cores being used.

Now it will print the available number of cpu cores and the actual number of cpu cores that will be used for calculation.

Comment 14:

Please add '%matplotlib inline' before first plot.

Added.

ckarwin commented 5 months ago

Hi @Yong2Sheng, Thank you for implementing my comments. Below please find some additional replies/comments.

comment 15: Please see @israelmcmc's comment in PR 126 regarding the use of # and ## in the markdown cells of your notebook. Specifically, your notebook should only use # once and only use ## for subtitles that should be shown in the table of contents.

comment 16:

I realized that I uploaded the files to Wasabi in a wrong way (the file names are not correct). Now I reuploaded:

Continuum_Flat_100to10000keV_10logEbins_HealPix03.binnedimaging.imagingresponse_nside8.area.h5.gz to cosi-pipeline-public/COSI-SMEX/DC2/Responses/PointSourceReponse Albedo_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/DC2/Data/Backgrounds Crab_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/DC2/Data/Sources

Please do not upload files to wasabi unless you are certain where they should go. You uploaded the response file to a COSI-SMEX folder, but it is for the Compton sphere. Additionally, it's a detector response, and not a point source response file. Please remove the file from the current location and upload it here: cosi-pipeline-public/ComptonSphere/mini-DC2.

Also note that the above directory already has the following response file: FlatContinuumIsotropic.LowRes.binnedimaging.imagingresponse.area.nside8.cosipy.h5.zip What is the difference with the response that you're using?

Please move Albedo_galactic_CDS_binned.hdf5 and Crab_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/cosipy_tutorials/ts_maps. Also remove them from the current location.

comment 17: You moved the discussion on future improvements to the end, but you didn't remove it from the intro.

comment 18: I think the section headings for each example need to be indicated more clearly. Example 1: Fit Crab using the Compton Data Space (CDS) in galactic coordinates Example 2: Fit a fainter GRB using the Compton Data Space (CDS) in local coordinates (Spacecraft frame) Example 3: Fit Crab using the Compton Data Space (CDS) in galactic coordinates

You should also make it clear in example 3 that you are now using the SMEX mass model (compared to the Compton Sphere mass model that was used for the first two examples).

comment 19: Your "Data availability" sections in Examples 1 and 2 are redundant. It's the same data, and so it only needs to be specified once.

comment 20: You define both bkg_model and bkg_original. Why do you have two background models, and what is the difference? I think there should be just one background, which should be the same that is used in the simulated data.

comment 21:

available on Wasabi

This is said in multiple places, but it's better to be more specific. Please give the actual path where people can find the file. e.g. wasabi path: cosi-pipeline-public/COSI-SMEX/DC2/Data/Sources/Crab_DC2_3months_unbinned_data.fits.gz

comment 22: Please remove the last empty cell at the bottom of the notebook.

comment 23: Your doc strings look better. There are still some things to refine, but this can be done later on. For example, the formatting for your optional arguments is not quite correct. From the documentation:

If it is not necessary to specify a keyword argument, use optional: x : int, optional Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).
Yong2Sheng commented 5 months ago

Hi @ckarwin, thanks for the comments. Here are the revisions.

Revisions

Comment 15

your notebook should only use # once and only use ## for subtitles that should be shown in the table of contents.

# is used only once now.

Comment 16

What is the difference with the response that you're using?

FlatContinuumIsotropic.LowRes.binnedimaging.imagingresponse.area.nside8.cosipy.h5.zip is the response simulated by MEGAlib using the FISBEL binning. Thus, we need to convert the FISBEL to the Healpix binning in order to use it, which involves interpreting the data that might introduce systematic errors.

Continuum_Flat_100to10000keV_10logEbins_HealPix03.binnedimaging.imagingresponse_nside8.area.h5.gz is also the response simulated by MEGAlib, but it used Healpix binning intrinsically. Therefore, binning conversion is no longer needed. Both are part of the Unit Test and uses the Compton Sphere massmodel. I moved it to the mini-DC folder.

Please move Albedo_galactic_CDS_binned.hdf5 and Crab_galactic_CDS_binned.hdf5 to cosi-pipeline-public/COSI-SMEX/cosipy_tutorials/ts_maps. Also remove them from the current location.

Done.

Comment 17

You moved the discussion on future improvements to the end, but you didn't remove it from the intro.

Removed from the intro.

Comment 18

I think the section headings for each example need to be indicated more clearly.

Done.

You should also make it clear in example 3 that you are now using the SMEX mass model (compared to the Compton Sphere mass model that was used for the first two examples).

I explained both GRB and Crab data by mentioning what massmodel are used and what response should be used to analyze the data.

Comment 19

Your "Data availability" sections in Examples 1 and 2 are redundant. It's the same data, and so it only needs to be specified once.

The "Data availability" in Example 2 is removed.

Comment 20

You define both bkg_model and bkg_original. Why do you have two background models, and what is the difference? I think there should be just one background, which should be the same that is used in the simulated data.

I kept this separate because the background component and the background model are different concepts here. The background component is the background counts we received from the simulation (or observation) while the background model is the one we used to model the background component.

GRB case

Crab case

Both bkg_model and bkg_original are the same 3-month Albedo background, which means that we perfectly model the background component. But I think this is OK since these are all simulations.

Comment 21

Please give the actual path where people can find the file.

The specific paths are added to the text.

Comment 22

Please remove the last empty cell at the bottom of the notebook.

Done.

Comment 23

Your doc strings look better. There are still some things to refine, but this can be done later on

Thank you for pointing it out! The doc string has been updated.

ckarwin commented 5 months ago

Hi @Yong2Sheng Thank you for implementing all these changes. I think your PR is ready to be merged now. Nice work!

ckarwin commented 5 months ago

One last thing @Yong2Sheng, Did you remove the files from the test_data directory?

Yong2Sheng commented 5 months ago

Hi @ckarwin, thank you for checking it out!

I now removed the GRB files from test_data. I also have some other files about Cygx-1 balloon pointing data in the test_data folder. These are used as the testing/tutorial files for the SpacecraftFile module (Point_source_resonse.ipynb notebook). Should I remove these as well? Thanks!

ckarwin commented 5 months ago

Eventually the response tutorials should also be better organized and put into their own respective directories (like the other example notebooks). But you should also coordinate with @israelmcmc to make sure you're not removing anything that is needed for testing and/or DetectorResponse.ipynb. But that's a separate issue. For now, I'm going to merge this PR, since everything pertaining to the TS map is in good shape.