astrostat / pylira

A Python package for Bayesian low-counts image reconstruction and analysis

Which test and tutorial datasets to add? #3

Open adonath opened 2 years ago

adonath commented 2 years ago

For testing as well as documentation we need some small datasets. For this we can use existing datasets in:

The datasets seem more suitable for testing as they use a simulated E letter and not any real data. However there are multiple combinations and sizes available. Which one would you recommend @anetasie and @vkashyap for testing?

For documentation and examples we should probably use real datasets. There are a few public ones we can use for this:

HESS data https://www.mpi-hd.mpg.de/hfm/HESS/pages/dl3-dr1/ (can be prepared as FITS images with Gammapy)
Fermi-LAT data, such as https://github.com/gammapy/gammapy-fermi-lat-data or any datasets prepared with the Fermi science tools / fermipy
Chandra data of course. Are there any "ready to use" (i.e. counts / exposure / background) FITS images we can use?
Any further suggestions?

anetasie commented 2 years ago

For the tutorial, the one from Katy is the most recent. It is called 'quick', comes as a PDF, and uses the liraOutput.R script to run LIRA. https://github.com/astrostat/LIRA/tree/master/lira/tutorials

@adonath Of course you can use HESS and Fermi-LAT data for the docs! For Chandra we could use one of Katy's examples in the sub-directories with Python scripts to generate input files - these are also old! I should upload a complete dataset for these two examples. https://github.com/astrostat/LIRA/tree/master/lira/python

adonath commented 2 years ago

Thanks @anetasie, indeed having a complete dataset for the two examples would be nice!

I think we can also easily transfer the content of the PDF files to executable tutorials such as Jupyter notebooks, once the wrappers are actually available. I'll set up the infrastructure for the data handling now. I think the test and tutorial datasets are small enough that we can include them in the code base. My proposal would be one FITS file per dataset, with different HDUs for counts / exposure / background and PSF. I'll put the code to generate the files in a second repository, e.g. pylira-extra.

anetasie commented 2 years ago

I would keep separate input files and not place them into one FITS, mainly because the separated files are typically available as standard products in Chandra and are easily generated.
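As an illustration of the separate-files layout, here is a minimal sketch of how the inputs of one dataset could be grouped and checked. The class `InputFiles`, its role names and the directory name `10307` used as a path are hypothetical and not part of pylira or LIRA; the file names are the ones from the 10307 example in this thread.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InputFiles:
    """Paths to the separate FITS inputs of one dataset (hypothetical helper)."""

    counts: Path
    background: Path
    psf: Path

    @classmethod
    def from_directory(cls, directory, counts, background, psf):
        """Build all paths relative to a single dataset directory."""
        base = Path(directory)
        return cls(base / counts, base / background, base / psf)

    def missing(self):
        """Return the role names whose file does not exist on disk."""
        roles = {"counts": self.counts, "background": self.background, "psf": self.psf}
        return [role for role, path in roles.items() if not path.is_file()]


files = InputFiles.from_directory(
    "10307",  # assumed local directory name
    counts="img_64x64_0.5.fits",
    background="null_q1_c1_64.fits",
    psf="psf_center_33x33.fits",
)
print(files.missing())  # lists the roles that still need to be provided
```

Keeping one path per role means a standard Chandra product can be dropped in without repacking it into a combined file.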

adonath commented 2 years ago

@anetasie Fine by me as well.

Is there actually an example parameter file somewhere? I couldn't seem to find any...

anetasie commented 2 years ago

https://github.com/astrostat/LIRA/blob/master/lira/tutorials/liraOutput.R

This is the file that is being used to run lira in R. Is this what you mean by parameter file? I'll get the files for 10307 and 10308 - it does need null images in addition to data and psf.

anetasie commented 2 years ago

Here is the printout of the file that I used to run lira for 10307. This is a set of commands after starting R:

library(lira,lib.loc ='/data/sherpa/aneta/Science/Lira/Tests/Rlira')
library(FITSio,lib.loc='/data/sherpa/aneta/Science/Lira/Tests/FITSio')
require(lira)
source('liraOutput.R')

setwd('../10307/Nulls_fixed')
obsFile<-'img_64x64_0.5.fits'
bkgFile<-'null_q1_c1_64.fits'
psfFile<-'psf_center_33x33.fits'
startFile<-'null_q1_c1_64.fits'

for(ii in 1:50){
obsFileNull<-paste('simulated_null_',(ii-1),'.fits',sep='')

testLira<-liraOutput(obsFile=obsFileNull,bkgFile=bkgFile, psfFile=psfFile,startFile=startFile,maxIter=2000, alpha.init=c(3,4,5,6,7,8,9),fit.bkg.scale=T,outDir='/data/qednew/Lira_input/Lira_runs/output_10307_fixed/')
}

testLira<-liraOutput(obsFile=obsFile,bkgFile=bkgFile, psfFile=psfFile,startFile=startFile,maxIter=2000, alpha.init=c(3,4,5,6,7,8,9),fit.bkg.scale=T,outDir='../Lira_runs/output_10307_fixed/data_')

anetasie commented 2 years ago

The above commands were used to run lira on 50 simulations (the for loop) and then on the image data (the last command, testLira<-liraOutput()).
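For a Python port, the order of runs in the R commands above could be reproduced as follows. This is only a sketch: `build_run_list` is a hypothetical helper, not a pylira function, and it only builds the list of observation file names without running anything.

```python
def build_run_list(obs_file, n_null=50, null_template="simulated_null_{index}.fits"):
    """Return the observation file names in the order they were processed.

    The null simulations come first (index 0 .. n_null - 1), followed by the
    observed image, mirroring the R loop `for(ii in 1:50)` that uses `ii-1`.
    """
    null_files = [null_template.format(index=index) for index in range(n_null)]
    return null_files + [obs_file]


runs = build_run_list("img_64x64_0.5.fits")
print(len(runs), runs[0], runs[-2], runs[-1])
# 51 simulated_null_0.fits simulated_null_49.fits img_64x64_0.5.fits
```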

adonath commented 2 years ago

Thanks @anetasie, I just saw that the main analysis method takes a parameter file (see https://github.com/astrostat/LIRA/blob/master/lira/src/bayes-image-analysis.c#L2064) and I was wondering about the structure of this file. I think it would be good to have an example for testing as well. Maybe @infinitron has one?

anetasie commented 2 years ago

This is the name of the output parameter file which will be generated by lira. There are two output files: one called .out, which contains the images, and one called .param, with the list of the parameters, likelihood, fitted alphas etc. We can create both to use as a baseline for the test.
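A baseline comparison of this kind could look roughly like the sketch below. It assumes the .param content has already been read into a mapping from column name to a list of floats, one value per iteration; the function `traces_match` and the column name `column_a` are hypothetical, and nothing here reflects the actual .param layout.

```python
import math


def traces_match(result, baseline, rel_tol=1e-6, abs_tol=0.0):
    """Check that two parameter traces agree column by column.

    Both arguments map a column name to a list of floats, one per iteration.
    """
    if result.keys() != baseline.keys():
        return False
    for name, expected in baseline.items():
        actual = result[name]
        if len(actual) != len(expected):
            return False
        for a, e in zip(actual, expected):
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


baseline = {"column_a": [1.0, 2.0, 3.0]}  # placeholder values, not real LIRA output
print(traces_match({"column_a": [1.0, 2.0, 3.0]}, baseline))
```

A comparison like this is only meaningful if the run being tested uses the same inputs and random seed as the run that produced the baseline.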

anetasie commented 2 years ago

@adonath I added the files to this directory. I included the output .param file as an example, but did not include the .out file, which is large. I guess for the test we could just run a smaller number of iterations to keep the .out file small. https://github.com/astrostat/LIRA/tree/master/lira/python/test/10307

infinitron commented 2 years ago

We also need to use the same random seed used to generate the dataset for testing. The current C code generates the seed at runtime by calling some internal function in R. I tried to set it manually but it resulted in errors when building the R package. It seems to work fine if I compile the C code manually and call it from R. I'll add a short document with build instructions, and calling it from R would only require minor changes to Katy's wrapper.
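To illustrate why a fixed seed matters for a reference dataset, here is a small standard-library sketch. It does not show how the LIRA C code or R seed its generator; `poisson_sample` and `simulate_counts` are hypothetical helpers that simulate a flat Poisson counts image using Knuth's multiplication method.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(shape, mean, seed):
    """Simulate a flat Poisson counts image as nested lists, using a fixed seed."""
    rng = random.Random(seed)
    n_rows, n_cols = shape
    return [[poisson_sample(rng, mean) for _ in range(n_cols)] for _ in range(n_rows)]


first = simulate_counts((8, 8), mean=0.5, seed=42)
second = simulate_counts((8, 8), mean=0.5, seed=42)
print(first == second)  # the same seed reproduces the same image
```

With the seed generated at runtime instead, two runs would in general give different images, so a stored baseline could not be compared value by value.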

adonath commented 2 years ago

Thanks @anetasie!

For the record: I added a minimal test dataset in https://github.com/astrostat/pylira/commit/31a2bcc2e25db574c51da208c0a278896ce18493. For illustration see https://pylira.readthedocs.io/en/latest/pylira/data.html

I could go ahead and add the test data that @anetasie provided as the next example.

adonath commented 2 years ago

Just for the record:

Output parameter files and image traces are available for testing (and already used) here: https://github.com/astrostat/pylira/tree/main/pylira/data/files
Methods to read the files are implemented here: https://github.com/astrostat/pylira/blob/main/pylira/utils/io.py
"Toy" example datasets generated by methods are here: https://github.com/astrostat/pylira/blob/main/pylira/data/core.py and illustrated here: https://pylira.readthedocs.io/en/latest/pylira/data.html

TODO:

I think we should add at least one Fermi-LAT dataset, one TeV dataset and of course one Chandra dataset.