Ingest protoDC2 into CatSim and generate PhoSim output

katrinheitmann commented 7 years ago

Via the GCR, read the protoDC2 catalog into CatSim and generate a dataset that is ready to be run through PhoSim.

jchiang87 commented 7 years ago

Since the "instance" catalog formats for phosim and imsim actually differ, I think the end point of the work in this issue should be some single bit of code that we can run to generate appropriately formatted catalogs for either tool. For DC1, we generated the instance catalogs as part of the simulation pipelines, but failed to use exactly the same code to define which sets of objects to include (the imsim data lacked the AGN components), so having a single module that served that purpose (presumably using some means of trackable configuration) would be desirable.
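A minimal sketch of what such a single module might look like, assuming one shared selection step driven by a hashable configuration. All names here (`CONFIG`, `config_id`, `select_objects`, `format_line`, `make_catalog`) are hypothetical, and the two line layouts are placeholders rather than the real phosim or imsim formats:

```python
import hashlib
import json

# Hypothetical configuration: the object classes every catalog must include.
CONFIG = {"object_types": ["star", "galaxy", "agn"]}


def config_id(config):
    """Return a short, reproducible hash of the configuration for tracking."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def select_objects(objects, config):
    """Apply the one shared selection used for every output format."""
    wanted = set(config["object_types"])
    return [obj for obj in objects if obj["type"] in wanted]


def format_line(obj, tool):
    """Placeholder layouts only; these are NOT the real phosim/imsim formats."""
    if tool == "phosim":
        return "object {id} {ra:.6f} {dec:.6f} {mag:.3f}".format(**obj)
    if tool == "imsim":
        return "{id},{ra:.6f},{dec:.6f},{mag:.3f}".format(**obj)
    raise ValueError("unknown tool: " + tool)


def make_catalog(objects, config, tool):
    """Build the catalog text for one tool from the shared selection."""
    lines = ["# config_id " + config_id(config)]
    lines.extend(format_line(obj, tool) for obj in select_objects(objects, config))
    return "\n".join(lines)


# Toy input: the supernova is dropped for both tools, the AGN is kept for both.
OBJECTS = [
    {"id": 1, "type": "star", "ra": 53.0, "dec": -27.4, "mag": 18.2},
    {"id": 2, "type": "agn", "ra": 53.1, "dec": -27.5, "mag": 20.7},
    {"id": 3, "type": "sn", "ra": 53.2, "dec": -27.6, "mag": 23.0},
]
phosim_text = make_catalog(OBJECTS, CONFIG, "phosim")
imsim_text = make_catalog(OBJECTS, CONFIG, "imsim")
```

Because both outputs go through the same `select_objects` call and carry the same `config_id` line, a mismatch like the missing AGN components in DC1 would show up as a differing configuration hash rather than going unnoticed.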

katrinheitmann commented 7 years ago

So my (very vague) understanding was that imSim is embedded in CatSim and we would actually not print a catalog if we want to run ImSim, compared to PhoSim where we dump an ascii catalog from CatSim and then read it in. Is that not the case?

cwwalter commented 7 years ago

> So my (very vague) understanding was that imSim is embedded in CatSim and we would actually not print a catalog if we want to run ImSim,

No, that is not really right. ImSim takes the CatSim output as an input too. We could in principle do the lookup on the fly in the program and never write the instance catalog or read from it, but as Jim said, we did it as part of the pipeline last time.

cwwalter commented 7 years ago

But there are some CatSim tools that can drive GalSim inside the LSST-GalSim interface we use. Those existed before imSim, and I wrote imSim to factor the problem so that we could work in a way similar to PhoSim. That is probably what you are remembering.

cwwalter commented 7 years ago

I agree with Jim's comment above BTW (single tool to write instance catalogs). Since we are also going to have to get halo information etc. into some sort of other table (including possibly for things like ICL in the instance catalog, as we discussed earlier today), perhaps the tool could address that too.

danielsf commented 7 years ago

As you say, Chris, that code that drives ImSim is integrated into CatSim. It was just rewritten to accept InstanceCatalogs. Given that we discovered that ImSim can't use the same InstanceCatalogs as PhoSim (on account of the RA, Dec shenanigans), do we want to keep working in that framework?

sims_GalSimInterface includes classes that can generate images and simultaneously output PhoSim-ready InstanceCatalogs containing all of the simulated objects, which would allow us to solve the problem of "were ImSim and PhoSim run on the same astrophysical inputs?"

cwwalter commented 7 years ago

I think we still want to drive the interface through the imSim external driver. It is the thing that allowed us to work quickly and factor the problem, including letting us set various options, choose sky configurations, drive the electronics readout, etc. Having the interface and GalSim core available to be separately developed has also worked well. I think this is very valuable.

But, we had a discussion before (which we decided to not do on DC2) on whether we should still use the instance catalog format. (Can anyone find that issue? It is the one where we listed 3 options).

I think the options were

1) Still keep using a (now a bit modified) PhoSim instance catalog input

2) Just have imSim ask CatSim to look up the objects directly and never write the instance catalog out but go straight to the image generation. This should be straightforward and would be a different mode of running the program instead of reading the instance catalog (you would still need the headers in some form though)

3) I can't remember 3 :)

One thing we learned from the dark matter people is that being able to write a human readable and phoSim formatted file is still incredibly valuable. They were able to develop and debug the diffuse sources problem doing this. So, I think we want to keep this ability no matter what. We will also always want to use it for developing and testing things on a smaller scale.

I think being able to point people to the instance catalog documentation (although we now have the proper motion additions which I think is more rational) also was helpful since it meant that the knowledge they gained was useful across tools.

As for 2), I think we might want to consider this for DC2. I think the "pros" are that it saves disk space, which as I recall was a non-trivial factor and the reason Jim originally suggested it. On the "con" side, I think having a human readable input file that was used and can be regenerated could be pretty useful for debugging and understanding problems while looking at the DC2 output. I suppose there could be some hybrid where it wasn't normally written out though.

jchiang87 commented 7 years ago

I'd like to propose an option 3):

Since the input formats are incompatible anyway, we could revise the object specifications and have the input catalogs to imsim serve double duty as more serviceable input "truth" catalogs. Right now, the imsim/sims code starts from the SED normalized at 500 nm and computes the ADU to feed to GalSim based on the LSST band for that simulation. How about we make the input catalogs already contain the corresponding apparent magnitudes for the given band, and so factor the integration of the SED over the bandpass out of the imsim code?
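To make the "integration of the SED over the bandpass" step concrete, here is a rough sketch of a per-band AB magnitude calculation using the standard photon-counting AB definition. This is not the imsim/sims code. The function names, the toy top-hat bandpass and the flat SED are all assumptions for illustration, with flux densities taken to be in Jy:

```python
import math

AB_ZERO_POINT_JY = 3631.0  # flux density of a zero-magnitude AB source


def trapezoid(y, x):
    """Integrate samples y over grid x with the trapezoid rule."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def ab_magnitude(wavelength_nm, fnu_jy, throughput):
    """AB magnitude of an SED seen through a bandpass (photon-counting form).

    m = -2.5 log10( int f_nu S dlambda/lambda / int 3631 Jy S dlambda/lambda )
    """
    source = [f * s / w for f, s, w in zip(fnu_jy, throughput, wavelength_nm)]
    reference = [AB_ZERO_POINT_JY * s / w for s, w in zip(throughput, wavelength_nm)]
    ratio = trapezoid(source, wavelength_nm) / trapezoid(reference, wavelength_nm)
    return -2.5 * math.log10(ratio)


# Toy inputs (assumptions): a top-hat band from 550 to 700 nm and a flat SED
# that is 10^-4 times the AB zero point, i.e. a 10th-magnitude source.
wavelengths = [500.0 + i for i in range(251)]
band = [1.0 if 550.0 <= w <= 700.0 else 0.0 for w in wavelengths]
flat_sed = [AB_ZERO_POINT_JY * 1.0e-4 for _ in wavelengths]

mag = ab_magnitude(wavelengths, flat_sed, band)
```

Storing a value like `mag` per object and per band in the input catalog would mean the image simulation only has to convert a magnitude to counts, and the same column could be read directly as truth information.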

Also, I would recommend a more reasonable format, i.e., something that has a distinct header component for the physics commands and a separate data structure for the object catalog with the magnitudes, shapes, etc. I would advocate a set of pandas data frames persisted to a single file. If someone wanted to hand-craft or modify an input catalog, they could do it in python and just save the data frames rather than having to deal with writing out an inconveniently formatted ascii file. It would also be more compact than the ascii formatting and provide ways of subselecting data via the pandas query functionality.
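As a rough illustration of the idea, the sketch below keeps a header frame and an object frame together in one file and then subselects with `DataFrame.query`. The column names and header keys are made up for the example. Pickle is used only because it needs no extra dependency. An HDF5 store (`pandas.HDFStore`, which requires PyTables) would be one alternative for a real format:

```python
import os
import tempfile

import pandas as pd

# Hypothetical header: one row per physics/observation command.
header = pd.DataFrame(
    {"key": ["obshistid", "filter", "seed"], "value": ["12345", "r", "42"]}
)

# Hypothetical object table with per-band magnitudes already computed.
objects = pd.DataFrame(
    {
        "object_id": [1, 2, 3, 4],
        "object_type": ["star", "galaxy", "galaxy", "agn"],
        "ra_deg": [53.001, 53.010, 53.020, 53.030],
        "dec_deg": [-27.400, -27.410, -27.420, -27.430],
        "mag_r": [18.2, 22.5, 24.1, 20.7],
    }
)

# Persist both frames to a single file.
path = os.path.join(tempfile.mkdtemp(), "instcat.pkl")
pd.to_pickle({"header": header, "objects": objects}, path)

# Read the file back and subselect with the pandas query functionality.
frames = pd.read_pickle(path)
bright_galaxies = frames["objects"].query("object_type == 'galaxy' and mag_r < 23")
```

Hand-crafting a catalog then amounts to building or editing the `objects` frame in Python and saving it again.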

cwwalter commented 7 years ago

I knew there was a three :)

Yes, I support and agree with this. If they are going to be different, we might as well make them do what we want and solve a lot of problems we have. Having binary files would also make them much more compact.

Two comments/questions:

1) I think the text got a bit garbled in the first sentence. Are you suggesting that these are also our truth data frames we would use for analysis, and also to load the Qserv database?

2) I really feel having a human readable input text format is also important. We saw what the DM people did, and I have used this a lot to do things like make spots for sensor studies. I think having something you can do in an editor with cut and paste, just keeping the editor open and changing a number as you test and re-run, is really an important use case.

So, there are a few possibilities. We could just keep the PhoSim input around for this (but this might cause a lot of complications if we are using different definitions of magnitude etc.). We could have a simple text parser which reads a text version (if the inputs are pandas data frames, maybe we could point to a text file and use the built-in CSV parser). Or we could have converters, etc.
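For the text-parser possibility, something like the following might be enough, assuming the object table is a pandas data frame and the text version is plain CSV with `#` comment lines. The column names and the `read_text_catalog` helper are hypothetical:

```python
import io

import pandas as pd

# A hand-editable text catalog; the column names are illustrative only.
TEXT_CATALOG = """\
# quick test catalog: edit a number, save, and re-run
object_id,object_type,ra_deg,dec_deg,mag_r
1,star,53.001,-27.400,18.2
2,galaxy,53.010,-27.410,22.5
"""


def read_text_catalog(source):
    """Load a text catalog into a data frame, skipping '#' comment lines."""
    return pd.read_csv(source, comment="#")


# io.StringIO stands in for a file here; a file path works the same way.
objects = read_text_catalog(io.StringIO(TEXT_CATALOG))
```

The resulting frame has the same shape a binary reader would produce, so the rest of the program would not need to know which input mode was used.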

I get that we can have people write python programs to do it too, but this doesn't seem nearly as good for some use cases. What are your thoughts about this Jim (including just bailing on it)?

Of course, if we are going to do this for DC2, it is quite relevant for adding the lensing etc., since we might want to do it in the new scheme and not for PhoSim input etc.

cwwalter commented 7 years ago

One other thing I like about this idea is that we could put a dataframe in the file with the CSim halo truth info etc that would then ride around with the galaxy and star info.

jchiang87 commented 7 years ago

If there is a need for the ascii input mode for a set of special use cases, then I suppose we can keep the current parsing code around, but for production runs, we should have an input option that is more robust and that reads a more compact and better structured data format.

cwwalter commented 7 years ago

> If there is a need for the ascii input mode for a set of special use cases, then I suppose we can keep the current parsing code around, but for production runs, we should have an input option that is more robust and that reads a more compact and better structured data format.

OK. I'm flexible on whether we do this with the existing code [I think this has several advantages, but it might be some work to keep them in sync (I'm thinking of things like lensing parameters)] or whether we let the new parser do it (say through calling a text csv df read or a binary read).

In addition to running "production" and the tests I mentioned above, we want to be able to drive "official sensor only" runs for sensor model validation. I had imagined doing that in the text interface, but I suppose it doesn't have to be that way.

I'll add a new imSim issue about this in the imSim repo.

katrinheitmann commented 7 years ago

@danielsf Hi Scott, Do you mind adding the steps needed to generate the PhoSim input on top of ProtoDC2? Maybe we should open a new issue that will capture the work on scaling up our approach to the actual DC2 (300 sq degree) catalog? Thanks!

danielsf commented 7 years ago

@katrinheitmann Sorry. I'm not sure what you're asking. I am working on a branch here

https://github.com/danielsf/gcr-catalogs/tree/sed_fitting

that uses Yao-Yuan's GCR to generate PhoSim InstanceCatalogs from the protoDC2 catalog. Is that what you meant, or is there a document you are asking me to add to?

jchiang87 commented 7 years ago

This should have been closed by #49. The generateDc2InstCat.py script has the desired functionality.

LSSTDESC / DC2-production

Ingest protoDC2 into CatSim and generate PhoSim output #16