LSSTDESC / imSim

GalSim based Rubin Observatory image simulation package
https://lsstdesc.org/imSim
BSD 3-Clause "New" or "Revised" License

Add an option for a new flexible binary instance catalog #71

Open cwwalter opened 6 years ago

cwwalter commented 6 years ago

One of the initial driving development philosophies for imSim was to make it a drop-in replacement for PhoSim that could read the same instance files. This was very successful in allowing us to use it quickly and effectively.

However, as we have used it we have found that we wanted to do some things differently (such as the handling of proper motion and nutation), and the formats are now in fact not exactly equivalent. Now that we have more infrastructure built up around the imSim ecosystem, @jchiang87 and others have suggested we start from scratch and build a new parser that has all of the features we really want and need. Previous discussion of this topic has happened in email and can also be found in LSSTDESC/DC2_Repo#16.

This parser could use binary data input files with several related pandas dataframes that would be compact and would also serve to store all of the related truth information in a way we can't now.

This format needs to serve several roles:

  1. Production runs generated from CatSim for the data challenges
  2. R&D work on the small scale
  3. Sensor-only or calibration simulations for validation.

The ability to use text-based input (especially for 2 and 3) is also very desirable.

In the short term we should discuss whether this is a DC2 or DC3 era job. This format should be clearly defined so that it could also be used by other tools (including PhoSim) if desired.

cwwalter commented 6 years ago

Following up in this issue on a discussion that @danielsf and I had: We have now modified imSim to read PhoSim-style instance catalogs again. This is attractive since it reduces overhead/"paperwork" when we are doing things like DC2. It means we don't have to keep track of two different kinds of instance catalogs. It is also straightforward because CatSim can do exactly what we need.

PhoSim wants a description of exactly what the source looks like at that time above the atmosphere. This means the user is responsible for proper motion, nutation, etc. For CatSim use, this is no problem. But for people doing their own studies with hand-crafted instance catalogs, this requires extra work in which it is easy to make mistakes.

So, aside from things like binary formats, which is what this issue was originally about, we think it might be good in the future to have a flag that would allow us to choose either PhoSim-format instance catalogs or a native imSim format that uses ICRS + proper motion entries like we had before.

cwwalter commented 5 years ago

Rename this issue to make clearer what it involves.

cwwalter commented 5 years ago

With all of the work on various pipelines and the extremely large number of gzipped text-file instance catalogs we have had to deal with for DC2, now is a good time to reconsider the instance catalog format used by imSim.

This issue is about undertaking a design period and then implementing a new binary instance catalog that could be used by imSim or other programs.

This would hopefully be much more compact than the current instance catalogs and would also be more flexible in that we could easily pass more information and could also more easily allow for multiple options or descriptions of the input information.

Note: we would want to think carefully about how to do this and either use formats and tools that allow us to convert to and from text formats or supply our own. We would still want people doing simple studies to be able to write text format files (possibly just the current PhoSim format), but we would either have a new option for binary files in addition, or a way to convert text files to the new format for use either externally or on read-in.

One useful first study might be to do a simple estimate of the size savings if we wrote the current instance catalogs in a direct binary representation.
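A back-of-the-envelope version of that estimate might look like the sketch below. It is only an illustration under stated assumptions: the object line is made up (written in the style of a PhoSim object entry), the binary layout is a hypothetical one chosen for this example, and the lookup tables that replace strings with integer codes are invented. Since the real catalogs are gzipped, a fair comparison would also need to compress both forms.

```python
import struct

# Made-up object line in the style of a PhoSim instance catalog entry.
text_line = ("object 1234567890 53.0091385 -27.4389488 24.6512 "
             "sed/example_sed.txt.gz 0.75 0 0 0 0 0 point none none\n")

# Hypothetical lookup tables: strings are stored once elsewhere and
# referenced by small integer codes inside each binary record.
sed_index = {"sed/example_sed.txt.gz": 0}
type_code = {"point": 0}
dust_code = {"none": 0}

# Hypothetical little-endian layout with no padding:
#   q   object id (int64)
#   dd  RA, Dec (float64)
#   f   magnitude normalization (float32)
#   i   SED index (int32)
#   6f  redshift, gamma1, gamma2, kappa, delta RA, delta Dec (float32)
#   3B  source type, rest-frame dust, lab-frame dust codes (uint8)
record = struct.Struct("<qddfi6f3B")

t = text_line.split()
packed = record.pack(
    int(t[1]), float(t[2]), float(t[3]), float(t[4]),
    sed_index[t[5]],
    *(float(x) for x in t[6:12]),
    type_code[t[12]], dust_code[t[13]], dust_code[t[14]],
)

text_bytes = len(text_line.encode("ascii"))
binary_bytes = len(packed)
print(f"text: {text_bytes} bytes, binary: {binary_bytes} bytes, "
      f"ratio: {text_bytes / binary_bytes:.2f}")
```

Running the same comparison over a full catalog, before and after gzip, would give a more realistic number than this single-line case.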

rmjarvis commented 5 years ago

I'd recommend writing a simple standalone program that can convert the binary to PhoSim format. Then we can pass around only binary files even for 2.0p. Then part of the script for running PhoSim would be to convert the instance catalog to ASCII at the start and delete that file at the end.

cwwalter commented 5 years ago

There are potentially two different jobs here:

We could potentially do the 1st before the 2nd. We would like to target this for DC2 but maybe not before the end of the year.

These changes also need a bidirectional text-to-binary converter.
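As a rough sketch of what such a converter could look like, the example below round-trips a reduced, hypothetical record (object id, RA, Dec, magnitude normalization) between text lines and packed bytes. The record layout and the function names `text_to_binary` and `binary_to_text` are assumptions made for illustration; a real converter would need the full set of instance catalog fields and a header.

```python
import struct

# Hypothetical fixed-width record: id (int64) plus RA, Dec and
# magnitude normalization as float64 so values survive a round trip.
RECORD = struct.Struct("<qddd")


def text_to_binary(lines):
    """Pack 'object <id> <ra> <dec> <mag_norm>' lines into bytes."""
    out = bytearray()
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "object":
            continue  # skip blank lines and non-object commands
        out += RECORD.pack(int(tokens[1]), float(tokens[2]),
                           float(tokens[3]), float(tokens[4]))
    return bytes(out)


def binary_to_text(blob):
    """Unpack bytes written by text_to_binary back into text lines."""
    lines = []
    for obj_id, ra, dec, mag_norm in RECORD.iter_unpack(blob):
        # repr() gives the shortest string that reproduces each float.
        lines.append(f"object {obj_id} {ra!r} {dec!r} {mag_norm!r}")
    return lines


# Made-up example entries.
text = ["object 1 53.0091385 -27.4389488 24.6512",
        "object 2 53.0102001 -27.4401777 22.25"]
blob = text_to_binary(text)
round_trip = binary_to_text(blob)
```

A script that runs PhoSim could call the binary-to-text direction at the start and delete the resulting text file at the end, as suggested above.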

cwwalter commented 5 years ago

Note that one possible performance enhancement with a more flexible format would be the ability to deal with all of the components of a galaxy (disk, bulge, knots, etc.) at once. This way we would only need to do some operations, like sizing, once.
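To illustrate the idea, here is a minimal sketch that groups component rows by galaxy and runs a sizing step once per galaxy rather than once per component. The component rows, the `stamp_size_pixels` helper, and its sizing rule are all hypothetical and are not how imSim actually sizes postage stamps.

```python
import math
from collections import defaultdict

# Made-up rows: (galaxy_id, component, half-light radius in arcsec).
components = [
    (101, "bulge", 0.8),
    (101, "disk", 2.5),
    (101, "knots", 2.5),
    (102, "disk", 1.2),
]

# Collect every component that belongs to the same galaxy.
by_galaxy = defaultdict(list)
for gal_id, name, hlr in components:
    by_galaxy[gal_id].append((name, hlr))

sizing_calls = 0


def stamp_size_pixels(hlr_arcsec, pixel_scale=0.2, n_radii=5):
    """Hypothetical rule: a stamp spanning n_radii half-light radii per side."""
    global sizing_calls
    sizing_calls += 1
    return math.ceil(2 * n_radii * hlr_arcsec / pixel_scale)


# One sizing call per galaxy, driven by its largest component.
stamp_sizes = {
    gal_id: stamp_size_pixels(max(hlr for _, hlr in comps))
    for gal_id, comps in by_galaxy.items()
}
print(stamp_sizes, "sizing calls:", sizing_calls)
```

With the per-component layout of the current text catalogs, the same step would run once for each of the four rows instead of once for each of the two galaxies.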

cwwalter commented 5 years ago

We won't be doing a partial implementation for DC2 of the binary instance format, so I am removing the DC2 label.

cwwalter commented 4 years ago

Consolidating discussions: closing #222 which also discusses this.

cwwalter commented 4 years ago

@jchiang87 is currently testing a pandas-based strawman for galaxies only, as a follow-up to discussions at the Tucson meeting.