LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.
4 stars 2 forks source link

PhoSim indexing fails with very large numbers of input objects #24

Closed drphilmarshall closed 6 years ago

drphilmarshall commented 7 years ago

This is a re-post from @rbiswas4 at https://github.com/DarkEnergyScienceCollaboration/Twinkles/issues/383:

PhoSim indexes astrophysical sources with floats. For very large integers used as instance catalog indices , this causes indexing failures due to numerical rounding issues. This was found in the first set of Twinkles runs. A ticket describing the problem - PHOSIM-27 - was filed on the PhoSim Jira.

For Twinkles Run 3, this problem was reported in https://github.com/DarkEnergyScienceCollaboration/Twinkles/issues/312 and taking advantage of the restricted nature (single CCD, rather than all sky), this issue was solved for Twinkles Run 3 in https://github.com/DarkEnergyScienceCollaboration/Twinkles/issues/340 . However, in general (where there will be more objects due to larger sky area) the problem is unsolved.

cwwalter commented 7 years ago

Thanks for moving/filing this!

cwwalter commented 7 years ago

@rbiswas4 and @drphilmarshall Is this an issue for us in DC1? DC2? According to PHOSM-27, John feels this shouldn't be an issue since really they are stored in double precision.

rbiswas4 commented 7 years ago

It is a DC1 issue. I think this was addressed by @danielsf and @SimonKrughoff in the context of Twinkles. The solution depended on the fact that we had a really small area. This may or may not be ok for DC2, depending on the area.

cwwalter commented 7 years ago

So, just to make sure I understand from reading the Twinkles PR: This an issue solely in making the catalogs correct?

If this is going to be an issue for DC2, some one should follow up on PhoSim-27 and let John know you don't think his comment really solves the issue.

rbiswas4 commented 7 years ago

So, just to make sure I understand from reading the Twinkles PR: This an issue solely in making the catalogs correct?

Not sure what you mean: The catalogs and phosim images are actually generated correctly (at least in the cases we have seen). However, if you want to do comparisons of the input catalogs and the phosim centroid files (which I think is a very useful step), then the indices are often different (because of the representation as doubles).

If this is going to be an issue for DC2, some one should follow up on PhoSim-27 and let John know you don't think his comment really solves the issue.

Whether it is an issue might depend on the size of DC2 ... I am traveling this week, so I don't have the details. I guess @danielsf would be able to help ...

danielsf commented 7 years ago

I actually do not think this will be a problem for DC2 (nor was it a problem for DC1). It was only an issue in Twinkles. Someone should check my reasoning though:

PhoSim stores the indices as double-precision floats. This means they can store an integer of 2^52 ~ 4.5e15 exactly. Therefore, the PhoSim indexing would only break down if we ended up needing more than 4.5e15 independent sources. Twinkles ran afoul of this limit because, when we injected lensed AGN and SNe, we were forced to bit-pack their indexes to prevent them from colliding with the uniqueIDs of the ~ 3.5e10 galaxy uniqueIDs already allotted by CatSim.

So: as long as we aren't injecting new transients into DC2 (are we?), we should be able to live with the way PhoSim currently handles object indexes.

cwwalter commented 7 years ago

Now the thinking is that DC2 will include transients at normal density with an embedded Twinkles field.

danielsf commented 7 years ago

In that case, we will have a problem. As Rahul said, we were able to fix this in Twinkles because it was a single chip on a single point in the sky, so we only had to keep a few million unique galaxy indices, rather than a few tens of billions. Once we break that assumption, we need PhoSim to store indices as ints or strings.

danielsf commented 7 years ago

I thought @SimonKrughoff hacked together a quick solution on his own branch of PhoSim and issued a pull request. Whether or not that pull request gets merged to PhoSim master, could we just use Simon's branch with his quick fix?

rbiswas4 commented 7 years ago

transients at normal density with an embedded Twinkles field @cwwalter Twinkles had transients/variables at higher than normal density and it was a 10 year survey.

So, to me the questions are:

These two questions will be important in quantifying the number of transient/variables (as a hack) Scott is talking about.

cwwalter commented 7 years ago

Yes. Your understanding is correct. Right now we are planning 10 years, 6 bands, 300 sq degrees. We will have normal density for transients over the WFD field and an embedded Twinkles DDF. I'm not sure what the size of that is. That is for conversation at SUNYSB.

fjaviersanchez commented 6 years ago

I think that this is not relevant anymore so I'm closing this (please, feel free to reopen @cwwalter)