LSSTDESC / cosmodc2

Python package creating the cosmoDC2 synthetic galaxy catalog for LSST-DESC

A 3D look at v1.1.4 #83

Closed plaszczy closed 5 years ago

plaszczy commented 5 years ago

I had a look at the objects' position_x, position_y, position_z in the latest v1.1.4_small catalog. Although it is clearer when you move around, there are some overlapping regions (probably at the healpix borders) with higher density. Why?

[Image: v1 4 tiles]
yymao commented 5 years ago

[Edited: v1.1.4]

yymao commented 5 years ago

Do such overlapping/high-density regions also appear when plotted in sky coordinates (RA, Dec)?

evevkovacs commented 5 years ago

My initial thought was that this might be a projection effect due to using x,y,z rather than ra,dec, redshift, but I don't think that was correct, so I'm editing this comment. We have made plots like this in the past and have not seen this problem. We are looking at v1.1.4 now.

plaszczy commented 5 years ago

Not sure what this means: (RA, Dec) is by essence 2D. If you apply tomography, you project out these features. A different question is whether (x, y, z) is compatible with (RA, Dec, redshift); I don't know (and will let you check). Note that here we have applied a redshift cut (around z = 1), but I can't see how that could cause this.
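One way to check whether (x, y, z) is compatible with (RA, Dec, redshift) is to recompute Cartesian positions from the angular coordinates and the comoving distance, then compare with position_x/y/z. The sketch below is only illustrative and rests on assumptions to verify against the catalog documentation: a flat LambdaCDM cosmology with Outer Rim-like parameters (h = 0.71, Omega_m = 0.2648), an observer at the origin, distances in Mpc (not Mpc/h), and a standard RA/Dec axis convention. It should be fed the cosmological redshift (e.g. `redshift_true`, if the catalog provides it) rather than the observed redshift, which includes peculiar velocities. If the axis convention differs, comparing the radius sqrt(x^2 + y^2 + z^2) with the comoving distance is still convention independent.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h=0.71, omega_m=0.2648, n_steps=1000):
    """Line-of-sight comoving distance in Mpc for flat LambdaCDM (assumed cosmology).

    Integrates (c/H0) * dz' / E(z') with Simpson's rule; radiation is neglected.
    """
    if z <= 0.0:
        return 0.0
    hubble_distance = C_KM_S / (100.0 * h)  # c/H0 in Mpc

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    n = n_steps + (n_steps % 2)  # Simpson's rule needs an even number of intervals
    dz = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    return hubble_distance * total * dz / 3.0


def radec_z_to_xyz(ra_deg, dec_deg, z, **cosmo):
    """Cartesian comoving position in Mpc, observer at the origin (assumed convention)."""
    d = comoving_distance(z, **cosmo)
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (d * math.cos(dec) * math.cos(ra),
            d * math.cos(dec) * math.sin(ra),
            d * math.sin(dec))


# Usage: compare with the catalog's position_x/y/z for the same galaxy
px, py, pz = radec_z_to_xyz(62.0, -38.6, 1.0)
```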

plaszczy commented 5 years ago

I see. We'll try with (RA, Dec, redshift) and let you know.

yymao commented 5 years ago

I'd like to know if these overlapping/high-density regions also appear in 2D (i.e., in sky coordinates, RA and Dec), because it is a much bigger issue if patterned high-density regions appear in sky coordinates.

Then, back to the 3D case, one thing to check is whether the overlapping regions simply come from high redshift, where more boxes are stitched together to create the lightcone (as Eve suggested).

patricialarsen commented 5 years ago

I'm not seeing anything suspicious in RA and Dec. This is without any magnitude cuts, and with redshift cuts of 0.2->0.8.

[Figure: ra_dec_small_1_1_4]

patricialarsen commented 5 years ago

@plaszczy I'm struggling to replicate the issue: is this a very narrow redshift cut around z=1, or a comoving distance cut? @yymao, as DC2 has a large box size, there shouldn't be any stitching of boxes at all below z=1.2.

patricialarsen commented 5 years ago

Here is a screenshot of my attempt to replicate the issue, taking a cutout from z=0.95 to z=1.0 and plotting the x,y,z positions in 3D. I'm not seeing any overlaps here; is it possible this is a read-in issue in the plotting program? Otherwise, if you could give me any more information to locate what's going on, it would be very much appreciated!

[Screenshot, 2018-12-06 at 11:19:10 AM]
plaszczy commented 5 years ago

Thanks, we'll check on our side and I'll let you know. Just to be sure: you are plotting position_x, position_y, position_z?

patricialarsen commented 5 years ago

I'm plotting x,y,z but I've confirmed that these are exactly equal to position_x, position_y, position_z. The redshift cut in this plot is 0.95<z<1.0, the catalog is cosmoDC2_v1.1.4_small (as read in from the GCR), and there are no additional cuts. This plot subsamples every 100th galaxy for the sake of memory but increasing the sampling doesn't appear to change the result.
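A cut and subsample like this one can be sketched with NumPy on a dict of arrays. In practice the dict would come from the GCR, e.g. `GCRCatalogs.load_catalog("cosmoDC2_v1.1.4_small").get_quantities([...])`; here a synthetic stand-in is used so the sketch runs anywhere. The strided option keeps every 100th row in storage order; the random option is a hypothetical alternative for checking whether the subsampling scheme itself can introduce artifacts.

```python
import numpy as np


def redshift_slice(data, z_min, z_max):
    """Keep galaxies with z_min < redshift < z_max (strict, as in 0.95<z<1.0)."""
    mask = (data["redshift"] > z_min) & (data["redshift"] < z_max)
    return {name: values[mask] for name, values in data.items()}


def subsample(data, every=100, randomize=False, seed=0):
    """Keep about 1/every of the rows, either strided or uniformly at random."""
    n = len(data["redshift"])
    if randomize:
        rng = np.random.default_rng(seed)
        idx = np.sort(rng.choice(n, size=n // every, replace=False))
    else:
        idx = np.arange(0, n, every)  # every 100th row, in storage order
    return {name: values[idx] for name, values in data.items()}


# Synthetic stand-in for a catalog read (made-up values)
rng = np.random.default_rng(42)
data = {
    "redshift": rng.uniform(0.0, 3.0, 200_000),
    "position_x": rng.normal(size=200_000),
    "position_y": rng.normal(size=200_000),
    "position_z": rng.normal(size=200_000),
}
slab = redshift_slice(data, 0.95, 1.0)
strided = subsample(slab, every=100)
random_pick = subsample(slab, every=100, randomize=True)
```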

plaszczy commented 5 years ago

We are continuing to investigate; this is a very faint effect (normally you don't see it with your tools). Not sure yet whether it is real: we observe it on some monitor displays but not others, but then why does it follow the pixel shapes? In the meantime, I share an amazing display of cosmoDC2 on a monitor wall (you should see it spinning!): [Image: mur_0]

rmandelb commented 5 years ago

That is very cool - thanks for sharing!

dkorytov commented 5 years ago

I took a glance at the positions as given by the GCR for cosmoDC2_v1.1.4_small, 0.95<z<1.0. I didn't see the grid pattern (1st figure). Subsampling every 100th galaxy more or less looked the same. Maybe a different subsampling scheme can introduce some sort of artifacts.

I also split the galaxies into two populations: (1) galaxies that were populated onto halos from the lightcone (2nd figure), and (2) faint galaxies sprinkled into the healpix volume (3rd figure).

Maybe something was sneaking in through the method we used to generate population 2.
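The two-population split can be sketched as below. It assumes, as the `halo_id>0` cut used later in this thread suggests, that galaxies populated onto lightcone halos have a positive `halo_id` while the sprinkled faint galaxies do not; that convention is an assumption to verify against the catalog documentation, and the example values are made up.

```python
import numpy as np


def split_populations(data):
    """Split a dict of equal-length arrays by the (assumed) halo_id convention."""
    from_halos = data["halo_id"] > 0  # assumption: sprinkled faint galaxies have halo_id <= 0
    lightcone = {name: values[from_halos] for name, values in data.items()}
    faint = {name: values[~from_halos] for name, values in data.items()}
    return lightcone, faint


data = {
    "halo_id": np.array([12, -1, 7, -1, 3]),
    "redshift": np.array([0.96, 0.97, 0.98, 0.99, 0.955]),
}
lightcone, faint = split_populations(data)
```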

Nothing jumped out at me from the plots.

All galaxies: [Screenshot from 2018-12-07 15-03-13]

Galaxies from the halo lightcone: [Screenshot from 2018-12-07 15-17-35]

Faint galaxies: [Screenshot from 2018-12-07 15-24-36]

salmanhabib commented 5 years ago

Anyone tried something quantitative yet?

plaszczy commented 5 years ago

I am forwarding a message from our visualization specialist (Guy Barrand) about where we stand:

The effect is related to “blending”. In the visualization code, to handle transparency, I had the following OpenGL code enabled by default:

```c
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
```

In principle, pixel blending (combining an existing pixel in the frame buffer with the (r,g,b,a) of an incoming “new pixel”) with the above function has an effect only if the primitives have a color whose alpha channel is not one (alpha = 1 means fully opaque, in terms of transparency).
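The arithmetic behind this can be written out. With `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` and the default additive blend equation, each channel becomes `src * alpha + dst * (1 - alpha)`. The sketch below tracks RGB only (it ignores the destination alpha channel, depth testing and point smoothing) and shows that with alpha = 1 the pixel is the same however many points overlap, whereas with alpha < 1 the result grows with the number of overlapping points, so denser regions look brighter.

```python
def blend(src_rgb, src_alpha, dst_rgb):
    """One GL_SRC_ALPHA / GL_ONE_MINUS_SRC_ALPHA blend with the additive equation."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src_rgb, dst_rgb))


def draw_overlapping(n_points, color, alpha, background=(0.0, 0.0, 0.0)):
    """Final pixel color after n_points identical points land on the same pixel."""
    pixel = background
    for _ in range(n_points):
        pixel = blend(color, alpha, pixel)
    return pixel


white = (1.0, 1.0, 1.0)
print(draw_overlapping(1, white, 1.0), draw_overlapping(10, white, 1.0))  # identical
print(draw_overlapping(1, white, 0.5), draw_overlapping(2, white, 0.5))   # 0.5 vs 0.75 per channel
```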

BUT it appears that for points, even if alpha is one, the latest Apple software (using Cocoa + AppleGL) blends points! Hence the “effect” on the DC2 data. This appears only on my 2018 MacBook Pro (MacOS-10.4.1); I do not see the effect on an older machine. Nor do I see it when using X11/XQuartz for the windowing with AppleGL, or on a Windows machine (using win32 for the windowing and Microsoft's opengl32.lib for OpenGL).

By explicitly setting an alpha other than one (for example 0.5), the effect appears with XQuartz + AppleGL and also on Windows. So we now understand why the effect appears on one machine and not on others (it is not yet clear whether there is an actual bug in Apple's software). In our visualization software, we changed the logic: blending is now switched off by default (glDisable(GL_BLEND)), and we switch it on explicitly only when transparency is wanted. (Points are no longer blended on the 2018 MacBook Pro, and no structure appears in the DC2 data.)

We also understand why we do not see the effect on our “wall of screens”: with enough pixels, points statistically do not overlap.

We can also understand why the effect depends on the order of the points in the data (in Spark, if we do a “.duplicate()” or a “.sort()” on some column, the effect disappears).
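Why draw order can matter is easiest to see with two overlapping points of different colors. The sketch reuses the same blend formula (RGB only); per-point colors are an assumption, since the thread does not say how points were colored. With alpha < 1 the two draw orders give different pixels; a real renderer may add further order dependence through depth testing, which is not modeled here.

```python
def composite(points, background=(0.0, 0.0, 0.0)):
    """Blend (rgb, alpha) points onto one pixel in the given draw order."""
    pixel = background
    for rgb, alpha in points:
        pixel = tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(rgb, pixel))
    return pixel


red = ((1.0, 0.0, 0.0), 0.5)
blue = ((0.0, 0.0, 1.0), 0.5)
print(composite([red, blue]))  # (0.25, 0.0, 0.5): the last point drawn dominates
print(composite([blue, red]))  # (0.5, 0.0, 0.25)
```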

So blending of points “does something” and shows structures on screen in this particular case, and I am fairly confident that other 3D software with blending enabled may show it too.

BUT we do not yet have a clear answer as to whether blending of points reveals an anomaly in the data or not… (It looks as if there is a slight overdensity of points around the borders of the coarse-grained (healpix?) tiles…). (For example, if we visualize a cloud of random points with blending, no structure appears on screen. Rotating the random cloud changes the overall color, but no structure appears.)

We are going to investigate whether we can demonstrate such an overdensity in this particular data set by other means…

plaszczy commented 5 years ago

Something that is perhaps more worrying (but easier to reproduce) is to look at an [RA, Dec] slice in z from the side. One sees a kind of multi-sheet structure along z (in French we call that a "mille-feuille"; really good, but not here). Note that this has nothing to do with the previous blending point and can be reproduced with any standard tool (here, ParaView). The cuts are redshift_min = 1, redshift_max = 1.2, halo_id > 0, center_ra = 62, half_ra = 0.9, center_dec = -38.6, half_dec = 0.9, applied as `center_ra - half_ra < ra < center_ra + half_ra and center_dec - half_dec < dec < center_dec + half_dec`.

[Image]

plaszczy commented 5 years ago

the video: https://youtu.be/kd_PhRTZAao

plaszczy commented 5 years ago

To give you a feel for how the problem appears, imagine we start from this view (vue0) and then rotate around the x axis (vue1).

The "sheets" appear more clearly on the right side.

plaszczy commented 5 years ago

OK, so we have reached a conclusion on the first point (the healpixel structure in the very first plot). We already knew from Guy's previous analysis that it has to do with the GL_BLEND switch: with it OFF you don't see anything in particular; with it ON you get the pixel outlines. (Apple recently changed the default from OFF to ON, which is why we had inconsistent results depending on the system; now everything is coherent.) So the question remained: is this OpenGL blending telling us something about the data?

So we did the following: we generated random 3D points either in 2 passes (one half-cube each) or in a single batch.
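The two point sets can be generated along these lines. Both have the same spatial distribution; only the storage (and hence draw) order differs. The unit cube, the split along x, and the hypothetical `shuffle_draw_order` helper are illustrative choices, not the code used in the thread. Shuffling before rendering is one way to keep the draw order from carrying any structure of how the data were written.

```python
import numpy as np


def random_cube_single(n, rng):
    """n uniform points in the unit cube, generated in one batch."""
    return rng.random((n, 3))


def random_cube_two_passes(n, rng):
    """Same distribution, but stored as the x < 0.5 half followed by the x >= 0.5 half."""
    half = n // 2
    left = rng.random((half, 3))
    left[:, 0] *= 0.5                      # x in [0, 0.5)
    right = rng.random((n - half, 3))
    right[:, 0] = 0.5 + 0.5 * right[:, 0]  # x in [0.5, 1)
    return np.concatenate([left, right])


def shuffle_draw_order(points, rng):
    """Randomize the order in which points would be sent to the renderer."""
    return points[rng.permutation(len(points))]


rng = np.random.default_rng(0)
single = random_cube_single(100_000, rng)
two_pass = random_cube_two_passes(100_000, rng)
shuffled = shuffle_draw_order(two_pass, rng)
```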

[Screenshot, 2018-12-12 at 21:51:06]

So the (complicated) GL_BLEND algorithm does not behave the same way whether the data are presented in "clustered" bunches or fully randomized. This is what we observe on the DC2 data, which I wrote to a parquet file by reading one healpixel after the other. Is this "a bug"? Yes, from a scientific standpoint, where we expect invariance with respect to data order. On the other hand, it reflects the sequential structure of your data (so if you forget your healpixel scheme, you can recover it from here!). This leads into philosophical considerations (is visual perception a science?), so let's stop here and recommend: DO NOT USE GL_BLEND=ON when looking at 3D galaxy data (and be aware that this became the default on some recent systems).

This closes the first part of the thread. But our thorough investigation led us to the second, "millefeuille" issue (which has nothing to do with the point discussed here); it is robust and more worrying. Has anyone been able to reproduce it?

dkorytov commented 5 years ago

I'm able to reproduce the "millefeuille" when I plot in RA/Dec/redshift space. It's not quite as visible as in your plot, but it's there. :) [Screenshot from 2018-12-13 16-02-44]

The effect comes from the fact that RA/Dec and redshift map differently to spatial coordinates. RA/Dec are angular positions on a spherical sky, while redshift is a unitless measure of how fast galaxies are moving away from us. Because of the expansion of the universe, more distant galaxies recede from us faster, so redshift is used as a proxy for radial distance. The redshift range 1 -> 1.2 covers a much larger spatial distance than RA over 61 -> 63 deg (or Dec over -39.5 -> -37.7 deg). So the layers are just normal galaxy clustering squeezed very tightly along one axis.
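A back-of-the-envelope check of this argument compares the radial comoving depth of 1 < z < 1.2 with the transverse width of a 1.8 deg window at z of about 1.1 (the RA width is further reduced by cos(Dec)). It assumes a flat LambdaCDM cosmology with Outer Rim-like parameters (h = 0.71, Omega_m = 0.2648) and ignores peculiar velocities; the numbers are illustrative, not taken from the catalog.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h=0.71, omega_m=0.2648, n=2000):
    """Flat LambdaCDM comoving distance in Mpc (trapezoidal rule, radiation neglected)."""
    dz = z / n
    inv_e = [1.0 / math.sqrt(omega_m * (1.0 + i * dz) ** 3 + 1.0 - omega_m)
             for i in range(n + 1)]
    return C_KM_S / (100.0 * h) * dz * (sum(inv_e) - 0.5 * (inv_e[0] + inv_e[-1]))


radial_depth = comoving_distance(1.2) - comoving_distance(1.0)
d_mid = comoving_distance(1.1)
width_dec = d_mid * math.radians(1.8)                 # 2 * half_dec
width_ra = width_dec * math.cos(math.radians(-38.6))  # RA extent shrinks by cos(Dec)
print(f"radial depth {radial_depth:.0f} Mpc, "
      f"Dec width {width_dec:.0f} Mpc, RA width {width_ra:.0f} Mpc")
```

Under these assumptions the slice comes out roughly four times deeper than it is wide, so plotting redshift on an axis of the same visual length as RA or Dec compresses the radial direction by about that factor.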

Using the same cuts on RA/Dec/redshift, I plotted the spatial positions of the galaxies and annotated the directions of RA/Dec/redshift in spatial coordinates.

[Image]

[Image]

[Image]

plaszczy commented 5 years ago

What I find worrying is the regularity of the sheets (not really the compression along the z axis, but thanks for pointing it out). If I histogram my region in z (focusing on the right-hand RA part), here it is: [Figure: hist_redshift_v114]

Here are my cuts again so that anyone can reproduce this (I'll let you translate from Spark; it's pretty straightforward):


filter("halo_id>0")
center_ra = 62
half_ra = 0.9
center_dec = -38.6
half_dec = 0.9
df=df.filter( (F.abs(df.ra-center_ra)<half_ra) & \
             (F.abs(df.dec-center_dec)<half_dec) & \
             (df.redshift.between(1,1.2)) )
df.filter(df.ra>62.5)```
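Since the Spark version is meant to be translated, a NumPy version of the same cuts might look like this. The arrays stand in for catalog columns, and the five example galaxies are made up for illustration; note that Spark's `between` is inclusive, hence `>=` and `<=` on redshift.

```python
import numpy as np


def select_region(ra, dec, redshift, halo_id,
                  center_ra=62.0, half_ra=0.9,
                  center_dec=-38.6, half_dec=0.9,
                  z_min=1.0, z_max=1.2):
    """Boolean mask equivalent to the Spark filters (between() is inclusive)."""
    return ((halo_id > 0)
            & (np.abs(ra - center_ra) < half_ra)
            & (np.abs(dec - center_dec) < half_dec)
            & (redshift >= z_min) & (redshift <= z_max))


ra = np.array([62.6, 61.5, 62.6, 62.6, 63.5])
dec = np.array([-38.6, -38.0, -38.6, -40.0, -38.6])
redshift = np.array([1.10, 1.05, 1.30, 1.10, 1.10])
halo_id = np.array([5, 8, 9, -1, 4])

mask = select_region(ra, dec, redshift, halo_id)
right_part = mask & (ra > 62.5)  # the df.filter(df.ra > 62.5) step
```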
plaszczy commented 5 years ago

This reminds me of the peaks in v1.0: https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-09-10_8&test=readiness_cosmoDC2&right=2018-09/2018-09-10_8/readiness_cosmoDC2/cosmoDC2_v1.0_9431_9812/p02_redshift.png

Could it be that the problem has moved to smaller scales?

evevkovacs commented 5 years ago

No, the problem that caused those variations was actually present across all redshifts. It was just less noticeable at lower redshifts. In any case, the problem was fixed in v1.1.4. See for example the redshift distribution here

plaszczy commented 5 years ago

If it was really a bug fix, then indeed it is not this (although the histogram you refer to has large bins compared to my analysis, so the effect wouldn't be noticeable). Anyway, I am focusing on a small region of the sky; integrating over more data smooths the histogram.
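The effect of bin width can be illustrated with made-up data: a narrow spike that stands out clearly in 0.001-wide redshift bins is diluted in 0.02-wide bins. The bin widths, background level and spike are arbitrary choices for illustration, not those of either histogram in the thread; "contrast" here is simply the highest bin divided by the median bin.

```python
import numpy as np

# Synthetic redshifts (made up): a flat background on 1.0 < z < 1.2 plus one narrow spike
background = 1.0 + (np.arange(2000) + 0.5) * 1e-4  # 10 galaxies per 0.001 in z
spike = np.full(100, 1.1005)                       # 100 galaxies at a single redshift
z = np.concatenate([background, spike])


def peak_contrast(z, n_bins, z_min=1.0, z_max=1.2):
    """Ratio of the highest bin to the median bin for n_bins equal-width bins."""
    counts, _ = np.histogram(z, bins=np.linspace(z_min, z_max, n_bins + 1))
    return counts.max() / np.median(counts)


fine = peak_contrast(z, 200)   # dz = 0.001
coarse = peak_contrast(z, 10)  # dz = 0.02
print(fine, coarse)            # 11.0 1.5
```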

rmandelb commented 5 years ago

Within a small region of the sky, we expect cosmic variance to cause bumps in a redshift histogram that are beyond what one would expect for just shot noise. Assessing the significance of these bumps is therefore non-trivial and cannot really be done by eye.

However, as a qualitative check: if it is cosmic variance, then if you look at a similar-sized patch of sky that is well-separated from the first one, then the histogram should also have these structures, but they should be uncorrelated in redshift with the structure in the first region of sky. If they are correlated in redshift between two well-separated patches of sky, then that sounds more potentially problematic.
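This qualitative check can be made slightly more concrete by correlating the bin-by-bin fluctuations of the two patches' redshift histograms after removing a smooth trend, so that the overall dN/dz shape shared by both patches does not produce a spurious correlation. The quadratic detrending, the 50 bins and the uniform stand-in redshifts are assumptions for illustration; this gives a rough indicator, not a significance test, which would need a proper cosmic-variance estimate.

```python
import numpy as np


def detrended_counts(z, edges, degree=2):
    """Histogram counts minus a low-order polynomial fit (assumed smooth dN/dz)."""
    counts, _ = np.histogram(z, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    trend = np.polyval(np.polyfit(centers, counts, degree), centers)
    return counts - trend


def patch_correlation(z_patch_a, z_patch_b, edges):
    """Pearson correlation of the detrended redshift fluctuations of two sky patches."""
    a = detrended_counts(z_patch_a, edges)
    b = detrended_counts(z_patch_b, edges)
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(1)
edges = np.linspace(1.0, 1.2, 51)
z_a = rng.uniform(1.0, 1.2, 20_000)  # stand-ins for the two patches' redshifts
z_b = rng.uniform(1.0, 1.2, 20_000)
r = patch_correlation(z_a, z_b, edges)
```

Values near zero for well-separated patches would be consistent with cosmic variance; a consistently high correlation across several patch pairs would be the more worrying sign.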

plaszczy commented 5 years ago

Yes, it could be cosmic variance, but I don't like that the peaks are somewhat regular. I'll look at other regions indeed.

plaszczy commented 5 years ago

So I moved my RA window (keeping the same Dec); here are the overplotted histograms: [Figure: overplotted redshift histograms]

The results depend a bit on the regions you choose, but it looks (probably) OK.