Closed sdmork closed 1 year ago
@simonmork This notebook requires some minor updates before it would be ready to request a Data Lab review. In particular, I think that we might want to see if we can make a version that runs a bit more quickly. Let me know if you'd be interested in pursuing these updates over the summer.
@simonmork I've made some updates to the notebook (including changing the name to something more descriptive). Take a look and let me know if this looks ok to you.
@jacquesalice and @rnikutta: Both Simon and I have taken another pass through this notebook. Would you have a chance to start a formal review? Thanks!
Hi @kadrlica and @simonmork , many thanks! We'll review ASAP (might be next week and the week after though, due to deadlines and travels).
Hi @jacquesalice,
I have implemented the comments you mentioned in your above message. The only change I did not implement was #9 because the magnitudes that I pull I want to be dereddened and I am unsure if the precomputed colors in the tractor catalog are dereddened.
Let me know if there are any other final changes you want me to make!
-Simon
@kadrlica
Thanks Alice and Simon. Sorry about the extra notebook, that was probably my fault. Do we want to commit all the cutout images?
Thanks @simonmork . Indeed, the pre-computed colors appear to be using the not-dereddened mags, which can be confirmed like this:
df = qc.query("select mag_g, mag_r, dered_mag_g, dered_mag_r, g_r from ls_dr9.tractor limit 5",fmt='pandas')
df['my_g_r'] = df['mag_g'] - df['mag_r']
df['my_dered_g_r'] = df['dered_mag_g'] - df['dered_mag_r']
df
mag_g mag_r dered_mag_g dered_mag_r g_r my_g_r my_dered_g_r
0 25.273912 24.131920 25.204136 24.084917 1.141993 1.141992 1.119219
1 25.448313 25.026280 25.378372 24.979166 0.422033 0.422033 0.399206
2 24.322979 23.566902 24.253012 23.519772 0.756077 0.756077 0.733240
3 26.226358 24.410812 26.156416 24.363699 1.815546 1.815546 1.792717
4 24.746489 24.035713 24.676622 23.988651 0.710775 0.710776 0.687971
I'll add my comments to the NB (if any) in the coming 2-3 days if that's ok.
@kadrlica No, please don't commit any outputs other that the NB itself. Thanks.
@rnikutta I have recommitted changes deleting all exterraneous cutouts. Thank you for clearing up my confusion regarding the precomputed colors and reddening.
Hi @simonmork and @kadrlica, I am a new datalab team member. Thanks for your contribution! It is a very nice NB.
Here are only minor suggestions:
Hi @simonmork and @kadrlica, I am a new datalab team member. Thanks for your contribution! It is a very nice NB.
Here are only minor suggestions:
- In Goals, "SDSS-identified spiral galaxy Milky Way analogues" -> "SDSS-identified Milky Way analogues" or "SDSS-identified Milky Way-type spiral galaxies"
- In Summary, "Milky-Way analogues are a unique class of object" -> "Milky Way analogues are a unique class of objects"
- Can you fix the reference numbering to match flow? For example, the second reference appeared in Goals section is numbered as [4], not [2].
- Missing unit on y-axis label in color-magnitude plots.
Hi @galaxyumi, I have implemented the changes you listed in this message and have pushed them to my branch. Let me know if there is anything else you need.
Thanks @simonmork!
@rnikutta, @jacquesalice - the NB looks good to me.
Hi @simonmork and @kadrlica ,
sorry for the stretched-out reviews... I finally found some time to take your NB fr a spin, it's very nice indeed! Thank you so much for contributing it to Data Lab. My comments below; they should be the last ones!
Link any papers and external references directly. E.g., Licquia et al. 2016 [1]. You can either link directly (not need for footnotes), or link internally to the footnote, and from the footnote to the external resource.
Replace anywhere in the text "LSDR9" with "LS DR9"
In "Goals": "in order to locate Milky Way analogues to higher redshifts than many SDSS-identified Milky Way analogues." --> "in order to locate Milky Way analogues to higher redshifts than many SDSS-identified ones."
In "Summary": "we query the datalab servers" --> "Data Lab servers"
"sources that are best-fit with a galaxy profile" --> "best fit"
In "Imports & Setup": I think you don't need import warnings; warnings.filterwarnings("ignore")
, as I didn't see any warning without it. (Also, ignoring all types of warnings is not a good strategy, one might miss something unexpected yet important.)
If you want the result of a query as, e.g., a Pandas dataframe, you can do this:
q = "SELECT foo, bar FROM ..."
response = qc.query(adql=q, fmt='pandas')
I.e. you then also don't need to import the 'convert' helper function.
Remove %%time
from the one cell that uses it. Since recently we have a built-in cell execution timer installed.
Nice solution you found to split the RA and Dec ranges into parts, run your query for each quadrant independently, and combine the results in the end. I must admit I was surprised that this was faster than running the query once, but on the entire RA and Dec range. Our DB specialist, and myself, have tried for some hours to find a faster way, but any solution required non-standard trickery, so we can't recommend it.
For your looping over the RA and DEC slices, I suggest this simplification (simpler than the current logic):
time.sleep(10)
, it only slows down the NB.JOIN ON
syntax (the WHERE id1 = id2
syntax is equivalent, but deprecated)BETWEEN
syntaxCode to replace the entire looping query cell:
width = 20
RAMIN, RAMAX = 90, 130
DECMIN, DECMAX = 40, 80
dfs = []
query = """
SELECT t.ra, t.dec, t.type, t.dered_mag_g, t.dered_mag_r, t.dered_mag_z,
t.shape_e1, t.shape_e2, t.shape_r, p.z_phot_mean
FROM ls_dr9.tractor as t
JOIN ls_dr9.photo_z as p
ON t.ls_id = p.ls_id
AND p.z_phot_mean BETWEEN 0 AND 0.1
AND t.ra BETWEEN {} AND {}
AND t.dec BETWEEN {} AND {}"""
for ra1 in range(RAMIN,RAMAX,width):
for dec1 in range(DECMIN,DECMAX,width):
t1 = time.time()
q = query.format(ra1, ra1+width, dec1, dec1+width)
response = qc.query(sql=q, fmt='pandas', timeout=600)
dfs.append(response)
print('FINISHED FOR RA %d-%d, DEC %d-%d AFTER %d s' % (ra1,ra1+width,dec1,dec1+width,time.time()-t1))
df=pd.concat(dfs,axis=0)
df
Please adjust the description of the looping procedure (just under the headline "Data Access") accordingly.
Please follow PEP8 recommendations regarding whitespace around operators, assignments and in variables tuples. https://peps.python.org/pep-0008/#whitespace-in-expressions-and-statements
Examples:
This:
masterfilter=np.ones(len(arrays[0]),dtype='bool')
should be:
masterfilter = np.ones(len(arrays[0]),dtype='bool')
This:
ra,dec,width=i,j,int(SKIP/2)
should be:
ra, dec, width = i, j, int(SKIP/2)
etc.
Note that there's no whitespace around '=' in keyword arguments to functions.
If you're using a function only once, don't define it. Functions are for re-use. I.e., remove def noni()
and have the code directly in the cell.
Also, the mechanics of defining your filters are unnecessarily complicated. The entire cell that contains def noni()
can be replaced with this:
ninfilter = df['dered_mag_g'].notnull() & df['dered_mag_z'].notnull() & df['dered_mag_z'].notnull()
typfilter = (df['type'] == 'DEV') | (df['type'] == 'EXP') | (df['type'] == 'SER')
data = df[ninfilter & typfilter]
print(len(data),'galaxies after quality cuts')
Several comments on this cell:
g = data['dered_mag_g']
r = data['dered_mag_r']
z = data['dered_mag_z']
g_r = g - r
r_z = r - z
redshifts = np.asarray(data['z_phot_mean'])
abs_e = (data['shape_e1']**2+data['shape_e2']**2)**0.5
a = (1-abs_e)/(1+abs_e)
half_light_rad = np.pi/180/60/60*data['shape_r']
half_light_kpc = cosmo.angular_diameter_distance(z=redshifts).value*1000*half_light_rad
cosmo_correction = cosmo.distmod(z=redshifts).value + 5*np.log10(cosmo.h)
M_g = g - calc_kcor('g', data['z_phot_mean'], 'g - r', g_r) - cosmo_correction
M_r = r - calc_kcor('r', data['z_phot_mean'], 'g - r', g_r) - cosmo_correction
M_z = z - calc_kcor('z', data['z_phot_mean'], 'r - z', r_z) - cosmo_correction
# Compute XYZ, which is the ABC (and add reference if possible)
abs_e = (data['shape_e1']**2+data['shape_e2']**2)**0.5
a = (1-abs_e)/(1+abs_e)
Same for the two lines computing the halflight radius.
Think if you could use astropy.units; and if not desired, at least state the units for each line
At the end, before "References", add a "File Cleanup" cell.
# !rm -f `pwd`/*png
Hi @rnikutta,
Thank you for your detailed feedback. I have implemented your requested changes, and so the notebook should be ready for a final review and merge. Let me know if there is anything else I need to do. Thanks again!
Thanks @simonmork ! A few items have not been addressed though:
RAMIN, RAMAX, RACHUNKS = 90, 120, 3
DECMIN, DECMAX, DECCHUNKS = 40, 70, 3
ra_width = (RAMAX-RAMIN)/RACHUNKS dec_width = (DECMAX-DECMIN)/DECCHUNKS for ra in np.linspace(RAMIN, RAMIN+ra_width(RACHUNKS-1), RACHUNKS): for dec in np.linspace(DECMIN, DECMIN+dec_width(DECCHUNKS-1), DECCHUNKS):
is simpler than the one I proposed:
width = 20 RAMIN, RAMAX = 90, 130 DECMIN, DECMAX = 40, 80
for ra1 in range(RAMIN,RAMAX,width): for dec1 in range(DECMIN,DECMAX,width):
Why do you prefer it?
- Replace this wording:
" Since we are limited by the amount of data that any one query is allowed to take at any given time"
with
" Since we are limited by the maximum runtime of a query"
- Replace "In order to use the selection cell for arbitrary RA and Dec, change the following values:" with "Change the following values to adjust for your use case:"
- Fix typo in "half-light radiys"
Everything else looks great!
Hi @rnikutta, sorry about the link mixup and typos, those should be up to snuff now.
As for the loop logic, I've left in place my most recent implementation for three reasons.
range()
function only works for integers.(((RAMAX-RAMIN)%width == 0) and ((DECMAX-DECMIN)%width == 0)) == False
since it will overshoot the specified RAMAX and DECMAX in these instances. See the below image for an example.I appreciate your detailed follow-up very much!
Congratulations @simonmork! Thank you reviewers for all your detailed feedback!
Added a new Extragalactic tutorial detailing the process of finding Milky Way Analogues in the Legacy Survey.
@kadrlica