abby-baskind / seniorthesis

0 stars 2 forks source link

my code is ugly but idk how to do it better cause i dont know python #2

Open abby-baskind opened 3 years ago

abby-baskind commented 3 years ago

hey @jbusecke and @gmacgilchrist (graeme, ignore this while youre on vacation, it's non-urgent)

So here's some code from my dissic_plots. I'm trying to make subplots of zonal DIC, and it works (i attached an image to prove plots were in fact made), but there has to be a more efficient way to do this. Like, there has to be a more efficient way to get the intake keys instead of copy pasting them. And I bet there's a way to loop through making the slices and making the subplots. But idk how to do that, so lemme know if you know a way to make this code more efficient and less like a first-grader wrote it

# getting intake keys
a = 'CMIP.NCC.NorESM2-LM.piControl.r1i1p1f1.Oyr.dissic.gr.gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/piControl/r1i1p1f1/Oyr/dissic/gr/v20210118/.nan.20210118'
b = 'CMIP.CCCma.CanESM5.piControl.r1i1p1f1.Omon.dissic.gn.gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/piControl/r1i1p1f1/Omon/dissic/gn/v20190429/.nan.20190429'
c = 'CMIP.MPI-M.MPI-ESM1-2-LR.piControl.r1i1p1f1.Omon.dissic.gn.gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/piControl/r1i1p1f1/Omon/dissic/gn/v20190710/.nan.20190710'

# making some slices
DICa = DICdict[a].isel(time=0).sel(x=slice(180,200)).mean('x', keep_attrs=True)
DICb = DICdict[b].isel(time=0).sel(x=slice(180,200)).mean('x', keep_attrs=True)
DICc = DICdict[c].isel(time=0).sel(x=slice(180,200)).mean('x', keep_attrs=True)

# create the subplot
figa, ((axa, axb, axc), (axE, axf, axg), (axh, axk, axl), (axm, axn, axo), (axp, axq, axr), (axs, axu, axv), (axaa, axbb, axee),(axgg, ax1, ax2)) = plt.subplots(nrows = 8,ncols = 3, figsize=[10, 15])    

# plot plot plot
ima = axa.pcolormesh(DICa['y'],DICa['lev'],DICa.dissic)
axa.set_xlim([-80,60])
axa.invert_yaxis()
axa.set_xlabel('Latitude')
axa.set_ylabel('Depth (m)')
cbaxes = figa.add_axes([1.03, 0.1, 0.02, 0.8]) 
cbara = figa.colorbar(ima, cax = cbaxes)
cbara.set_label(DICa.dissic.attrs['long_name']+' ('+DICa.dissic.attrs['units']+')', fontsize = 14)
axa.set_title(DICa.attrs['source_id']+ ' '+ DICa.attrs['table_id']+ ' '+ DICa.attrs['grid_label'])

imb = axb.pcolormesh(DICb['y'],DICb['lev'],DICb.dissic)
axb.set_xlim([-80,60])
axb.invert_yaxis()
axb.set_xlabel('Latitude')
axb.set_ylabel('Depth (m)')
axb.set_title(DICb.attrs['source_id']+ ' '+ DICb.attrs['table_id']+ ' '+ DICb.attrs['grid_label'])

imc = axc.pcolormesh(DICc['y'],DICc['lev'],DICc.dissic)
axc.set_xlim([-80,60])
axc.invert_yaxis()
axc.set_xlabel('Latitude')
axc.set_ylabel('Depth (m)')
axc.set_title(DICc.attrs['source_id']+ ' '+ DICc.attrs['table_id']+ ' '+ DICc.attrs['grid_label'])

# there are a lot more subplots but they are all basically written the same

dissic_plot

jbusecke commented 3 years ago

Hey @abby-baskind these look great!

Some pointers to make the code a bit sleeker (but overall there is nothing wrong with doing things manually first!):

1) Use for loops to avoid repeating code

#You dont have to spell out all the axes, you can just keep them in an array
fig, axarr = plt.subplots(...)

#loop over each key, value pair of the dictionary
ax_idx = 0 # index for your axes array
for name, ds in DICdict.items():
   ax = axarr.flat[ax_idx]
   ...write out the plotting only once
   #advance the axis index
   ax_idx += 1 # this is a nifty python trick to increment a variable by one

# but there is an even more elegant way
for ax, (name, ds) in zip(axarr.flat,DICdict.items()):
     ... fancy plotting stuff

2) Consider using xarrays plotting methods. You can rewrite this

# plot plot plot
ima = axa.pcolormesh(DICa['y'],DICa['lev'],DICa.dissic)
axa.set_xlim([-80,60])
axa.invert_yaxis()
axa.set_xlabel('Latitude')
axa.set_ylabel('Depth (m)')
cbaxes = figa.add_axes([1.03, 0.1, 0.02, 0.8]) 
cbara = figa.colorbar(ima, cax = cbaxes)
cbara.set_label(DICa.dissic.attrs['long_name']+' ('+DICa.dissic.attrs['units']+')', fontsize = 14)
axa.set_title(DICa.attrs['source_id']+ ' '+ DICa.attrs['table_id']+ ' '+ DICa.attrs['grid_label'])

like this (I am already using the variables looped over in the forloop (e.g. axa=ax and DICa=dissic_section):

dissic_section = ds.dissic.sel(.... # select/average your section. You have to do this inside the loop aswell.
im = dissic_section.plot(x='y', y='lev', yincrease=False)
ax.set_title(DICa.attrs['source_id']+ ' '+ DICa.attrs['table_id']+ ' '+ DICa.attrs['grid_label'])

xarray should handle e.g. the colorbar, units etc and might save you some time and code.

3) Set colorlimits? From your code it seems like you tried to add a colorbar, but I cant see it. To make the plots comparable, consider settting the colorlimits to the same value within the loop

im=ds.dissic.plot(..., vmin=0, vmax=5000) #not sure what a reasonable range is here

4) Easier string formatting Python has a neat way of formatting strings (f-strings), so you could replace DICc.attrs['source_id']+ ' '+ DICc.attrs['table_id']+ ' '+ DICc.attrs['grid_label'] with f"{DICc.attrs['source_id']} {DICc.attrs['table_id']} {DICc.attrs['grid_label']}"

One comment about the method: You are currently selecting the longitudes based on the x dimension (.sel(x=slice(180,200)).mean('x', keep_attrs=True)). x is only a nominal longitude (in some models it is just a logical index - counting from 0 to however many elements x has), the safer way to select the region is based on the actual 2D longitude lon:

dissic_section = ds.dissic.where(np.logical_and(ds.lon<=200, ds.lon>=180), drop=True).mean('x', keep_attrs=True)

Further down the line it might be of use to regrid each dataset to a common grid, but for now this here should suffice.

gmacgilchrist commented 3 years ago

The plots are looking excellent @abby-baskind and all great advice from @jbusecke for cleaning up the code. Looking forward to seeing where you got to. As @jbusecke says, absolutely nothing wrong with doing it "ugly" first time!

gmacgilchrist commented 3 years ago

@abby-baskind I think the key aspect to note of Julius' approach above is that you can loop through the items in your dictionary and pick out both the name of the model and the dataset (which has all the variables) for each of the models. That is in the bit:

for name,ds in DICdict.items():

Now your DIC data for model with name name is in ds. So you can plot that as before:

ax.pcolormesh(ds['y'],ds['lev'],ds['dissic'])
ax.set_title(name)

Putting these all in separate axes is what Julius does when he puts all the axes handles into axarr and then selects them one by one in the for loop. FYI the command flat just takes an organized array of objects (in this case, axarr is some m x n array corresponding to the rows and columns of your subplots) and flattens it out into a 1-D array.

gmacgilchrist commented 3 years ago

@abby-baskind I'm just noticing in your plots that some of your axes correspond to the exact same output (e.g. it looks like CanESM5 Omon gn is plotted twice?), or the same models but on different grids (e.g. on both gr and gn) or with different output frequencies (e.g. Omon and Oyr). Definitely a good thing to see all of the output that's available, but you should make sure that you are ultimately only working with distinct model simulations (rather than different output from the same simulations). Does that make sense? We can certainly help you parse through that.

abby-baskind commented 3 years ago

hey @gmacgilchrist i wasn't sure whether gn or gr/Oyr of Omon would be better so for the first attempt at plotting i just plotted all of them. i think it'll be pretty easy to extract the relevant grids and output frequencies. hope that clarifies why i did that

gmacgilchrist commented 3 years ago

Yep, totally reasonable. I think for your purposes any of these options will be fine - so whichever gives the most data will be best.

abby-baskind commented 3 years ago

quick q about the automatically generated colorbars using matplotlib.pyplot.plot...

for context, here's the relevant block of code (using the strategy from @jbusecke given earlier) and the output

fig, axarr = plt.subplots(nrows = 4, ncols = 4, figsize=[10,6])
fig.tight_layout(pad = 2)

#loop over each key, value pair of the dictionary
ax_idx = 0 # index for your axes array
for name, ds in dd.items():
#     print(ds)
    ax = axarr.flat[ax_idx]
    so_section = ds.so.where(np.logical_and(ds.lon<=200, ds.lon>=180), drop=True).isel(time = 0).mean('x', keep_attrs=True)
    im = so_section.plot(x='y', y='lev', yincrease=False, ax = ax)
    ax.set_title(ds.attrs['source_id'])
    ax.set_xlabel('Latitude')
    ax.set_ylabel('Depth (m)')
#    #advance the axis index
    ax_idx += 1 # this is a nifty python trick to increment a variable by one

so

obvi the whole fig is messy, and i can easily change font size, padding, etc to help that, but i don't know how to change the auto generated colorbars, specifically the colorbar labels. the labels currently say 'sea water salinity [0.001].' i tried using matplotlib.pyplot.colorbar but that just added an extra colorbar. how do i change the label/other properties of this existing colorbar?

also here's the link to the code i'm working from right now in case you need more context https://github.com/abby-baskind/seniorthesis/blob/174061f42cc6d46c114c203dc4f8edb6f0edabf3/notebooks/so_with_a_loop_wooooo.ipynb

jbusecke commented 3 years ago

I think if you pass cbar_kwargs={'label':'some_label'} to im = so_section.plot(x='y', y='lev', yincrease=False, ax = ax) like this:

im = so_section.plot(x='y', y='lev', yincrease=False, ax = ax, cbar_kwargs={'label':'some_label'})

You can also change other propoerties of the colorbar with this. See here for examples.

gmacgilchrist commented 3 years ago

If no luck with that, remember that you don't have to use xarray's plotting capabilities, but just the matplotlib native syntax. So that would be:

im = ax.pcolormesh(so_section['y'],so_section['lev'],so_section)
plt.colorbar(im,ax=ax)

I tend to do this simply because I'm used to building figures like this. But I know that Julius prefers to do things entirely through the xarray .plot command. Suggest you work with whatever makes more intuitive sense to you.

jbusecke commented 3 years ago

Yes, I should have mentioned that actually. Also it is actually good to be able to switch to different ways of plotting to diagnose errors (e.g. in #7)