dask / dask-tutorial

Dask tutorial
https://tutorial.dask.org
BSD 3-Clause "New" or "Revised" License

solution in 04_dataframe parts 5 and 6 should use idxmax #229

Closed: graingert closed this issue 2 years ago

graingert commented 2 years ago

"5.) What day of the week has the worst average departure delay?" and "6.) What holiday has the worst average departure delay?" both ask for the actual day/holiday in question, but the worked solutions return the full Series of average delays:

e.g. for exercise 6:

>>> df.merge(holidays, on=["Date"], how="left").groupby("holiday").DepDelay.mean().compute()
holiday
Christmas Day                   5.251485
Columbus Day                    3.551486
Independence Day                3.819829
Labor Day                       6.847114
Martin Luther King Jr. Day     12.026764
Memorial Day                    4.533657
New Year's Day                  9.480000
Thanksgiving                    4.386392
Veterans Day                    6.418787
Veterans Day (Observed)        11.795065
Washington's Birthday           6.696615
Independence Day (Observed)     6.222344
Christmas Day (Observed)        5.567119
New Year's Day (Observed)       5.730458
Name: DepDelay, dtype: float64

but should probably be:

>>> df.merge(holidays, on=["Date"], how="left").groupby("holiday").DepDelay.mean().idxmax().compute()
'Martin Luther King Jr. Day'

jsignell commented 2 years ago

Thanks @graingert for this suggestion! The code in question has changed in the most recent revisions of the tutorial, and we are now using idxmax.
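
For readers following along, the difference between the two results discussed in this thread can be sketched with plain pandas. This is only an illustration under stated assumptions: the `flights` frame and its values are made up for the example and are not the tutorial's dataset. With Dask the same chain would end in `.compute()`, as in the snippets above.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged flights/holidays frame.
flights = pd.DataFrame(
    {
        "holiday": [
            "Labor Day",
            "Labor Day",
            "Thanksgiving",
            "Thanksgiving",
            "New Year's Day",
        ],
        "DepDelay": [4.0, 8.0, 3.0, 5.0, 9.0],
    }
)

# Mean delay per holiday: a Series indexed by holiday name.
mean_delay = flights.groupby("holiday")["DepDelay"].mean()

# idxmax returns the index label of the largest value,
# i.e. the holiday itself rather than the table of averages.
worst_holiday = mean_delay.idxmax()

print(mean_delay)
print(worst_holiday)
```

Here `mean_delay` answers "what are the averages?", while `worst_holiday` answers the question the exercise actually asks.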