QuantEcon / lecture-datascience.myst

Source repository for QuantEcon Datascience
https://datascience.quantecon.org

Tiny Issues in Pandas Section #232

Closed JayCata closed 1 year ago

JayCata commented 1 year ago

There are two tiny issues in the Pandas lecture notes on Merging (found here). The first is a potential source of confusion that might not be worth correcting, while the second is an incorrect statement that probably should be corrected.

Issue 1

The units in the DataFrames sq_miles and pop are millions of square miles and millions of people, respectively. The WDI data, however, is in trillions of 2010 dollars. At some point, consumption per capita and consumption per square mile are calculated without accounting for this difference in units.

While the numbers are not interpreted anywhere, it may be worth clarifying that the resulting figures are in millions of dollars (per person or per square mile), as the consumption per capita numbers look funny otherwise. An alternative solution would be to multiply the numbers by a million so that they are in dollars.

At worst, the way it is right now might cause some confusion. It also does not affect Exercise 1, so it might not be worth correcting.
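
To make the unit arithmetic concrete, here is a minimal sketch. The numbers and column names are made up for illustration and are not taken from the lecture; the only point is that trillions of dollars divided by millions of people gives millions of dollars per person.

```python
import pandas as pd

# Hypothetical values and column names, for illustration only.
df = pd.DataFrame(
    {
        "Consumption": [12.0, 1.0],   # trillions of dollars
        "Population": [300.0, 35.0],  # millions of people
    },
    index=["United States", "Canada"],
)

# trillions / millions = millions of dollars per person
df["cons_per_capita_millions"] = df["Consumption"] / df["Population"]

# Multiplying by one million converts the figure to dollars per person
df["cons_per_capita_dollars"] = df["cons_per_capita_millions"] * 1e6

print(df)
```

With these made-up inputs, the first per capita column holds small fractions (which is why the lecture's numbers look odd at first glance), while the second holds ordinary dollar amounts.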

Issue 2

When explaining the 'how' argument of pd.merge() and its four potential options, the notes incorrectly claim that "left" is the default value of how. The default is actually "inner". I believe the example is such that all keys in the left DataFrame are present in the right one, so there is no difference between "left" and "inner" in the provided examples.

This might be one of the reasons the mistake went undetected, but it should probably be corrected. Also, it would be quite easy to remove a country from the sq_miles DataFrame, since it is defined manually in the notebook. That way, the behavior of "inner" and "left" would differ.
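
As a rough sketch of what that would look like, the example below uses two small hypothetical DataFrames (the names `left` and `right`, the column names, and the values are assumptions for illustration, not the lecture's data). One country is missing from the right-hand frame, so the default merge and `how="left"` give different results.

```python
import pandas as pd

# Hypothetical data: "Germany" appears in `left` but not in `right`.
left = pd.DataFrame(
    {
        "country": ["Canada", "Germany", "United States"],
        "consumption": [1.0, 2.0, 12.0],
    }
)
right = pd.DataFrame(
    {
        "country": ["Canada", "United States"],
        "sq_miles": [3.8, 3.8],
    }
)

# No `how` given: pandas uses how="inner", keeping only keys found in both frames
default_merge = pd.merge(left, right, on="country")
inner_merge = pd.merge(left, right, on="country", how="inner")

# how="left" keeps every row of `left`; unmatched rows get NaN in right-hand columns
left_merge = pd.merge(left, right, on="country", how="left")

print(default_merge.equals(inner_merge))
print(default_merge)
print(left_merge)
```

Here the default result drops Germany, exactly as `how="inner"` does, while `how="left"` keeps it with a missing sq_miles value.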

doctor-phil commented 1 year ago

Thanks @JayCata, I'll take care of these early next week