Closed minertom closed 3 years ago
Hi @minertom ,
Thanks for your question.
This is a special indexing syntax, usually called boolean indexing (or boolean masking), that works with pandas DataFrames, NumPy arrays, TensorFlow tensors, and objects from a few other libraries.
Here's a simple example:
import numpy as np
a = np.array([10, 20, 30, 40, 50])
i = np.array([False, True, False, True, True]) # is True for every item we want, and otherwise False
print(a[i]) # prints [20 40 50]
Now suppose I only want to keep the even numbers in an array, here's one way to do it:
a = np.array([1, 3, 4, 8, 2, 5, 4])
i = (a % 2 == 0) # this will be equal to array([False, False, True, True, True, False, True])
print(a[i]) # prints [4 8 2 4]
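Conditions can also be combined. The sketch below is an added illustration, not part of the original notebook, and the names `mask`, `evens_above_two`, and `same_as_list` are made up for it. It keeps only the even numbers greater than 2. Note that NumPy needs the element-wise operators `&` and `|` here, with parentheses around each comparison, rather than Python's `and` / `or`.

```python
import numpy as np

a = np.array([1, 3, 4, 8, 2, 5, 4])

# Element-wise AND of two boolean arrays. The parentheses are required
# because & binds more tightly than == and >.
mask = (a % 2 == 0) & (a > 2)
evens_above_two = a[mask]
print(evens_above_two)  # prints [4 8 4]

# The same selection written as a plain Python list comprehension
# (hypothetical name, shown only for comparison).
same_as_list = [x for x in a.tolist() if x % 2 == 0 and x > 2]
print(same_as_list)  # prints [4, 8, 4]
```

The list comprehension gives the same values, but the boolean mask version stays a NumPy array and avoids an explicit Python loop.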
Now let's look at the line that confused you:
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
First, note that oecd_bli["INEQUALITY"]=="TOT" is a pandas Series equal to True everywhere the "INEQUALITY" feature is equal to "TOT". So oecd_bli[oecd_bli["INEQUALITY"]=="TOT"] is a new pandas DataFrame containing only the rows where the "INEQUALITY" feature is equal to "TOT".
Here's a simplified example:
import pandas as pd
oecd_bli = pd.DataFrame({
"INEQUALITY": ["a", "b", "TOT", "TOT", "c", "TOT"],
"Other": [10, 20, 30, 40, 50, 60]
})
print(oecd_bli["INEQUALITY"]=="TOT")
# prints this Pandas Series:
# 0    False
# 1    False
# 2     True
# 3     True
# 4    False
# 5     True
# Name: INEQUALITY, dtype: bool
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
print(oecd_bli)
# prints:
#   INEQUALITY  Other
# 2        TOT     30
# 3        TOT     40
# 5        TOT     60
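To connect this with the "list comprehension" guess in the question: the filter is not recursive and does not replace any text. It builds a boolean mask and then keeps the matching rows. Here is a minimal sketch of the same thing in two steps, followed by a rough list-comprehension equivalent. The names `df`, `mask`, `filtered`, and `kept_labels` are hypothetical and only used for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "INEQUALITY": ["a", "b", "TOT", "TOT", "c", "TOT"],
    "Other": [10, 20, 30, 40, 50, 60],
})

# Step 1: compare every value in the column to "TOT" -> boolean Series.
mask = df["INEQUALITY"] == "TOT"

# Step 2: keep only the rows where the mask is True.
filtered = df[mask]
print(filtered["Other"].tolist())  # prints [30, 40, 60]

# The original DataFrame is unchanged unless the result is assigned back.
print(len(df), len(filtered))  # prints 6 3

# Rough equivalent with a list comprehension: collect the index labels
# of the matching rows, then select those rows by label.
kept_labels = [label for label, value in df["INEQUALITY"].items() if value == "TOT"]
print(kept_labels)  # prints [2, 3, 5]
same_rows = df.loc[kept_labels]
```

Writing oecd_bli = oecd_bli[...] simply rebinds the name to the filtered result, which is why the same variable appears on both sides of the assignment.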
I hope this is clear. For more info on advanced array indexing, check out NumPy's docs and pandas' docs. You can also check out the tutorial notebooks I made.
Hope this helps.
Crystal clear explanation, thank you very much, sir "Aurélien Geron".
Thank you, sir. Thanks for your explanation. @AmeyaSaonerkar
Hi, I am not completely new to Python, but this construct is a little bit beyond what I have encountered before. It seems to me that the statement
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
is some sort of "list comprehension". But I don't understand it. It is apparently a recursive function. The only guess that I can come up with is that this statement replaces text within the input data. For example, in the CSV file that is used, oecd_bli, "Inequality" is replaced with "Total" and "Indicator" is replaced with "Life expectancy".
What is the term that is used for this kind of Python function?
Thank you, Tom