df.describe() Even Odd
count 3.0 3.0
mean 4.0 3.0
std 2.0 2.0
min 2.0 1.0
25% 3.0 2.0
50% 4.0 3.0
75% 5.0 4.0
max 6.0 5.0
Upgrade
Open in app


Published in
Level Up Coding
You have 2 free member-only stories left this month.
Upgrade for unlimited access.

Zita
Follow
Nov 4
·
6 min read
·
·
Listen
Save
Pandas Basics Cheat Sheet (2023) — Python for Data Science
The absolute basics for beginners learning Pandas in 2022

Photo from Unsplash by Johannes Groll
The Pandas library is one of the most powerful libraries in Python. It is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Check out the sections below to learn the various functions and tools Pandas offers.
Sections:
1. Pandas Data Structures
2. Dropping
3. Sort & Rank
4. Retrieving Series/DataFrame Information
5. DataFrame Summary
6. Selection
7. Applying Functions
8. Data Alignment
9. In/Out
Pandas Data Structures
There are two main types of data structures that the Pandas library is centered around. The first is a one-dimensional array called a Series, and the second is a two-dimensional table called a Data Frame.
Series — One dimensional labeled array
s = pd.Series([3, -5, 7, 4], index = ['a','b','c','d'])
a 3
b -5
c 7
d 4
Data Frame — A two dimensional labeled data structure
data = {'Country':['Belgium','India','Brazil'], 'Capital':['Brussels','New Delhi','Brasilia'], 'Population':['111907','1303021','208476']}>>> df = pd.DataFrame(data, columns = ['Country','Capital','Population']) Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Dropping
In this section, you’ll learn how to remove specific values from a Series, and how to remove columns or rows from a Data Frame.
s and df in the code below are used as examples of a Series and Data Frame throughout this section.
sa 6
b -5
c 7
d 4>>> df Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Drop values from rows (axis = 0)
s.drop(['a','c']) b -5
d 4
Drop values from columns (axis = 1)
df.drop('Country', axis = 1) Capital Population
0 Brussels 111907
1 New Delhi 1303021
2 Brasilia 208476
Sort & Rank
In this section, you’ll learn how to sort Data Frames by an index, or column, along with learning how to rank column values.
df in the code below is used as an example Data Frame throughout this section.
df Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Sort by labels along an axis
df.sort_index() Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Sort by values along an axis
df.sort_values(by = 'Country') Country Capital Population
0 Belgium Brussels 111907
2 Brazil Brasilia 208476
1 India New Delhi 1303021
Assign ranks to entries
df.rank() Country Capital Population
0 1.0 2.0 1.0
1 3.0 3.0 2.0
2 2.0 1.0 3.0
Retrieving Series/DataFrame Information
In this section, you’ll learn how to retrieve info from a Data Frame that includes the dimensions, column names column types, and index range.
df in the code below is used as an example Data Frame throughout this section.
df Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
Country 3 non-null object
Capital 3 non-null object
Population 3 non-null object
dtypes: object(3)
memory usage: 152.0+ bytes
Number of non-NA values
df.count()Country 3
Capital 3
Population 3
DataFrame Summary
In this section, you’ll learn how to retrieve summary statistics of a Data Frame which include the sum of each column, min/max values of each column, mean values of each column, and others.
df in the code below is used as an example of a Data Frame throughout this section.
df Even Odd
0 2 1
1 4 3
2 6 5
Sum of values
df.sum()
Even 12
Odd 9
Cumulative sum of values
df.cumsum() Even Odd
0 2 1
1 6 4
2 12 9
Minimum value
df.min()
Even 2
Odd 1
Maximum value
df.max()
Even 6
Odd 5
Summary statistics
df.describe() Even Odd
count 3.0 3.0
mean 4.0 3.0
std 2.0 2.0
min 2.0 1.0
25% 3.0 2.0
50% 4.0 3.0
75% 5.0 4.0
max 6.0 5.0
Mean of values
df.mean()
Even 4.0
Odd 3.0
Median of values
df.median()
Even 4.0
Odd 3.0
Selection
In this section, you’ll learn how to retrieve specific values from a Series and Data Frame.
s and df in the code below are used as examples of a Series and Data Frame throughout this section.
sa 6
b -5
c 7
d 4>>> df Country Capital Population
0 Belgium Brussels 111907
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Get one element
s['b']
-5
Get subset of a DataFrame
df[1:] Country Capital Population
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Select single value by row & column
df.iloc[0,0]
'Belgium'
Select single value by row and column labels
df.loc[0,'Country']
'Belgium'
Select single row of subset rows
df.ix[2]Country Brazil
Capital Brasilia
Population 208476
Select a single column of subset of columns
df.ix[:,'Capital']0 Brussels
1 New Delhi
2 Brasilia
Select rows and columns
df.ix[1,'Capital']
'New Delhi'
Use filter to adjust DataFrame
df[df['Population'] > 120000] Country Capital Population
1 India New Delhi 1303021
2 Brazil Brasilia 208476
Set index a of Series s to 6
s['a'] = 6 a 6
b -5
c 7
d 4
Applying Functions
In this section, you’ll learn how to apply a function to all values of a Data Frame or a specific column.
df in the code below is used as an example of a Data Frame throughout this section.
In this section, you’ll learn how to add, subtract, and divide two series that have different indexes from one another.
s and s3 in the code below are used as examples of Series throughout this section.
sa 6
b -5
c 7
d 4>>> s3a 7
c -2
d 3
Internal Data Alignment
s + s3a 13.0
b NaN
c 5.0
d 7.0#NA values are introduced in the indices that don't overlap
Arithmetic Operations with Fill Methods
s.add(s3, fill_value = 0)a 13.0
b -5.0
c 5.0
d 7.0>>> s.sub(s3, fill_value = 2)a -1.0
b -7.0
c 9.0
d 1.0>>> s.div(s3, fill_value = 4)a 0.857143
b -1.250000
c -3.500000
d 1.333333
In/Out
In this section, you’ll learn how to read a CSV file, Excel file, and SQL Query into Python using Pandas. You will also learn how to export a Data Frame from Pandas into a CSV file, Excel file, and SQL Query.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql('SELECT * FROM my_table;', engine)
pd.read_sql_table('my_table', engine)
Write to SQL Query
pd.to_sql('myDF', engine)
Conclusion
Python is the top dog when it comes to data science for now and in the foreseeable future. Knowledge of Pandas, one of its most powerful libraries is often a requirement for Data Scientists today.
Use this cheat sheet as a guide in the beginning and come back to it when needed, and you’ll be well on your way to mastering the Pandas library.
Join my email list with 5k+ people to get “The Complete Python for Data Science Cheat Sheet Booklet” for FREE
11
11
Sign up for Top Stories
By Level Up Coding
A monthly summary of the best stories shared in Level Up Coding Take a look.
Emails will be sent to sheyngalts@gmail.com. Not you?
Get this newsletter
More from Level Up Coding
Follow
Coding tutorials and news. The developer homepage gitconnected.com && skilled.dev && levelup.dev
Sum of values
Cumulative sum of values
Minimum value
Maximum value
Summary statistics
Upgrade
Open in app


Published in
Level Up Coding
You have 2 free member-only stories left this month.
Upgrade for unlimited access.

Zita
Follow
Nov 4
·
6 min read
·
·
Listen
Save
Pandas Basics Cheat Sheet (2023) — Python for Data Science
The absolute basics for beginners learning Pandas in 2022

Photo from Unsplash by Johannes Groll
The Pandas library is one of the most powerful libraries in Python. It is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Check out the sections below to learn the various functions and tools Pandas offers.
Sections: 1. Pandas Data Structures 2. Dropping 3. Sort & Rank 4. Retrieving Series/DataFrame Information 5. DataFrame Summary 6. Selection 7. Applying Functions 8. Data Alignment 9. In/Out
Pandas Data Structures
There are two main types of data structures that the Pandas library is centered around. The first is a one-dimensional array called a Series, and the second is a two-dimensional table called a Data Frame.
Series — One dimensional labeled array
Data Frame — A two dimensional labeled data structure
Dropping
In this section, you’ll learn how to remove specific values from a Series, and how to remove columns or rows from a Data Frame.
s and df in the code below are used as examples of a Series and Data Frame throughout this section.
Drop values from rows (axis = 0)
Drop values from columns (axis = 1)
Sort & Rank
In this section, you’ll learn how to sort Data Frames by an index, or column, along with learning how to rank column values.
df in the code below is used as an example Data Frame throughout this section.
Sort by labels along an axis
Sort by values along an axis
Assign ranks to entries
Retrieving Series/DataFrame Information
In this section, you’ll learn how to retrieve info from a Data Frame that includes the dimensions, column names column types, and index range.
df in the code below is used as an example Data Frame throughout this section.
(rows, columns)
Describe index
Describe DataFrame columns
Info on DataFrame
Number of non-NA values
DataFrame Summary
In this section, you’ll learn how to retrieve summary statistics of a Data Frame which include the sum of each column, min/max values of each column, mean values of each column, and others.
df in the code below is used as an example of a Data Frame throughout this section.
Sum of values
Cumulative sum of values
Minimum value
Maximum value
Summary statistics
Mean of values
Median of values
Selection
In this section, you’ll learn how to retrieve specific values from a Series and Data Frame.
s and df in the code below are used as examples of a Series and Data Frame throughout this section.
Get one element
Get subset of a DataFrame
Select single value by row & column
Select single value by row and column labels
Select single row of subset rows
Select a single column of subset of columns
Select rows and columns
Use filter to adjust DataFrame
Set index a of Series s to 6
Applying Functions
In this section, you’ll learn how to apply a function to all values of a Data Frame or a specific column.
df in the code below is used as an example of a Data Frame throughout this section.
Apply function
Data Alignment
In this section, you’ll learn how to add, subtract, and divide two series that have different indexes from one another.
s and s3 in the code below are used as examples of Series throughout this section.
Internal Data Alignment
Arithmetic Operations with Fill Methods
In/Out
In this section, you’ll learn how to read a CSV file, Excel file, and SQL Query into Python using Pandas. You will also learn how to export a Data Frame from Pandas into a CSV file, Excel file, and SQL Query.
Read CSV file
Write to CSV file
Read Excel file
Write to Excel file
Read multiple sheets from the same file
Read SQL Query
Write to SQL Query
Conclusion
Python is the top dog when it comes to data science for now and in the foreseeable future. Knowledge of Pandas, one of its most powerful libraries is often a requirement for Data Scientists today.
Use this cheat sheet as a guide in the beginning and come back to it when needed, and you’ll be well on your way to mastering the Pandas library.
Join my email list with 5k+ people to get “The Complete Python for Data Science Cheat Sheet Booklet” for FREE
11
11
Sign up for Top Stories
By Level Up Coding
A monthly summary of the best stories shared in Level Up Coding Take a look.
Emails will be sent to sheyngalts@gmail.com. Not you?
Get this newsletter
More from Level Up Coding
Follow
Coding tutorials and news. The developer homepage gitconnected.com && skilled.dev && levelup.dev
Read more from Level Up Coding