TeachingDataScience / data-science-course

Data Science Course Materials

Use "apply" with lambdas to run a function on each element in a pandas column #16

Closed (datadave closed 9 years ago)

datadave commented 9 years ago

This is in response to a student question. Here's an example of using a lambda expression to run a function across select elements in a pandas column.

Background: the column in question held a mix of strings, NaNs, and lists.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import collections
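# Load the posts; 'raw_post' holds the text of each post
df = pd.read_csv("posts_and_data.csv")
# Split each post on spaces: strings become lists of words, NaN stays NaN
df['words'] = df['raw_post'].str.split(' ')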

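The function collections.Counter() is only useful here on the cells that hold lists. It accepts any iterable, so it raises a TypeError on a NaN (which is a float), and on a plain string it would count individual characters rather than words.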
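To see why the check is needed, here is a small sketch of how collections.Counter() treats each kind of value that can turn up in such a column. The sample values are made up for illustration and are not from the student's data.

```python
import collections

# A list of words: this is the case we actually want
word_counts = collections.Counter(["the", "cat", "the"])
print(word_counts)

# A bare string is iterated character by character, so we get letter counts
char_counts = collections.Counter("the")
print(char_counts)

# NaN is a float, and a float is not iterable, so Counter raises TypeError
nan_failed = False
try:
    collections.Counter(float("nan"))
except TypeError as err:
    nan_failed = True
    print("TypeError:", err)
```

So strictly speaking Counter accepts any iterable, but only the list cells give a meaningful word count.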

So use a "lambda" expression with apply, and an if statement, to run the function selectively on the cells with lists:

# Create a new column containing the output.
# Only run Counter if the cell is a list; leave every other cell unchanged.
df['wordcounts'] = df['words'].apply(lambda x: collections.Counter(x) if isinstance(x, list) else x)
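For anyone who wants to try this without the CSV, here is a minimal self-contained sketch of the same idea. The three-row DataFrame is a hypothetical stand-in for posts_and_data.csv, and count_words is just an illustrative name for a helper that does the same job as the lambda.

```python
import collections

import numpy as np
import pandas as pd

# Hypothetical stand-in for posts_and_data.csv: two posts and one missing value
df = pd.DataFrame({"raw_post": ["the cat saw the dog", np.nan, "hello hello world"]})

# Strings become lists of words; the NaN row stays NaN
df["words"] = df["raw_post"].str.split(" ")


def count_words(cell):
    """Return a Counter for list cells; pass anything else through unchanged."""
    if isinstance(cell, list):
        return collections.Counter(cell)
    return cell


# Same logic as the lambda, written as a named function
df["wordcounts"] = df["words"].apply(count_words)

print(df["wordcounts"])
```

A named function like this can be easier to read and test than a one-line lambda once the condition grows, but for a single check the lambda works just as well.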