QuantEcon / QuantEcon.lectures.code

Code Repository to Support QuantEcon Lecture Site
BSD 3-Clause "New" or "Revised" License

Data and Empirics: Pandas Exercise Error #4

Open duncangh opened 7 years ago

duncangh commented 7 years ago

In the first exercise, where the goal is to calculate the percentage price change over 2013 for a list of tickers, the graph showing the solution appears to be incorrect. The code itself appears to work, although it is not the most elegant solution. The "solution" graph shows an AAPL price change of roughly -50%, when in fact Apple appreciated in 2013.

Here is a cleaner implementation of the solution that leverages more of pandas' great functionality.

import pandas as pd

ticker_list = {'INTC': 'Intel',
               'MSFT': 'Microsoft',
               'IBM': 'IBM',
               'BHP': 'BHP',
               'TM': 'Toyota',
               'AAPL': 'Apple',
               'AMZN': 'Amazon',
               'BA': 'Boeing',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google',
               'SNE': 'Sony',
               'PTR': 'PetroChina'}

syms = list(ticker_list.keys()) # use to filter columns

path = 'https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/ticker_data.csv'

# set the index col to the date column and parse it as datetime type. 
# Slice out only relevant columns
df = pd.read_csv(path, index_col=0, parse_dates=[0]).loc[:, syms]

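# Resample by year (2013) and take the first and last values.
# Note: the annual alias 'A' is spelled 'YE' in pandas 2.2 and later.
# Stack the two rows and use the built-in pct_change() method to compare
# the last share price with the first; dropna() removes the all-NaN first row.
answer = 100 * pd.concat([df.resample('A').first(), df.resample('A').last()]).pct_change().dropna()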

# replace ticker symbols with company names for aesthetics
answer.rename(columns=ticker_list, inplace=True)

# Create bar plot of percent change
answer.T.plot.bar()
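
To see why stacking the first and last rows and calling pct_change() yields a single row of yearly changes, here is a minimal sketch on made-up data. The tickers 'AAA' and 'BBB' and their prices are hypothetical, and the sketch groups by calendar year with groupby rather than resample('A') so that it does not depend on which annual alias the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical prices; the tickers and values are made up for illustration
dates = pd.to_datetime(['2013-01-02', '2013-06-28', '2013-12-31'])
prices = pd.DataFrame({'AAA': [100.0, 110.0, 120.0],
                       'BBB': [50.0, 45.0, 40.0]},
                      index=dates)

# Group the rows by calendar year (only 2013 here)
by_year = prices.groupby(prices.index.year)

# Stack the first and last price of the year into a two-row frame
stacked = pd.concat([by_year.first(), by_year.last()])

# pct_change() compares each row with the one above it, so the first row
# is all NaN and dropna() leaves a single row of yearly changes
result = 100 * stacked.pct_change().dropna()
print(result)
```

The result is a one-row DataFrame with one column per ticker, which is why the solution above transposes it before plotting.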

jstac commented 7 years ago

Thanks @duncangh, much appreciated! We'll review this and I'm sure it will benefit from your feedback.

natashawatkins commented 7 years ago

Hi @duncangh, thanks for the feedback! You're correct - the plot was wrong.

I agree .pct_change() is a cool method, although in this example it's a little tricky to use, as we want to calculate the change only between the first and last days of 2013. Your method works well; however, the result is a DataFrame, which makes it tricky to sort, and we'd therefore need to rename the column, etc.
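
As an illustration of the sorting point, here is a minimal sketch of one alternative that produces a Series instead of a DataFrame. It is not necessarily the code used in the lecture: the tickers and prices are hypothetical, and it assumes the frame has already been restricted to 2013 so that the first and last rows are the first and last trading days.

```python
import pandas as pd

# Hypothetical prices standing in for the 2013 slice of the ticker data
dates = pd.to_datetime(['2013-01-02', '2013-06-28', '2013-12-31'])
prices = pd.DataFrame({'AAA': [100.0, 110.0, 120.0],
                       'BBB': [50.0, 45.0, 40.0],
                       'CCC': [10.0, 12.0, 10.5]},
                      index=dates)

# Percentage change between the first and last rows; dividing two rows
# gives a Series indexed by ticker
change = 100 * (prices.iloc[-1] / prices.iloc[0] - 1)

# A Series sorts directly by value, with no column name to handle
change = change.sort_values()
print(change)
```

Because the result is indexed by ticker, it can be sorted and plotted (for example with change.plot.bar()) without transposing or renaming a column.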

I have gone ahead and updated the code a little to get rid of the dictionary-to-Series conversion, though.