ben-rounds / GA_portfolio

A portfolio of data science projects for the General Assembly Data Science course

Final Project Peer review feedback from Mary #4

Open marymitchellnj opened 8 years ago


Hi Ben, Thanks for letting me review your project work for "FRONT-RUNNING THE FED: USING NLP TO PREDICT FX VOLATILITY FROM FOMC STATEMENTS." Here are my comments:

STRENGTHS: Impressive topic and execution all around!

Presentation title and subtitle (on presentation slide 1) are excellent and capture the project’s focus in just a few words.

I especially appreciate how you have identified and gathered specific datasets that are uniquely suited to this project and that not everyone knows about, such as the financial dictionary with terms and sentiments aggregated from 10-K filings.

You’ve done an excellent, thorough job of documenting your code and showing exactly what is happening step by step. Good choices in converting non-zero sentiment values to 1 for binary classification, in lowercasing all words for easier matching with data scraped from the Web, and in compressing the financial dictionary to the subset of sentiment-loaded words. The aggregation of a “meaningful words” set is a great idea. I like the helper function you created (and revised) to scrape and clean data from a URL. Brilliant. You also gave a good assessment of the tradeoff between the function’s speed and its benefits.
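
To make the preprocessing choices above concrete, here is a minimal sketch in Python. It assumes a dictionary in which each sentiment column holds either 0 (not flagged) or some non-zero marker (flagged). The words, column names and the helpers `build_lexicon` and `score_text` are hypothetical and are not taken from the project code.

```python
import re

# Hypothetical raw rows: each sentiment column holds 0 (not flagged) or a
# non-zero marker (flagged). A real dictionary has far more words.
RAW_DICTIONARY = {
    "ABANDON": {"negative": 2009, "positive": 0, "uncertainty": 0},
    "ACHIEVE": {"negative": 0, "positive": 2009, "uncertainty": 0},
    "APPROXIMATELY": {"negative": 0, "positive": 0, "uncertainty": 2009},
    "TABLE": {"negative": 0, "positive": 0, "uncertainty": 0},
}
SENTIMENTS = ("negative", "positive", "uncertainty")


def build_lexicon(raw):
    """Lowercase each word, map non-zero flags to 1, drop words with no sentiment."""
    lexicon = {}
    for word, columns in raw.items():
        flags = {name: int(columns.get(name, 0) != 0) for name in SENTIMENTS}
        if any(flags.values()):
            lexicon[word.lower()] = flags
    return lexicon


def score_text(text, lexicon):
    """Count sentiment-loaded words in (lowercased) scraped text."""
    counts = {name: 0 for name in SENTIMENTS}
    for token in re.findall(r"[a-z]+", text.lower()):
        for name, flag in lexicon.get(token, {}).items():
            counts[name] += flag
    return counts


lexicon = build_lexicon(RAW_DICTIONARY)
print(score_text("The Committee expects to achieve its goals, approximately on schedule.", lexicon))
# {'negative': 0, 'positive': 1, 'uncertainty': 1}
```

Lowercasing on both sides (dictionary keys and scraped tokens) is what lets the lookup match, and dropping the all-zero rows keeps the lexicon down to the words that can actually move a score.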

Great work in cleaning and manipulating the data and in engineering new features. Your dataframe of the sentiment data from the various FOMC statements is really powerful. When I viewed your sentiment_ts.head(), I was immediately drawn to the linkage between the sentiment scores and the dates of the statements. This could become an interesting interactive tool if you later linked each date to its statement, so that anyone interested could drill down and read the underlying text.
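
A minimal sketch of the drill-down idea, assuming each statement's sentiment counts are stored alongside a pointer to its text. The class `StatementSentiment`, its fields, the helper functions, the file paths and the counts are all illustrative placeholders, not results from the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatementSentiment:
    statement_date: date
    positive: int  # count of positive-lexicon words in the statement
    negative: int  # count of negative-lexicon words in the statement
    source: str    # hypothetical pointer to the statement text (URL or path)

    @property
    def net(self) -> float:
        """Net tone in [-1, 1]; 0.0 when no sentiment words were found."""
        total = self.positive + self.negative
        return 0.0 if total == 0 else (self.positive - self.negative) / total


def build_series(records: List[StatementSentiment]) -> List[StatementSentiment]:
    """Order records chronologically, like a date-indexed time series."""
    return sorted(records, key=lambda r: r.statement_date)


def drill_down(series: List[StatementSentiment], when: date) -> Optional[str]:
    """Return the pointer to the statement released on a given date, if any."""
    index = {r.statement_date: r.source for r in series}
    return index.get(when)


# Illustrative counts only.
records = [
    StatementSentiment(date(2016, 3, 16), 1, 3, "statements/2016-03-16.txt"),
    StatementSentiment(date(2016, 1, 27), 3, 1, "statements/2016-01-27.txt"),
]
series = build_series(records)
for r in series:
    print(r.statement_date, r.net, r.source)
```

Keeping the source pointer in the same record as the score is the design choice that makes the drill-down trivial: any chart point maps straight back to the text that produced it.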

I’m impressed with how far you’ve already taken the logic and the possibilities you’ve uncovered. You’ve done an exceptional job in exploring a challenging, relevant topic.

SUGGESTIONS AND QUESTIONS:

—The research question on your second slide is “Can natural language processing on Fed policy statements reveal insights?” Consider making it more specific, for example: “Can natural language processing on Federal Open Market Committee (FOMC) press releases provide an indicator of how volatile foreign exchange markets will be in the hours and days following each announcement?”

—Great visualization on presentation slide 3. One note: the title/term “Cumulative Levels” was not immediately clear to me.

—If your audience includes people from outside finance and economics (as in our class), consider adding a footnote that defines pegged vs. non-pegged (floating) currencies so that everyone understands those terms from the outset. From your chart, I think you’re focusing on the Japanese Yen, the Euro and the British Pound as three major currencies that float, that is, currencies not pegged to the US dollar. Or perhaps these are meant as the leading examples. When I reviewed your code, I saw that you had examined nine currencies. Later, I saw the Yen, Euro and British Pound described as the three major reserve currencies, so it looks like you chose them for that reason. While this is second nature to you because of your profession, it would help to explain the context for those with a limited understanding of international finance. See: http://www.investopedia.com/articles/forex/061015/top-exchange-rates-pegged-us-dollar.asp and http://www.investopedia.com/terms/c/currencypair.asp

—When you say that currency exchange rates are clean except for a single data point, you’ve piqued my curiosity. What is that data point and will it impact anything? For example, is it an outlier that you chose to remove?

—Note that “spacy” (on slides 3 and 5) should be written spaCy (its website is spacy.io). You may also want to define it: an open-source library for industrial-strength natural language processing in Python.

—For the third bullet point on slide 5, consider changing the wording to “…following FOMC announcements.”

—Regarding the impact of FOMC announcement sentiment on currencies, I’m thinking about the timestamp aspect. Since European markets are typically closed by the time of an FOMC announcement, I assume you’re focusing first on the impact on the Japanese Yen and then on the Euro and GBP. One question is how to account for that initial impact on the Yen, and how it in turn likely affects the Euro and GBP, versus a case where all three currencies react at exactly the same time.

—Regarding your idea of using Twitter data to gauge sentiment leading up to FOMC announcements, which segment(s) of Twitter data would you be looking at? Twitter datasets from each country associated with the non-pegged currencies? How will you filter out the “noise” of tweets that are unrelated to this project (e.g., those about and from celebrities and others in the entertainment industry)?
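
On the question above about the single unclean data point in the exchange-rate series: the review does not say what that point was, so this is only a generic way to locate one. It is a Hampel-style filter (rolling median plus median absolute deviation, scaled by 1.4826); the function name, window size, threshold and the rates themselves are assumptions for illustration. It flags points rather than removing them, since whether to drop or correct a point is a separate decision.

```python
from statistics import median


def hampel_outliers(values, half_window=3, n_sigmas=3.0):
    """Indices whose value deviates from the local median by more than
    n_sigmas robust standard deviations (MAD scaled by 1.4826)."""
    flagged = []
    n = len(values)
    for i in range(n):
        window = values[max(0, i - half_window): min(n, i + half_window + 1)]
        local_median = median(window)
        scaled_mad = 1.4826 * median([abs(v - local_median) for v in window])
        # Skip perfectly flat windows, where the MAD is zero.
        if scaled_mad > 0 and abs(values[i] - local_median) > n_sigmas * scaled_mad:
            flagged.append(i)
    return flagged


# Made-up rates with one decimal-shift error at index 4.
rates = [1.10, 1.11, 1.10, 1.12, 11.1, 1.11, 1.10, 1.12, 1.11]
print(hampel_outliers(rates))  # [4]
```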
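
For the timestamp question above, one way to reason about which regional session sees an announcement first is to convert the release time into each market's local clock. This sketch assumes a 2:00 p.m. Eastern release and simplified, hypothetical session opening times. It ignores weekends, holidays and the fact that FX itself trades around the clock. The helper names are illustrative.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# Hypothetical, simplified local session opening times.
SESSION_OPENS = {
    "Tokyo": (ZoneInfo("Asia/Tokyo"), time(9, 0)),
    "London": (ZoneInfo("Europe/London"), time(8, 0)),
    "Frankfurt": (ZoneInfo("Europe/Berlin"), time(9, 0)),
}


def next_open(announcement, tz, open_time):
    """First local session open strictly after the announcement instant."""
    local = announcement.astimezone(tz)
    candidate = datetime.combine(local.date(), open_time, tzinfo=tz)
    if candidate <= local:
        candidate = datetime.combine(local.date() + timedelta(days=1), open_time, tzinfo=tz)
    return candidate


def delays_until_open(announcement):
    """Elapsed time from the announcement to each session's next open."""
    return {name: next_open(announcement, tz, t) - announcement
            for name, (tz, t) in SESSION_OPENS.items()}


# Assumed 2:00 p.m. Eastern release on an illustrative date.
announcement = datetime(2016, 12, 14, 14, 0, tzinfo=NEW_YORK)
delays = delays_until_open(announcement)
first = min(delays, key=delays.get)
print(first, delays)
```

Under these assumptions Tokyo opens about 5 hours after the release and the European sessions about 13 hours after, which is one way to make the Yen-first ordering explicit in the analysis.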
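
On filtering Twitter noise, the crudest starting point is a keyword include/exclude filter. This is a minimal sketch with hypothetical keyword lists, not a recommendation of specific terms. The second example tweet shows how easily such a filter misfires, which is why a trained relevance classifier may be needed in practice.

```python
import re

# Hypothetical keyword lists for illustration only.
RELEVANT_TERMS = ("fomc", "fed", "rate hike", "interest rates", "usdjpy", "eurusd", "gbpusd")
NOISE_TERMS = ("concert", "album", "movie", "celebrity")


def _mentions_any(text, terms):
    """True if any term appears as a whole word or phrase in the text."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)


def is_relevant(tweet):
    """Keep tweets that mention a relevant term and no noise term."""
    text = tweet.lower()
    return _mentions_any(text, RELEVANT_TERMS) and not _mentions_any(text, NOISE_TERMS)


tweets = [
    "FOMC holds rates steady; USDJPY slides after the statement",
    "Fed up with this movie",
    "New album drops tonight",
]
relevant = [t for t in tweets if is_relevant(t)]
print(relevant)
```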

I hope this feedback is helpful. Please let me know if you have any questions about my comments.

Best regards, Mary

@masongallo @lemonsoup