current12 / Stat-222-Project


What companies leave the dataset and why? #29

Closed ijyliu closed 4 months ago

ijyliu commented 6 months ago

See section of all data EDA notebook.

We should be careful of survivorship issues surrounding bankruptcy.

Most problems with dropout stem from earnings call date issues; bankruptcy issues seem rare.

investigate adding an indicator or interaction for being a dropout firm
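A minimal sketch of what the indicator and interaction could look like in pandas. The column names (`ticker`, `leverage`), the toy values, and the `dropout_tickers` set are all hypothetical and not taken from the project data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "CCC"],
    "leverage": [0.5, 0.6, 0.2, 0.9],
})

# Hypothetical set of firms flagged as dropouts.
dropout_tickers = {"BBB"}

# Indicator: 1 if the row belongs to a dropout firm, else 0.
df["is_dropout"] = df["ticker"].isin(dropout_tickers).astype(int)

# Interaction: lets the effect of a feature differ for dropout firms.
df["leverage_x_dropout"] = df["leverage"] * df["is_dropout"]
```

The interaction column is zero for every non-dropout firm, so a model can fit a separate slope on that feature for dropout firms only.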

ijyliu commented 5 months ago

Full report of reasons for firm dropout

35 firms (6%) do not have data for the last fixed quarter date, 2016-10-1.
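For illustration, a count like this could be computed roughly as follows. This is a sketch on made-up data; the column names `ticker` and `fixed_quarter_date` are assumptions, and the actual logic lives in the dropout notebook.

```python
import pandas as pd

# Hypothetical panel: one row per firm and fixed quarter date.
panel = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "fixed_quarter_date": pd.to_datetime(
        ["2016-07-01", "2016-10-01", "2016-04-01", "2016-07-01", "2016-10-01"]
    ),
})

LAST_QUARTER = pd.Timestamp("2016-10-01")

# Last observed fixed quarter date per firm.
last_seen = panel.groupby("ticker")["fixed_quarter_date"].max()

# Firms whose data ends before the last fixed quarter date
# (assumes no rows exist after LAST_QUARTER).
dropouts = sorted(last_seen[last_seen < LAST_QUARTER].index)
dropout_share = len(dropouts) / last_seen.size
```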

The vast majority can easily be fixed by re-scraping with the restriction that earnings_call_date be before 2017-1-1, rather than that year and quarter be before 2017 Q1. The dropout occurs because these firms' fiscal years and quarters run a good deal ahead of their earnings call dates, so our current code, which doesn't scrape items for Q1 2017 and beyond, cuts off calls that actually took place in 2016. We already have comprehensive financial data, so only the calls for these companies would need to be scraped and added to the data.
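To make the difference between the two restrictions concrete, here is a small sketch on invented rows. The tickers, dates, and the `year`/`quarter` column names are hypothetical; only `earnings_call_date` is named in this thread.

```python
import pandas as pd

# Hypothetical call metadata. FIRM_A labels its calls fiscal 2017
# even though they took place during calendar 2016.
calls = pd.DataFrame({
    "ticker": ["FIRM_A", "FIRM_A", "FIRM_B"],
    "year": [2017, 2017, 2016],
    "quarter": [1, 2, 4],
    "earnings_call_date": pd.to_datetime(
        ["2016-05-19", "2016-08-25", "2016-11-02"]
    ),
})

# Current restriction: labeled year and quarter before 2017 Q1.
current_keep = calls["year"] < 2017

# Proposed restriction: the call itself happened before 2017-01-01.
proposed_keep = calls["earnings_call_date"] < pd.Timestamp("2017-01-01")

# Calls that only the proposed restriction would pick up.
recovered = calls[proposed_keep & ~current_keep]
```

In this toy example both FIRM_A calls are recovered, which mirrors the kind of case described for firms whose fiscal labels run ahead of the calendar.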

image

Example: ADSK would benefit from scraping with the different date restriction. We can get fixed quarter dates 2016-07-01 and 2016-10-1 with earnings calls still in 2016 and calendar year and quarter 2017 Q1 and 2017 Q2.

After looking at the dropout firms, I've also found a company to drop because it only holds annual calls, which would make for a bad comparison with the rest of the data.

And there are several companies where we can look to alternative data sources to try to fill in gaps or re-scrape all the data for that company. If that doesn't work, in a few cases we should drop the company.

The below file contains explanations for dropout and suggested actions:

https://github.com/current12/Stat-222-Project/blob/main/Code/Exploratory%20Data%20Analysis/All%20Data/dropout_firms_explanations.xlsx

Bankruptcy

There is only one bankruptcy, BTU, which occurred on April 13, 2016. I searched validation bankruptcy datasets and found no record of other firms in the credit rating and other data going bankrupt.

Here's BTU's rating history:

[image: BTU rating history]

I think we should drop all data for BTU, as we seem to be missing a lot of data leading up to the bankruptcy. There's a lot of evidence online of other agencies (Fitch, though I couldn't verify S&P) lowering their ratings to around C, and I would expect S&P to have continued lowering its ratings as well.

Code

Dropout + Bankruptcy code: https://github.com/current12/Stat-222-Project/blob/main/Code/Exploratory%20Data%20Analysis/All%20Data/All%20Data%20-%20Analyze%20Dropout.ipynb

seanzhou1207 commented 5 months ago

What you have sounds good to me

ijyliu commented 5 months ago

@current12 what do you think?

either you or i could do the additional scrape for these companies following the instructions in https://github.com/current12/Stat-222-Project/blob/main/Code/Exploratory%20Data%20Analysis/All%20Data/dropout_firms_explanations.xlsx

ijyliu commented 5 months ago

also, anyone have ideas for alternative sources? we could also literally just google and copy-paste the transcripts in since it's not that many

ijyliu commented 5 months ago

drop decision: drop BTU after the last actual rating in the data, drop DEO, etc.; decisions are affirmed in the excel file

rescrape attempt: try all calls for 35 companies with year + quarter OR earnings call date between 2010-2016 (can have either condition to get call)
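A rough sketch of the either-condition rule, assuming each call carries a labeled year and a call date. The function name and arguments are hypothetical and are not the scraper's actual interface.

```python
from datetime import date

def in_rescrape_window(label_year, call_date, start_year=2010, end_year=2016):
    """Return True if EITHER the labeled year OR the call date's
    calendar year falls within [start_year, end_year]."""
    label_ok = start_year <= label_year <= end_year
    date_ok = start_year <= call_date.year <= end_year
    return label_ok or date_ok

# Labeled fiscal 2017 but held in 2016: kept via the call date.
kept_by_date = in_rescrape_window(2017, date(2016, 8, 25))
# Labeled 2016 but held in early 2017: kept via the label.
kept_by_label = in_rescrape_window(2016, date(2017, 1, 30))
# Neither condition holds: skipped.
skipped = in_rescrape_window(2017, date(2017, 2, 1))
```

Using OR rather than AND errs on the side of collecting too many calls, which can be filtered later.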

alternative source attempt: googling

ijyliu commented 5 months ago

@OwenLin2001 do you have an estimated time of completion for this? need to know if we should expect it before or after we have to run a bunch of stuff for the writeup

unless you are almost done, i suggest pausing work and instead working on constructing another classifier so we have something more than logistic regression.

please let us know if you are pausing or not

ijyliu commented 5 months ago

decided to pause until after writeup 1

ijyliu commented 4 months ago

@OwenLin2001 you can definitely resume work on this now

ijyliu commented 4 months ago

note: skip scraping for the items that drop out because of filing date mismatch, but for things with 'drop company' in them, we still need to think about what to do

OwenLin2001 commented 4 months ago

Everything is in dropout_firms_explanations.xlsx now. Out of the 35 dropout companies (after solving for filing date), only 12 are actual dropout firms:

2 scraped more - missing data
2 no action - unable to find alternative source
1 no action - bankrupt
3 no action - missing due to data construction
4 removed - only contribute 1 or 3 calls that are apart from each other

ijyliu commented 4 months ago

For BTU, drop all data after 3/3/2015
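One way the cutoff could be applied, sketched on toy rows. The `date` column name and the sample rows are assumptions; the cutoff is read as March 3, 2015.

```python
import pandas as pd

# Hypothetical rows; "date" stands in for whichever date column is used.
df = pd.DataFrame({
    "ticker": ["BTU", "BTU", "BTU", "AAA"],
    "date": pd.to_datetime(
        ["2015-01-01", "2015-03-03", "2015-04-01", "2016-10-01"]
    ),
})

cutoff = pd.Timestamp("2015-03-03")

# Drop only BTU rows strictly after the cutoff; other firms are untouched.
drop_mask = (df["ticker"] == "BTU") & (df["date"] > cutoff)
df = df.loc[~drop_mask].reset_index(drop=True)
```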

ijyliu commented 4 months ago

all data fixed quarter dates updated