Closed donbowen closed 1 year ago
@kag223 @Qiyu6769 @JerseyK
@donbowen Thank you for the gifts!!
Based on your feedback we modified our proposal.md to align with the provided dataset.
Our changes:
- Sunset
@donbowen
After cleaning the dataset you provided and filtering on cik against the S&P 500 list, we ended up with only about 150 unique seller gvkeys. Looking further, we realized that some companies, such as AAPL, do not report sales to specific companies; they report sales to countries and other markets instead.
We therefore decided against restricting the data to company customers only, because keeping every customer type gives a more holistic view of each seller's finances. This yields about 350 unique seller gvkeys. We made this decision after submitting the proposal update.
Here is the CSV with the ~350 gvkeys.
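For anyone reproducing the count, here is a minimal sketch of the filter described above. It is an illustration, not our actual cleaning code: the toy rows, the `GEOREG` and `MARKET` labels, and the function name `unique_gvkeys` are assumptions made up for the example. Only the `cik`, `gvkey`, and `ctype` column names and the `COMPANY` value come from this thread.

```python
import csv
import io

# Toy stand-ins for the customer-supplier file and the S&P 500 list.
# All values are invented; only the column names cik, gvkey, ctype
# and the ctype value COMPANY come from the thread.
links_csv = """gvkey,cik,ctype
1001,11,GEOREG
1001,11,MARKET
1002,22,COMPANY
1003,33,COMPANY
"""
sp500_csv = """cik
11
22
"""


def unique_gvkeys(links_text, sp500_text, company_only):
    """Return sorted unique seller gvkeys whose cik is in the S&P 500 list."""
    sp500_ciks = {
        row["cik"].strip() for row in csv.DictReader(io.StringIO(sp500_text))
    }
    gvkeys = set()
    for row in csv.DictReader(io.StringIO(links_text)):
        if row["cik"].strip() not in sp500_ciks:
            continue  # seller is not an S&P 500 firm
        if company_only and row["ctype"].strip() != "COMPANY":
            continue  # drop country / market customers when requested
        gvkeys.add(row["gvkey"].strip())
    return sorted(gvkeys)


strict = unique_gvkeys(links_csv, sp500_csv, company_only=True)
broad = unique_gvkeys(links_csv, sp500_csv, company_only=False)

# Write the broader list in the one-column CSV shape requested.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["gvkey"])
writer.writerows([g] for g in broad)
gvkey_csv = out.getvalue()
```

In the toy data, seller 1001 only reports country and market customers, so it drops out under the company-only filter and survives under the broader one. That is the same pattern we saw with AAPL.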
Variables we would like for accounting data:
Thank you so much! - Sunset
Please find the name of the corresponding variables
The documentation is here: https://sites.bu.edu/qm222projectcourse/files/2014/08/compustat_users_guide-2003.pdf
Other documentation on the web might help you figure out the varnames (below)
For example, I use these as a "standard" set of controls:
*======================================================================
g td = (dlc+dltt)
g td_a = td/at
g short_debt = dlc/td // % 1 yr debt
g long_debt_dum = (dltt > 0) if dltt != . // any long term debt?
g me = csho*prcc_f
g td_mv = td/(td+me)
g dltt_a = dltt / at
g l_a = log(at)
g l_sale = log(sale)
g prof_a = oibdp/at // aka ROA
*g prof_dum = (prof_a > 0) if prof_a != .
g mb = (at - ceq + (csho*prcc_f))/at
g ppe_a = ppent/at // aka tangibility
g cash_a = che/at
g xrd_a = xrd / at
replace xrd_a = 0 if xrd_a == .
g capx_a = capx/at
g div_d = .
replace div_d = 0 if dv == 0
replace div_d = 1 if dv >0
replace div_d = . if dv == .
g dv_a = dv/at
g invopps_FG09 = (prcc_f * cshpri + pstkl + dltt + dlc - txditc) / at
bysort gvkey (fyear): g sales_g = (sale/sale[_n-1])-1 if fyear == fyear[_n-1] + 1
*bysort gvkey (fyear): g emp_g = (emp/emp[_n-1])-1 if fyear == fyear[_n-1] + 1
g temp = fyear if prcc_f != .
bysort gvkey (fyear): egen first_fyear_with_price = min(temp)
count if first_fyear_with_price > fyear & first_fyear_with_price != .
g age = fyear - first_fyear_with_price
replace age = . if age < 0
drop temp first_fyear_with_price
g atr = txt / (txt + ib)
replace atr = 0 if txt < 0
replace atr = 1 if txt > ib
replace atr = . if txt == . | ib == .
egen temp = rowtotal(ib dp txt xint), missing
g smalltaxlosscarry = (tlcf > 0) & (tlcf < ib + dp + txt + xint) if tlcf != . & temp != .
g largetaxlosscarry = (tlcf > 0) & (tlcf > ib + dp + txt + xint) if tlcf != . & temp != .
// blank if tlcf blank OR if all of (ib dp txt xint) blank
g l_emp = log(1+emp)
g l_ppent = log(1+ppent)
g l_laborratio = log(ppent/emp)
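If you are working in Python rather than Stata, the one line above that is easy to get wrong is `sales_g`: the lag is only valid when the previous row is the same firm and exactly one fiscal year earlier. Below is a rough sketch of that guard in plain Python. The function name `sales_growth` and the toy rows are made up for illustration, and this is not a translation of the full control set.

```python
def sales_growth(rows):
    """Year-over-year sales growth per firm, blank (None) across gaps.

    rows: list of dicts with keys gvkey, fyear, sale.
    Mirrors the idea of
        bysort gvkey (fyear): g sales_g = (sale/sale[_n-1])-1 if fyear == fyear[_n-1] + 1
    Returns a dict keyed by (gvkey, fyear).
    """
    result = {}
    prev = None
    for row in sorted(rows, key=lambda r: (r["gvkey"], r["fyear"])):
        growth = None
        if (
            prev is not None
            and prev["gvkey"] == row["gvkey"]        # same firm
            and row["fyear"] == prev["fyear"] + 1    # consecutive fiscal years
            and prev["sale"] not in (None, 0)        # lag usable as denominator
            and row["sale"] is not None
        ):
            growth = row["sale"] / prev["sale"] - 1
        result[(row["gvkey"], row["fyear"])] = growth
        prev = row
    return result


# Toy panel: firm 1 has no 2021 row, so 2022 growth must stay blank.
toy_rows = [
    {"gvkey": 1, "fyear": 2019, "sale": 100.0},
    {"gvkey": 1, "fyear": 2020, "sale": 110.0},
    {"gvkey": 1, "fyear": 2022, "sale": 121.0},
    {"gvkey": 2, "fyear": 2020, "sale": 50.0},
]
growth = sales_growth(toy_rows)
```

Without the consecutive-year check, the 2022 row for firm 1 would silently report two-year growth as if it were one-year growth.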
The variable names I have are here:
I come late but bearing gifts. The thing I thought I was going to ask you to do, we will skip. I can elaborate in person. Your data discussion is in parts now moot because you will start with a (dirty) dataset already assembled. The proposal needs to change. The key to your project remains execution on the dashboard, but you'll need to convince me you cleaned this data well.
Gift 1: This attached data-cust_supply_2019_2022.zip
ctype == COMPANY
Gift 2: Filter this dataset using cik to the S&P500 firms. Then output a csv of the gvkeys of the firms. (Should be ~500 rows.) Send to me along with a request about which variables you want. I'll send back accounting data on these firms for 2018-2022.