Open AmritSd opened 1 year ago
Filtering Exchanges: The following are the exchange codes in CRSP:
1: NYSE
2: AMEX
3: NASDAQ
-1: Suspended by NYSE, AMEX, or NASDAQ
-2: Halted by NYSE, AMEX
5: Mutual fund
6: ARCA
If we want stocks from just NYSE, we would have to keep codes 1, -1, and -2. But codes -1 and -2 do not say which exchange suspended or halted the stock, so we would also be including stocks suspended or halted by AMEX and NASDAQ. Instead, we check for each stock whether it ever had an exchange code equal to 1 and keep all rows for those stocks.
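The "ever listed on NYSE" check can be sketched in pandas as below. This is only an illustration: it assumes the data is loaded as a DataFrame with the usual CRSP column names `PERMNO` (stock identifier) and `EXCHCD` (exchange code), and the helper name `keep_ever_on_exchange` is made up for this example.

```python
import pandas as pd

def keep_ever_on_exchange(df: pd.DataFrame, code: int = 1) -> pd.DataFrame:
    """Keep every row of any stock that had exchange code `code` at least once."""
    # Identifiers of stocks with at least one row carrying the target code
    ever_listed = df.loc[df["EXCHCD"] == code, "PERMNO"].unique()
    # Keep all rows of those stocks, including their suspended/halted rows
    return df[df["PERMNO"].isin(ever_listed)]

# Toy data: 10001 is an NYSE stock that gets suspended,
# 10002 is a NASDAQ stock that gets suspended, 10003 is on AMEX.
sample = pd.DataFrame({
    "PERMNO": [10001, 10001, 10002, 10002, 10003],
    "EXCHCD": [1, -1, 3, -1, 2],
})
nyse = keep_ever_on_exchange(sample)
```

Here the suspended row of 10001 is kept, while the suspended row of 10002 is dropped because that stock never had code 1.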
Filtering prices: CRSP sets missing prices to zero. Since our buying signal is a price threshold, a zero would look like a price below the threshold, so we forward-fill each stock with its last known price.
If CRSP uses a bid/ask average to calculate the closing price, it sets the price to a negative value. So we have to take the absolute value of prices to get the actual price.
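A minimal sketch of the two price fixes above, assuming the usual CRSP column names `PERMNO`, `date`, and `PRC` (the helper name `clean_prices` is hypothetical). The absolute value is taken first, then zeros are treated as missing and filled forward within each stock.

```python
import numpy as np
import pandas as pd

def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Undo CRSP's negative-price and zero-price conventions."""
    out = df.sort_values(["PERMNO", "date"]).copy()
    # A negative price marks a bid/ask average; its magnitude is the price
    out["PRC"] = out["PRC"].abs()
    # A zero price means no price was available, so treat it as missing
    out["PRC"] = out["PRC"].replace(0, np.nan)
    # Carry the last known price forward, never across different stocks
    out["PRC"] = out.groupby("PERMNO")["PRC"].ffill()
    return out

sample = pd.DataFrame({
    "PERMNO": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-31", "2020-02-29", "2020-03-31", "2020-01-31", "2020-02-29"]
    ),
    "PRC": [10.0, -9.5, 0.0, 0.0, 4.0],
})
cleaned = clean_prices(sample)
```

Note that a stock whose first observation is missing (stock 2 above) has nothing to fill from, so that row stays missing and needs a separate decision.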
Missing returns: In some cases, the RET column contains the codes 'B' and 'C' instead of a number. In an example I found with code 'C', the return should have been negative but was reported as missing because the stock was halted in the previous time period. Screenshot of an example:
We also filter to only keep stocks that have been below a certain threshold (say $5) at any point.
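The threshold filter could look roughly like this. It assumes prices have already been cleaned (positive, no zeros) and uses the assumed CRSP column names `PERMNO` and `PRC`; `keep_ever_below` is a made-up helper name, and $5 is just the example threshold mentioned above.

```python
import pandas as pd

def keep_ever_below(df: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Keep all rows of stocks whose price was below `threshold` at any point."""
    # Lowest price each stock ever had, broadcast back onto every row
    min_price = df.groupby("PERMNO")["PRC"].transform("min")
    return df[min_price < threshold]

sample = pd.DataFrame({
    "PERMNO": [1, 1, 1, 2, 2],
    "PRC": [10.0, 4.0, 12.0, 8.0, 9.0],
})
low_priced = keep_ever_below(sample)
```

Stock 1 dipped to $4 once, so all three of its rows are kept; stock 2 never went below $5 and is dropped entirely.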
We remove rows where shares outstanding (SHROUT) is missing and convert SHROUT out of the unit CRSP reports it in, thousands of shares.
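A short sketch of the SHROUT step. The target unit is an assumption here: the code converts thousands of shares to individual shares by multiplying by 1000, which may differ from what the notebook actually does. `clean_shrout` is a hypothetical helper name.

```python
import pandas as pd

def clean_shrout(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing SHROUT and rescale it (assumed: to single shares)."""
    out = df.dropna(subset=["SHROUT"]).copy()
    # ASSUMPTION: target unit is individual shares, so thousands * 1000
    out["SHROUT"] = out["SHROUT"] * 1000
    return out

sample = pd.DataFrame({
    "PERMNO": [1, 1, 2],
    "SHROUT": [1500.0, None, 20.0],
})
shares = clean_shrout(sample)
```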
The pricing data provided by WRDS has some caveats and gotchas, so we have to clean it.
Check file: data_cleaning/price_data_cleaning.ipynb