Closed tommedema closed 2 years ago
When applying a rolling 30-day window to this stock, you can see that zero estimates are very common:
Yes, the issue is that the estimator is technically an estimator of the squared spread, which may become negative in finite samples. As we cannot take the square root of a negative value, we reset negative spread estimates to zero. This is a common issue in the literature. The good news is that edge produces fewer zero estimates than other methods (although in some cases zero estimates are still very common, as you correctly observed).
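To illustrate why a squared-spread estimate can turn negative in a short window, here is a minimal sketch. It does not use the edge estimator: it swaps in the much simpler Roll (1984) estimator, whose squared spread is -4 times the first-order autocovariance of price changes, purely as a stand-in. The simulated prices, the parameter values, and the function names `roll_s2` and `simulate_prices` are all assumptions made for this example.

```python
import math
import random
import statistics


def roll_s2(prices):
    """Roll (1984) estimate of the squared spread: -4 * cov(dp_t, dp_{t-1}).

    Stand-in for illustration only; this is NOT the edge estimator.
    """
    dp = [b - a for a, b in zip(prices, prices[1:])]
    x, y = dp[1:], dp[:-1]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return -4.0 * cov


def simulate_prices(n, spread, sigma, rng):
    """Hypothetical data: random-walk mid-price plus a +/- half-spread bounce."""
    mid, out = 100.0, []
    for _ in range(n):
        mid += rng.gauss(0.0, sigma)
        out.append(mid + rng.choice((-1, 1)) * spread / 2)
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1000 independent 30-observation windows with a true (absolute) spread of 0.10.
s2_values = [
    roll_s2(simulate_prices(30, spread=0.10, sigma=0.50, rng=rng))
    for _ in range(1000)
]

# Share of windows where the squared-spread estimate is negative.
negative_share = sum(s2 < 0 for s2 in s2_values) / len(s2_values)

# Negative estimates are reset to zero before taking the square root.
spreads = [math.sqrt(max(0.0, s2)) for s2 in s2_values]

print(f"share of windows with a negative squared-spread estimate: {negative_share:.2f}")
```

Even though the true spread in the simulation is strictly positive, sampling noise in a 30-observation window is enough to push some squared estimates below zero, and those windows end up with a spread of exactly zero.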
Please have a look at the paper:
I see, thank you. For now I am dynamically adjusting the date range until a non-zero value exists. If that's not possible, I forward fill from prior estimations.
Mmm that's a bit dangerous because, on average, it would create an upward bias in the estimates. Depending on the use case, I would recommend the following:
1) If you are averaging the spreads somehow (e.g., average spread in a portfolio, or regression analyses), I would keep the zero estimates. Although they make little economic sense, they are more correct statistically. Indeed, this option reduces the upward bias that you would have by imposing a positive spread estimate, so the final results of the use case should be more correct.
2) If you are interested in point estimates (e.g., best guess of the spread of a stock in a month, conditional on a positive estimate), then I would take the absolute value of the (negative) spread estimate instead of resetting it to zero. I found this option to work quite well in some preliminary studies on the US stock market, although it is too early to release it officially. To do that in Python, do not use the bidask package on PyPI. Instead, copy/paste this function into your code and replace the final line:
return float(max(0, s2) ** 0.5)
with:
return abs(s2) ** 0.5
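For comparison, the two endings can be put side by side in a small helper. This is only a sketch: `spread_from_s2` and its `negative` argument are hypothetical names invented here, not part of the bidask package, and `s2` stands for the signed squared-spread estimate computed just before the final line.

```python
def spread_from_s2(s2, negative="zero"):
    """Turn a signed squared-spread estimate into a spread.

    Hypothetical helper for illustration; not part of the bidask package.
    negative="zero": reset negative estimates to zero (the original final line).
    negative="abs":  take the absolute value first (the replacement line).
    """
    if negative == "zero":
        return float(max(0.0, s2) ** 0.5)
    if negative == "abs":
        return float(abs(s2) ** 0.5)
    raise ValueError("negative must be 'zero' or 'abs'")


# The two rules only differ when the squared estimate is negative.
for s2 in (0.0004, -0.0004):
    print(s2, spread_from_s2(s2, "zero"), spread_from_s2(s2, "abs"))
```

For a positive `s2` both rules return the same value; for a negative `s2` the first returns zero and the second returns the square root of its magnitude.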
Hope this helps!
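The point about averaging can be checked on made-up numbers. The sketch below assumes noisy signed estimates of the squared spread (Gaussian noise around a hypothetical true value, not output from edge) and compares the average spread when zeros are kept with the average after forward-filling zeros from the last positive estimate. The function name `forward_fill_zeros` and all parameter values are assumptions for this example.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

true_s2 = 0.0001  # hypothetical true squared spread (a 1% spread)

# Hypothetical signed squared-spread estimates, one per rolling window.
s2_hat = [true_s2 + rng.gauss(0.0, 0.0002) for _ in range(1000)]

# Option 1 from the discussion: negative estimates are reset to zero and kept.
keep_zero = [max(0.0, s2) ** 0.5 for s2 in s2_hat]


def forward_fill_zeros(values):
    """Replace each zero with the most recent positive value, if there is one."""
    filled, last = [], None
    for v in values:
        if v > 0:
            last = v
        filled.append(v if v > 0 or last is None else last)
    return filled


filled = forward_fill_zeros(keep_zero)

mean_keep = statistics.fmean(keep_zero)
mean_fill = statistics.fmean(filled)

print(f"mean spread, zeros kept:     {mean_keep:.5f}")
print(f"mean spread, zeros ffilled:  {mean_fill:.5f}")
```

Every forward-filled value replaces a zero with a positive number, so the filled average can only be higher than the average with zeros kept. That is the mechanism behind the upward bias mentioned above; how large it is in practice depends on how many windows produce a zero.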
That makes sense, I'll go with option 2. Much appreciated!
Take this example python code:
The spread here is estimated at zero, which seems unlikely. This is real stock data for 30 days of a random stock I pulled (split-adjusted prices for ticker A, with the last entry on 1999-12-31).