Closed: firmai closed this issue 3 years ago.
It looks like there is a slight error in our computation of the Piotroski Score (PS) portfolios. We categorize this signal as continuous-decile when it should be discrete (ranging from 0 to 9). We should also be going long stocks with a score of 8 or 9 and shorting stocks with a score of 0 or 1. See the caption for Table 3 of Piotroski (2000):
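A minimal sketch of the discrete long-short rule described above, assuming integer F-scores from 0 to 9 and equal-weighted legs. The function names `piotroski_leg` and `long_short_return`, the equal weighting, and the sample scores and returns are hypothetical illustrations, not the project's actual portfolio code.

```python
def piotroski_leg(score):
    """Return 'long', 'short', or None (not held) for one discrete F-score."""
    # Assumption: the score is an integer 0..9; anything else is rejected.
    if score not in range(10):
        raise ValueError(f"F-score must be an integer 0..9, got {score!r}")
    if score >= 8:   # scores 8 or 9 go long
        return "long"
    if score <= 1:   # scores 0 or 1 go short
        return "short"
    return None      # scores 2..7 are not held

def long_short_return(scores, returns):
    """Equal-weighted long-minus-short return; None if either leg is empty."""
    longs = [r for s, r in zip(scores, returns) if piotroski_leg(s) == "long"]
    shorts = [r for s, r in zip(scores, returns) if piotroski_leg(s) == "short"]
    if not longs or not shorts:
        return None  # fixed cutoffs can leave a leg empty in a thin month
    return sum(longs) / len(longs) - sum(shorts) / len(shorts)

# Hypothetical cross-section of F-scores and one-month returns.
scores = [9, 8, 5, 3, 1, 0, 7]
returns = [0.04, 0.02, 0.01, 0.00, -0.01, -0.03, 0.01]
print(long_short_return(scores, returns))
```

Because the legs are defined by fixed score cutoffs rather than deciles, they can be unbalanced or even empty in a given month, which is why the sketch returns `None` in that case.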
Gonna be honest, I'm not sure this is a high-priority fix. But we'll try to remember to fix it when we update the data next year.
Hi @chenandrewy, thanks for the reply. I think the problem is a bit more systematic than that: the screenshots I posted at the beginning also have null values, and a larger list of variables has the same problem. I know how hard it is to publish something open source, since it gets gone over with a fine-tooth comb, but there is something great in that: in the long run it ensures you have the most reputable (verifiable) study on the topic. So kudos for that.
Ah, sorry, I should have looked more closely. On DivSeason, can you help me see when the NAs stop appearing? If it's just one month, then it's probably because the strategy shorts dividend payers that happen not to be paying, and perhaps there's not enough data that early in the sample. I guess DivSeason is a strategy we spent a lot of time on, so I'm not very concerned about its missing values.
Overall, there are missing values for many reasons. The simplest is that some variables are not continuous enough to be sorted into quintiles. Limited data early in the sample interacts with this issue. Since IBES begins around 1985, I think this is the most likely cause for ExclExp.
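To illustrate how a coarse variable can break a quintile sort, here is a small sketch, assuming quintile breakpoints are the 20th, 40th, 60th and 80th cross-sectional percentiles computed with NumPy's default linear interpolation. The helper `quintile_breakpoints` and the eight-firm cross-section are hypothetical.

```python
import numpy as np

def quintile_breakpoints(signal):
    """20th/40th/60th/80th cross-sectional percentiles (NumPy's default linear interpolation)."""
    return np.percentile(np.asarray(signal, dtype=float), [20, 40, 60, 80])

# Hypothetical early-sample month: only eight firms and an integer-valued signal.
signal = [1, 1, 1, 1, 2, 2, 2, 3]
bp = quintile_breakpoints(signal)
n_distinct = len(np.unique(bp))
print(bp, "distinct breakpoints:", n_distinct)  # 4 breakpoints, only 2 distinct
```

When two adjacent breakpoints coincide, no firm can fall strictly between them, so the portfolio bounded by those breakpoints can end up with no stocks and a missing return that month.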
We'll add this to the FAQ.
It turns out the missing portfolio returns for ExclExp were not due to IBES data availability. Instead, they occur because ExclExp has a big mode at 0. Intuitively, ExclExp is "Street" (aka "Non-GAAP", aka IBES) earnings less GAAP earnings. So a lot of the time there is no difference, no funny exclusions, and you get a ton of zeros. Here is the distribution of Excluded Expenses in June 1996:
And then when you try to sort into quintiles, you get annoying edge cases: with that many zeros, several quintile breakpoints can all equal zero.
In our portfolio code, we use inequality constraints on the extreme quantiles to keep the long-short portfolios well behaved. The edge cases show up in the interior portfolios instead. This will be a pain for ML folks, but for single sorts it's fine.
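A hedged sketch of this kind of edge case, assuming extreme portfolios are formed with inclusive inequalities (at or below the 20th percentile, at or above the 80th) and interior portfolios with half-open intervals. This assignment rule, the helper `assign_quintiles`, and the ExclExp-like values with a mode at 0 are hypothetical; the project's actual portfolio code may differ.

```python
import numpy as np

def assign_quintiles(signal):
    """Label each firm 1..5 under a hypothetical inequality-based quintile rule."""
    x = np.asarray(signal, dtype=float)
    bp = np.percentile(x, [20, 40, 60, 80])  # quintile breakpoints
    labels = np.zeros(len(x), dtype=int)     # 0 = not yet assigned
    # Extremes use inclusive inequalities, so each holds at least one firm
    # (the minimum and the maximum) whenever bp[0] < bp[3].
    labels[x <= bp[0]] = 1
    labels[x >= bp[3]] = 5
    # Interior portfolios k = 2, 3, 4 take the half-open interval (bp[k-2], bp[k-1]].
    for k in (2, 3, 4):
        in_bin = (x > bp[k - 2]) & (x <= bp[k - 1]) & (labels == 0)
        labels[in_bin] = k
    return labels, bp

# Hypothetical ExclExp-like cross-section with a big mode at 0.
excl_exp = [-3, -2, -1, 0, 0, 0, 0, 1, 2, 3]
labels, bp = assign_quintiles(excl_exp)
counts = {k: int((labels == k).sum()) for k in range(1, 6)}
print(bp)
print(counts)  # {1: 2, 2: 5, 3: 0, 4: 1, 5: 2}
```

Because the 40th and 60th percentile breakpoints are both 0, portfolio 3 covers the empty interval (0, 0], so an equal-weighted return for it would be NaN even though portfolios 1 and 5, and hence the long-short spread, are well defined.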
Nothing to see here unless you're an accounting nerd, but we'll add this to the FAQ.
We updated the FAQ to explain these missing values:
The Piotroski Score at the very bottom shows that every row has at least one missing portfolio value (null, NaN).
The PS score is the weirdest of them all:
Originally posted by @firmai in https://github.com/OpenSourceAP/CrossSection/issues/43#issuecomment-859581890