Open jdhoffa opened 7 months ago
Context from @cjyetman: It possibly does not need any action, but would be good to verify what is currently happening and if there is an improvement that can be made, e.g. figuring out what data is available in FactSet and if this is something we can/should watch out for. If there are more rows that could be removed, I believe that is something ideally done in the FactSet extraction code, rather than waiting to do it later in the data.prep process.
Migrated from https://github.com/RMI-PACTA/pacta.data.preparation/issues/270
cc @cjyetman @AlexAxthelm
https://github.com/RMI-PACTA/pacta.data.preparation/blob/831c9b960c8be8e27eeca53f6db489f000603268/R/prepare_financial_data.R#L23-L28
Currently, `prepare_financial_data()` does some filtering to remove rows that have insufficient data to be useful; however, some rows that may not be useful still make it through. For instance, there are currently some Equity rows that have `adj_price == 0` or `adj_shares_outstanding == 0`, but not both. Since the share ownership weight is calculated with `number_of_shares / shares_outstanding_all_classes`, and to get `number_of_shares` from `market_value` we need the share price, both `adj_price` and `adj_shares_outstanding` need to be non-NA, legitimate values for a row of data to be useful. Whether a value of 0 in `adj_price` or `adj_shares_outstanding` is "legitimate" is not currently known. We could/should either verify with FactSet whether these values are legit, or consider assuming they are not legit and removing the rows (though that may be a rabbit hole we want to avoid).

This is the current distribution of the problem...
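
For anyone wanting to reproduce the count, here is a minimal base-R sketch of how the affected rows could be flagged and tabulated. It is not the code in `prepare_financial_data()`. The toy data, the `isin` values, and the `asset_type` column name are assumptions made for illustration; only `adj_price` and `adj_shares_outstanding` come from the description above.

```r
# Hypothetical toy data: values are invented, and `asset_type` / `isin` are
# assumed column names used only for this illustration.
financial_data <- data.frame(
  isin = c("A", "B", "C", "D", "E"),
  asset_type = c("Equity", "Equity", "Equity", "Equity", "Bonds"),
  adj_price = c(10, 0, 5, 0, 0),
  adj_shares_outstanding = c(1000, 500, 0, 0, NA),
  stringsAsFactors = FALSE
)

is_equity <- financial_data$asset_type == "Equity"

# Treat NA as "not zero" so the flags below are always TRUE/FALSE, never NA.
zero_price <- !is.na(financial_data$adj_price) &
  financial_data$adj_price == 0
zero_shares <- !is.na(financial_data$adj_shares_outstanding) &
  financial_data$adj_shares_outstanding == 0

# Equity rows where exactly one of the two values is zero ("but not both").
suspect <- is_equity & xor(zero_price, zero_shares)

# Cross-tabulate the two flags for Equity rows to see the distribution.
distribution <- table(
  zero_price = zero_price[is_equity],
  zero_shares = zero_shares[is_equity]
)
print(distribution)

# One possible treatment, if the zeros are assumed not to be legitimate:
# drop the suspect rows. Whether that assumption holds is the open question.
cleaned <- financial_data[!suspect, ]
```

If the zeros turn out to be legitimate FactSet values, the `suspect` flag could still be used for reporting rather than removal.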