Open cjyetman opened 1 month ago
Makes sense to me! Unless you two can think of a good reason (any reason at all) that we would want to keep those rows for any other purpose in the analysis.
This may also be of interest to @Antoine-Lalechere, who may have an idea whether we would ever want financial data that is not assigned an issue type?
Interesting! I can't think of any use case where we would need it. Can we easily get the ISINs that are filtered out, so that I can check what is kicked out?
When the "raw" FactSet data is "prepared" in `pacta.data.preparation::prepare_financial_data()`, the first filter removes any rows that have `NA` for the `issue_type`.
https://github.com/RMI-PACTA/pacta.data.preparation/blob/fa6e801ef85263a3a612461a791a6d5d9bd2af89/R/prepare_financial_data.R#L18
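For illustration, here is a minimal base R sketch of that kind of filter. It is not the code in `prepare_financial_data()`: the toy data frame, the `isin` column, and all values are made up, and only the `issue_type` column name comes from the description above. It also sets the dropped rows aside so they can be inspected.

```r
# Toy stand-in for the "raw" FactSet financial data (hypothetical values).
raw_financial_data <- data.frame(
  isin = c("XX0000000001", "XX0000000002", "XX0000000003", "XX0000000004"),
  issue_type = c("Equity", NA, "Bond", NA),
  stringsAsFactors = FALSE
)

# TRUE for rows that have a known issue type.
has_issue_type <- !is.na(raw_financial_data$issue_type)

# Rows that would be removed, kept aside for inspection.
dropped_rows <- raw_financial_data[!has_issue_type, , drop = FALSE]

# Rows that would be passed on to data preparation.
filtered_financial_data <- raw_financial_data[has_issue_type, , drop = FALSE]

# Rough look at the in-memory size before and after the filter.
print(object.size(raw_financial_data), units = "auto")
print(object.size(filtered_financial_data), units = "auto")
```

On real data, `dropped_rows$isin` would give the list of identifiers removed by the filter, assuming the raw data carries such a column.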
This is always done because it's necessary to know the issue type in order to use it properly in the PACTA process. Removing the rows where `issue_type` is `NA` typically reduces the in-memory size of the `financial_data` object from multiple GBs to 100s of MBs, dropping 10s of millions of rows. It would be advantageous to preemptively remove those rows from the "raw" FactSet data before it gets to data.prep.

ping @jdhoffa @AlexAxthelm